On 2013-08-26 19:49, Andy Parker wrote:
Adrien put a lot of effort into tracking down what was happening in
#15106 (Missing site.pp can cause error 'Cannot find definition Class').
That exact issue, as described in that bug, has been fixed, but in the
investigation Adrien figured out that there are a lot of other problems
that can crop up (https://projects.puppetlabs.com/issues/15106#note-13).
Basically it comes down to the way puppet tracks what is loaded, what
can be loaded, and when things need to be reloaded. When compiling a
catalog from manifests, the autoloader (for puppet types, not for ruby
code) will be invoked at various times to parse the .pp files that it
thinks should contain the types that are needed. At the same time it
caches what it has already parsed in a Puppet::Resource::TypeCollection,
which throughout the code is known as known_resource_types. There are
also a few cases where the TypeCollection will be cleared, even part way
through a compile, which causes it to start reloading things.
Charlie Sharpsteen, Adrien, and I talked about this around a week ago,
before PuppetConf, and came to the conclusion that the current method of
autoloading puppet manifests and tracking known types is just untenable.
There are multiple points in the code where it loses track of the
environment that it is working with; trying to pass that information
through (I tried it a few days ago) only uncovers more issues.
The conclusion that we came to was that the current lazy-loading of
puppet manifests needs to go away. Lazy loading makes all of the
information to correctly load types at the right time and from the right
place very difficult to keep track of (not intrinsically so, but in our
current state).
I think the system needs to change to eager loading of manifests (not
applying them all, but at least loading them all). For the development
case this makes things a little more expensive, but it should make the
stable, production case for manifests much faster, because it will
rarely, if ever, need to look at the filesystem to find a type.
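The eager-loading idea above could look something like the following sketch, under the assumption that there is a parser callback yielding the types defined in each file (`EagerLoader` and its parameters are made-up names for illustration). All manifests under the module path are parsed once, up front; afterwards a type lookup is a pure hash lookup and never touches the filesystem.

```ruby
require 'find'

# Hedged sketch of eager loading: walk the module path once at startup,
# parse every .pp file, and build a complete name -> AST map. The
# `parser` callback stands in for the real parser and is assumed to
# return { type_name => ast } for a given file.
class EagerLoader
  def initialize(modulepath, parser)
    @types = {}
    Find.find(*modulepath) do |path|
      next unless path.end_with?('.pp')
      parser.call(path).each { |name, ast| @types[name] = ast }
    end
  end

  def find_type(name)
    @types[name]   # no filesystem access at lookup time
  end
end
```

The trade-off is exactly the one stated above: startup pays for parsing everything, but a stable production deployment amortizes that cost across many catalog compiles.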
Now the problem is that if we start going down this path, it becomes a
large change to the underlying architecture of the compiler. It will be
unnoticeable to most users from a manifest standpoint (unless somehow
they were able to rely on certain manifests never being loaded);
however, we may need to make changes that will break code at the ruby
level (maybe the testing wrappers, maybe types and providers, probably
some functions).
I think something this large should be an ARM, but I wanted to put this
out here to get some feedback before working up an ARM. Maybe we are
missing something and we can salvage this without a larger change, but
at the moment I'm skeptical.
I have read this, and the comments made to date, and it is somewhat
difficult to understand exactly what someone means, because we use
language that is fuzzy (at least to me).
Here is an attempt to define the terms (after that I have a proposal).
Parsing
-------
The part of the process that goes "from source text to AST model".
Validation
----------
Checks/asserts the validity of the AST model.
Loading
-------
Resolves symbolic name to something that can be evaluated. (i.e. AST
model or Ruby, or whatever we may invent in the future). As an example
this binds the name of a hostclass to the block of code that is the
class' body.
Linking
-------
Resolving name to object references. This is not done in puppet as a
separate static step, it is done while evaluating.
Evaluation
----------
Evaluates the loaded logic (i.e. visits AST nodes and performs
operations or calls Ruby).
Compilation
-----------
The act of loading a given start point and evaluating it (and its
transitive dependencies) for the purpose of compiling a catalog.
Deferred Evaluation
-------------------
We have deferred evaluation of language constructs that define classes
(and custom resource types? I have to check) - or rather, when evaluated
they only define the mapping of symbolic name to code to evaluate on
demand: either a singleton evaluation (class), or potentially multiple
evaluations (resource).
(In puppet a hostclass is not evaluated; instead there is a search for
instantiable objects, which are transitively instantiated on "loading".
Later its "code" (body) is evaluated.)
(In contrast the term "lazy loading" throws me; what is it that is lazy?
The parsing, the binding of name to code, or the evaluation of the bound
code?).
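The distinction between defining and evaluating can be made concrete with a small model (hypothetical names throughout): defining a class only binds its name to an un-evaluated body; the body runs, and runs exactly once, only when the class is actually declared.

```ruby
# Toy model of deferred evaluation: define binds name -> body without
# evaluating anything; declare evaluates the body on demand, as a
# singleton (assumes the body returns a truthy value).
class Definitions
  def initialize
    @bodies = {}      # name => un-evaluated body (a thunk)
    @evaluated = {}   # singleton guard: each class body runs once
  end

  def define_class(name, &body)
    @bodies[name] = body   # cheap: no evaluation happens here
  end

  def declare_class(name)
    @evaluated[name] ||= @bodies.fetch(name).call
  end
end

runs = 0
defs = Definitions.new
defs.define_class('ntp') { runs += 1; :resources }
# runs is still 0 here: defining evaluates nothing
defs.declare_class('ntp')
defs.declare_class('ntp')
# runs is 1: singleton evaluation, the body ran exactly once
```

In these terms, the "lazy" part of "lazy loading" is the parse-and-bind step that happens on demand, not the deferred evaluation itself.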
Proposal
========
To me, the problem we are discussing is that "autoloading" performs
evaluation of an unlinked model. The result therefore depends on the
transitive dependency graph of resolved links. We cache the result and
then try to figure out what needs to be invalidated based on a changed file.
At the other extreme, if we cache nothing, manifests are processed from
scratch for every request, and we have a potentially long startup.
A simple solution is to cache the validated parse result. This is a
simple mapping from source URI (e.g. a file path) to an AST. This is
always a 1:1 mapping - the source and the AST are two different
representations of exactly the same thing. Then when we evaluate, we
always evaluate everything. There is one special case: when none of the
files have changed, there is an opportunity to avoid recomputing the
catalog, but that assumes that no external data has changed. (There are
several different ways to deal with such optimizations, ranging from
asking something external "has anything changed?" to using "valid
until" information in the external touch points.)
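A minimal sketch of that per-file cache, assuming modification time is a good-enough change signal (names here are illustrative): a 1:1 map from source path to validated AST, re-parsing a file only when its mtime has moved.

```ruby
# Sketch of the proposed parse cache: source path -> [mtime, AST].
# The `parser` callback stands in for parse + validate.
class ParseCache
  def initialize(parser)
    @parser = parser
    @cache = {}   # path => [mtime, ast]
  end

  def ast_for(path)
    mtime = File.mtime(path)
    cached = @cache[path]
    return cached[1] if cached && cached[0] == mtime
    ast = @parser.call(File.read(path))   # re-parse only on change
    @cache[path] = [mtime, ast]
    ast
  end
end
```

Because source and AST are two representations of the same thing, invalidation here is trivially per-file; nothing transitive needs to be tracked, which is the whole point of the proposal.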
Yet another problem is a change in files "mid-transaction". We could
solve that by performing a scan of the system, noting all potential URIs
affecting the result and their "expiration-timestamp" (no parsing takes
place). If during the evaluation we find a change in timestamp, we fail
the transaction (or restart it, backing off in time and putting a cap
on retries if we want to be fancy).
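The fail-or-restart idea could be sketched as below. Everything here is illustrative (the function name, the error class, the retry policy); the mechanism is just: snapshot each source's timestamp before evaluating, re-check afterwards, and retry up to a cap if anything changed underneath us.

```ruby
# Raised when sources keep changing faster than we can compile.
class StaleSourceError < StandardError; end

# Run the given block (the evaluation) and return its result, but only
# if no watched source changed while it ran; otherwise retry, up to
# max_retries attempts.
def compile_consistently(paths, max_retries: 3)
  max_retries.times do
    snapshot = paths.map { |p| [p, File.mtime(p)] }.to_h
    result = yield                        # evaluate / compile the catalog
    changed = paths.any? { |p| File.mtime(p) != snapshot[p] }
    return result unless changed          # nothing moved: result is consistent
    # else fall through and restart the transaction
  end
  raise StaleSourceError, 'sources kept changing; giving up'
end
```

A back-off sleep between attempts would slot in naturally before the retry, but is omitted to keep the sketch short.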
I use the term "URI affecting the result" to mean a reference to a .pp
source file, data-bindings in some form, an external service (providing,
say, ENC data/bindings), or similar.
I think the above is a combination of "autoloading" and "load everything
up front".
I would like to get rid of "import" because it is path based, not
because it "imports" (loads code). I.e. I think we should have a loader
that resolves symbolic names to URIs and loads evaluatable content.
This loader should be able to search for what to "run" without having to
resort to explicit "run this path" - if not then there is IMO something
missing in the language itself. I can live with the entry point being a
file (e.g. site.pp), or possibly a set of files if users for some reason
want to split a site.pp into multiple files.
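For the "resolve symbolic names to URIs" part, Puppet's existing module layout convention already provides most of the mapping: `apache::vhost` lives at `<modulepath>/apache/manifests/vhost.pp`, and a bare `apache` at `apache/manifests/init.pp`. A resolver built on that convention needs no path-based import at all; a sketch (function name is illustrative):

```ruby
# Resolve a symbolic type/class name to a source path using the module
# layout convention: apache -> apache/manifests/init.pp,
# apache::vhost -> apache/manifests/vhost.pp. Returns nil on a miss.
def resolve_name(name, modulepath)
  parts = name.split('::')
  mod = parts.shift
  rel = parts.empty? ? 'init' : File.join(parts)
  modulepath.each do |dir|
    candidate = File.join(dir, mod, 'manifests', "#{rel}.pp")
    return candidate if File.exist?(candidate)
  end
  nil
end
```

With a resolver like this as the single loading mechanism, the entry point (site.pp or a set of files) remains the only place where a path is named explicitly.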
- henrik