On 2013-08-26 19:49, Andy Parker wrote:
Adrien put a lot of effort into tracking down what was happening in
#15106 (Missing site.pp can cause error 'Cannot find definition Class').
That exact issue, as described in that bug, has been fixed, but in the
investigation Adrien figured out that there are a lot of other problems
that can crop up (https://projects.puppetlabs.com/issues/15106#note-13).

Basically it comes down to the way puppet tracks what is loaded, what
can be loaded, and when things need to be reloaded. When compiling a
catalog from manifests, the autoloader (for puppet types, not for ruby
code) will be invoked at various times to parse the .pp files that it
thinks should contain the types that are needed. At the same time it
caches what it has already parsed in a Puppet::Resource::TypeCollection,
which throughout the code is known as known_resource_types. There are
also a few cases where the TypeCollection will be cleared, even part way
through a compile, which causes it to start reloading things.
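The caching behavior described above can be sketched as follows. This is a hypothetical illustration of the pattern, not Puppet's actual API: a collection that invokes the autoloader on a cache miss, serves repeats from the cache, and can be cleared part way through, forcing reloads.

```ruby
# Illustrative sketch of a type-collection cache with autoload-on-miss.
# Names (TypeCollection, find, clear) are chosen for clarity, not taken
# from Puppet's source.
class TypeCollection
  def initialize(loader)
    @loader = loader   # callable: type name -> parsed definition
    @known  = {}       # cache of already-loaded definitions
  end

  # Return the cached definition, or invoke the autoloader on a miss.
  def find(name)
    @known[name] ||= @loader.call(name)
  end

  # Clearing throws away everything loaded so far, so subsequent
  # lookups reload from disk -- the mid-compile hazard described above.
  def clear
    @known.clear
  end
end

$loads = []
types = TypeCollection.new(->(name) { $loads << name; "definition of #{name}" })
types.find("apache")   # cache miss: autoloader runs
types.find("apache")   # served from cache
types.clear
types.find("apache")   # reloaded after the clear
```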

Charlie Sharpsteen, Adrien, and I talked about this around a week ago,
before PuppetConf, and came to the conclusion that the current method of
autoloading puppet manifests and tracking known types is just untenable.
There are multiple points in the code where it loses track of the
environment that it is working with, and trying to pass that information
through (I tried a few days ago) only ends up uncovering more issues.

The conclusion that we came to was that the current lazy loading of
puppet manifests needs to go away. Lazy loading makes it very difficult
to keep track of all the information needed to load types at the right
time and from the right place (not intrinsically so, but in our current
state).

I think the system needs to change to eager loading of manifests (not
applying them all, but at least loading them all). For the development
case, this makes things maybe a little more expensive, but it should
make the stable, production case for manifests much faster, because it
will rarely, if ever, need to look at the filesystem to find a type.
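In its simplest form, eager loading is just "walk the manifest tree once at startup and parse everything". A minimal sketch, with the parse step stubbed out as a file read (a real implementation would produce ASTs and register the types they define):

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical eager loader: find every .pp file under a directory up
# front, so type lookups during compilation never touch the filesystem.
def eager_load(manifest_dir)
  Dir.glob(File.join(manifest_dir, "**", "*.pp")).sort
     .each_with_object({}) do |path, table|
    table[path] = File.read(path)   # stand-in for parse(path)
  end
end

# Demo against a throwaway directory tree.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "modules", "apache", "manifests"))
  File.write(File.join(dir, "site.pp"), "node default { }")
  File.write(File.join(dir, "modules", "apache", "manifests", "init.pp"),
             "class apache { }")
  $loaded = eager_load(dir)
end
```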

Now the problem is that if we start going down this path, it becomes a
large change to the underlying architecture of the compiler. It will be
unnoticeable to most users from a manifest standpoint (unless somehow
they were able to rely on certain manifests never being loaded);
however, we may need to make changes that will break code at the ruby
level (maybe the testing wrappers, maybe types and providers, probably
some functions).

I think something this large should be an ARM, but I wanted to put this
out here to get some feedback before working up an ARM. Maybe we are
missing something and we can salvage this without a larger change, but
at the moment I'm skeptical.


I have read this, and the comments made to date, and it is somewhat difficult to understand exactly what people mean, as we use language that is fuzzy (at least to me).

Here is an attempt to define the terms (after that I have a proposal).

Parsing
-------
The part of the process that goes "from source text to AST model".

Validation
----------
Checks/asserts the validity of the AST model.

Loading
-------
Resolves symbolic name to something that can be evaluated. (i.e. AST model or Ruby, or whatever we may invent in the future). As an example this binds the name of a hostclass to the block of code that is the class' body.

Linking
-------
Resolving names to object references. This is not done in puppet as a separate static step; it is done while evaluating.

Evaluation
----------
Evaluates the loaded logic (i.e. visits AST nodes and performs operations or calls Ruby).

Compilation
-----------
The act of loading a given start point and evaluating it (and its transitive dependencies) for the purpose of compiling a catalog.

Deferred Evaluation
-------------------
We have deferred evaluation of language constructs that define classes (and custom resource types? I have to check) - or rather, when evaluated they only define the mapping of a symbolic name to code that is evaluated on demand (either a singleton evaluation for a class, or potentially multiple evaluations for a resource).

(In puppet a hostclass is not evaluated immediately; instead there is a search for instantiable objects, and these are transitively instantiated on "loading". Later its "code" (body) is evaluated.)

(In contrast the term "lazy loading" throws me; what is it that is lazy? The parsing, the binding of name to code, or the evaluation of the bound code?).
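To make the deferred-evaluation distinction concrete, here is a toy sketch (all names invented for illustration): "defining" only records a name-to-body mapping; the body runs later, on demand - at most once for a class (singleton), possibly once per declaration for a defined resource type.

```ruby
# Toy registry illustrating deferred evaluation: definition binds a
# name to a body, evaluation happens on demand.
class Definitions
  def initialize
    @classes   = {}   # class name -> body
    @defines   = {}   # resource type name -> body
    @evaluated = {}   # class name -> true once its body has run
  end

  def define_class(name, &body)
    @classes[name] = body
  end

  def define_resource(name, &body)
    @defines[name] = body
  end

  # Singleton semantics: the class body runs at most once.
  def include_class(name)
    return if @evaluated[name]
    @evaluated[name] = true
    @classes.fetch(name).call
  end

  # A defined resource type's body runs once per declaration.
  def declare_resource(name, title)
    @defines.fetch(name).call(title)
  end
end

$log = []
defs = Definitions.new
defs.define_class("ntp")               { $log << "class ntp evaluated" }
defs.define_resource("apache::vhost") { |t| $log << "vhost #{t}" }

defs.include_class("ntp")
defs.include_class("ntp")                    # no-op: singleton already evaluated
defs.declare_resource("apache::vhost", "a")
defs.declare_resource("apache::vhost", "b")  # body runs again per declaration
```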

Proposal
========
To me, the problem we are discussing is that "autoloading" performs evaluation of an unlinked model. The result therefore depends on the transitive dependency graph of resolved links. We cache the result and then try to figure out what needs to be invalidated based on a changed file.

At the other extreme, if we cache nothing, manifests are processed from scratch for every request and we have a potentially long startup.

A simple solution is to cache the validated parse result. This is a simple mapping from source URI (e.g. a file path) to an AST. This is always a 1:1 mapping - the source and the AST are two different representations of exactly the same thing. Then when we evaluate, we always evaluate everything. There is one special case: when none of the files have changed, there is an opportunity to avoid recomputing the catalog, but that assumes no external data has changed. (There are several different ways to deal with such optimizations, from asking something external "has anything changed?" to using "valid until" information in the external touch points.)
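A sketch of such a per-file parse cache, keyed by path and invalidated by mtime (names are illustrative; the "parse + validate" step is stubbed out as a file read):

```ruby
require "tmpdir"

# Hypothetical parse cache: a 1:1 map from source path to parsed form,
# invalidated when the file's mtime changes. Evaluation would then
# always walk every cached AST.
class ParseCache
  Entry = Struct.new(:mtime, :ast)

  def initialize
    @cache  = {}
    @parses = 0   # counts actual parses, for the demo below
  end
  attr_reader :parses

  def fetch(path)
    mtime = File.mtime(path)
    entry = @cache[path]
    if entry.nil? || entry.mtime != mtime
      @parses += 1
      entry = Entry.new(mtime, File.read(path))  # stand-in for parse + validate
      @cache[path] = entry
    end
    entry.ast
  end
end

cache = ParseCache.new
Dir.mktmpdir do |dir|
  path = File.join(dir, "site.pp")
  File.write(path, "node default { }")
  cache.fetch(path)                              # parsed
  cache.fetch(path)                              # unchanged -> cache hit
  File.write(path, "node default { include ntp }")
  File.utime(Time.now + 2, Time.now + 2, path)   # make the mtime change visible
  cache.fetch(path)                              # changed -> re-parsed
  $parse_count = cache.parses
end
```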

Yet another problem is a change in files "mid-transaction". We could solve that by performing a scan of the system, noting all potential URIs affecting the result and their "expiration timestamp" (no parsing takes place). If, during evaluation, we find a change in a timestamp, we fail the transaction (or restart it, backing off in time and capping the number of retries if we want to be fancy).
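The fail-or-restart idea can be sketched like this (all names hypothetical): snapshot the mtimes of every file that can affect the result, evaluate, verify the snapshot before committing, and retry the whole transaction with a cap if anything changed underneath us.

```ruby
require "tmpdir"

class StaleTransaction < StandardError; end

# Record the mtime of every path that can affect the result.
def snapshot(paths)
  paths.to_h { |p| [p, File.mtime(p)] }
end

# Fail if any watched file changed since the snapshot.
def check!(snap)
  snap.each do |path, mtime|
    raise StaleTransaction, path if File.mtime(path) != mtime
  end
end

# Run the block; on staleness, restart the whole transaction, up to a cap.
def transaction(paths, max_retries: 3)
  attempts = 0
  begin
    attempts += 1
    snap = snapshot(paths)
    result = yield
    check!(snap)            # final validation before committing
    [result, attempts]
  rescue StaleTransaction
    retry if attempts < max_retries
    raise
  end
end

# Demo: a file is edited during the first evaluation, so the
# transaction restarts and succeeds on the second attempt.
Dir.mktmpdir do |dir|
  path = File.join(dir, "site.pp")
  File.write(path, "node default { }")
  edited = false
  $result, $attempts = transaction([path]) do
    unless edited
      edited = true
      File.utime(Time.now + 2, Time.now + 2, path)  # simulate a mid-evaluation edit
    end
    "catalog"
  end
end
```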

I use the term "URI affecting the result" to mean a reference to a .pp source file, data bindings in some form, an external service (providing, say, ENC data/bindings), or similar.

I think the above is a combination of "autoloading" and "load everything up front".

I would like to get rid of "import" because it is path based, not because it "imports" (loads code). I.e. I think we should have a loader that resolves symbolic names to URIs and loads evaluatable content. This loader should be able to search for what to "run" without having to resort to explicit "run this path" - if not then there is IMO something missing in the language itself. I can live with the entry point being a file (e.g. site.pp), or possibly a set of files if users for some reason want to split a site.pp into multiple files.
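Such a symbolic loader already has an obvious resolution rule in the standard module layout: the class apache maps to modules/apache/manifests/init.pp, and apache::vhost to modules/apache/manifests/vhost.pp. A minimal sketch of that name-to-path resolution (directory names illustrative):

```ruby
# Resolve a symbolic class name to a manifest path using the standard
# module layout, instead of an explicit "import this path".
def manifest_path_for(name, modulepath: "modules")
  segments = name.split("::")
  mod  = segments.shift
  file = segments.empty? ? "init" : File.join(segments)
  File.join(modulepath, mod, "manifests", "#{file}.pp")
end

manifest_path_for("apache")          # modules/apache/manifests/init.pp
manifest_path_for("apache::vhost")   # modules/apache/manifests/vhost.pp
```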

- henrik


--
You received this message because you are subscribed to the Google Groups "Puppet 
Developers" group.
