On 2013-08-26 19:49, Andy Parker wrote:
Adrien put a lot of effort into tracking down what was happening in
#15106 (Missing site.pp can cause error 'Cannot find definition Class').
That exact issue, as described in that bug, has been fixed, but in the
investigation Adrien figured out that there are a lot of other problems
that can crop up (https://projects.puppetlabs.com/issues/15106#note-13).
Basically it comes down to the way puppet tracks what is loaded, what
can be loaded, and when things need to be reloaded. When compiling a
catalog from manifests, the autoloader (for puppet types, not for ruby
code) will be invoked at various times to parse the .pp files that it
thinks should contain the types that are needed. At the same time it
caches what it has already parsed in a Puppet::Resource::TypeCollection,
which throughout the code is known as known_resource_types. There are
also a few cases where the TypeCollection will be cleared, even part way
through a compile, which causes it to start reloading things.
Charlie Sharpsteen, Adrien, and I talked about this around a week ago,
before PuppetConf, and came to the conclusion that the current method of
autoloading puppet manifests and tracking known types is just untenable.
There are multiple points in the code where it loses track of the
environment that it is working with; trying to pass that information
through (I tried it a few days ago) only uncovers more issues.
The conclusion that we came to was that the current lazy-loading of
puppet manifests needs to go away. Lazy loading makes all of the
information to correctly load types at the right time and from the right
place very difficult to keep track of (not intrinsically so, but in our
current state).
I think the system needs to change to eager loading of manifests (not
applying them all, but at least loading them all). For the development
case this makes things a little more expensive, but it should make the
stable, production case for manifests much faster, because it will
rarely, if ever, need to look at the filesystem to find a type.
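The eager-loading idea above could look something like the following sketch, under the assumption that there is a parser callback yielding the types defined in each file (`EagerLoader` and its parameters are made-up names for illustration). All manifests under the module path are parsed once, up front; afterwards a type lookup is a pure hash lookup and never touches the filesystem.

```ruby
require 'find'

# Hedged sketch of eager loading: walk the module path once at startup,
# parse every .pp file, and build a complete name -> AST map. The
# `parser` callback stands in for the real parser and is assumed to
# return { type_name => ast } for a given file.
class EagerLoader
  def initialize(modulepath, parser)
    @types = {}
    Find.find(*modulepath) do |path|
      next unless path.end_with?('.pp')
      parser.call(path).each { |name, ast| @types[name] = ast }
    end
  end

  def find_type(name)
    @types[name]   # no filesystem access at lookup time
  end
end
```

The trade-off is exactly the one stated above: startup pays for parsing everything, but a stable production deployment amortizes that cost across many catalog compiles.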
Now the problem is that if we start going down this path, it becomes a
large change to the underlying architecture of the compiler. It will be
unnoticeable to most users from a manifest standpoint (unless somehow
they were able to rely on certain manifests never being loaded);
however, we may need to make changes that will break code at the ruby
level (maybe the testing wrappers, maybe types and providers, probably
some functions).
I think something this large should be an ARM, but I wanted to put this
out here to get some feedback before working up an ARM. Maybe we are
missing something and we can salvage this without a larger change, but
at the moment I'm skeptical.
I have read this, and the comments made to date, and it is somewhat
difficult to understand exactly what someone means, because we use
language that is fuzzy (at least to me).
Here is an attempt to define the terms (after that I have a proposal).
Parsing
-------
The part of the process that goes "from source text to AST model".
Validation
----------
Checks/asserts the validity of the AST model.
Loading
-------
Resolves symbolic name to something that can be evaluated. (i.e. AST
model or Ruby, or whatever we may invent in the future). As an example
this binds the name of a hostclass to the block of code that is the
class' body.
Linking
-------
Resolving name to object references. This is not done in puppet as a
separate static step, it is done while evaluating.
Evaluation
----------
Evaluates the loaded logic (i.e. visits AST nodes and performs
operations or calls Ruby).
Compilation
-----------
The act of loading a given start point and evaluating it (and its
transitive dependencies) for the purpose of compiling a catalog.
Deferred Evaluation
-------------------
We have deferred evaluation of language constructs that define classes
(and custom resource types? I have to check) - or rather, when evaluated
they only define the mapping of symbolic name to code to evaluate on
demand: either a singleton evaluation (class), or potentially multiple
evaluations (resource).
(In puppet a hostclass is not evaluated; instead there is a search for
instantiable objects, which are transitively instantiated on "loading".
Later its "code" (body) is evaluated.)
(In contrast the term "lazy loading" throws me; what is it that is lazy?
The parsing, the binding of name to code, or the evaluation of the bound
code?).
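The distinction between defining and evaluating can be made concrete with a small model (hypothetical names throughout): defining a class only binds its name to an un-evaluated body; the body runs, and runs exactly once, only when the class is actually declared.

```ruby
# Toy model of deferred evaluation: define binds name -> body without
# evaluating anything; declare evaluates the body on demand, as a
# singleton (assumes the body returns a truthy value).
class Definitions
  def initialize
    @bodies = {}      # name => un-evaluated body (a thunk)
    @evaluated = {}   # singleton guard: each class body runs once
  end

  def define_class(name, &body)
    @bodies[name] = body   # cheap: no evaluation happens here
  end

  def declare_class(name)
    @evaluated[name] ||= @bodies.fetch(name).call
  end
end

runs = 0
defs = Definitions.new
defs.define_class('ntp') { runs += 1; :resources }
# runs is still 0 here: defining evaluates nothing
defs.declare_class('ntp')
defs.declare_class('ntp')
# runs is 1: singleton evaluation, the body ran exactly once
```

In these terms, the "lazy" part of "lazy loading" is the parse-and-bind step that happens on demand, not the deferred evaluation itself.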
Proposal
========
To me, the problem we are discussing is that "autoloading" performs
evaluation of an unlinked model. The result therefore depends on the
transitive dependency graph of resolved links. We cache the result and
then try to figure out what needs to be invalidated based on a changed file.
At the other extreme, if we cache nothing, manifests are processed from
scratch for every request, and we have a potentially long startup.
A simple solution is to cache the validated parse result. This is a
simple mapping from source URI (e.g. a file path) to an AST. This is
always a 1:1 mapping - the source and the AST are two different
representations of exactly the same thing. Then when we evaluate, we
always evaluate everything. There is one special case: when none of the
files have changed, there is an opportunity to avoid recomputing the
catalog, but that assumes that no external data has changed. (There are
several different ways to deal with such optimizations, ranging from
asking something external "has anything changed?" to using "valid
until" information in the external touch points.)
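A minimal sketch of that per-file cache, assuming modification time is a good-enough change signal (names here are illustrative): a 1:1 map from source path to validated AST, re-parsing a file only when its mtime has moved.

```ruby
# Sketch of the proposed parse cache: source path -> [mtime, AST].
# The `parser` callback stands in for parse + validate.
class ParseCache
  def initialize(parser)
    @parser = parser
    @cache = {}   # path => [mtime, ast]
  end

  def ast_for(path)
    mtime = File.mtime(path)
    cached = @cache[path]
    return cached[1] if cached && cached[0] == mtime
    ast = @parser.call(File.read(path))   # re-parse only on change
    @cache[path] = [mtime, ast]
    ast
  end
end
```

Because source and AST are two representations of the same thing, invalidation here is trivially per-file; nothing transitive needs to be tracked, which is the whole point of the proposal.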
Yet another problem is a change in files "mid-transaction". We could
solve that by performing a scan of the system, noting all potential URIs
affecting the result and their "expiration-timestamp" (no parsing takes
place). If during the evaluation we find a change in timestamp, we fail
the transaction (or restart it, backing off in time and putting a cap
on retries if we want to be fancy).
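The fail-or-restart idea could be sketched as below. Everything here is illustrative (the function name, the error class, the retry policy); the mechanism is just: snapshot each source's timestamp before evaluating, re-check afterwards, and retry up to a cap if anything changed underneath us.

```ruby
# Raised when sources keep changing faster than we can compile.
class StaleSourceError < StandardError; end

# Run the given block (the evaluation) and return its result, but only
# if no watched source changed while it ran; otherwise retry, up to
# max_retries attempts.
def compile_consistently(paths, max_retries: 3)
  max_retries.times do
    snapshot = paths.map { |p| [p, File.mtime(p)] }.to_h
    result = yield                        # evaluate / compile the catalog
    changed = paths.any? { |p| File.mtime(p) != snapshot[p] }
    return result unless changed          # nothing moved: result is consistent
    # else fall through and restart the transaction
  end
  raise StaleSourceError, 'sources kept changing; giving up'
end
```

A back-off sleep between attempts would slot in naturally before the retry, but is omitted to keep the sketch short.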
I use the term "URI affecting the result" to mean a reference to a .pp
source file, data-bindings in some form, an external service (providing,
say, ENC data/bindings), or similar.
I think the above is a combination of "autoloading" and "load everything
up front".
I would like to get rid of "import" because it is path based, not
because it "imports" (loads code). I.e. I think we should have a loader
that resolves symbolic names to URIs and loads evaluatable content.
This loader should be able to search for what to "run" without having to
resort to explicit "run this path" - if not then there is IMO something
missing in the language itself. I can live with the entry point being a
file (e.g. site.pp), or possibly a set of files if users for some reason
want to split a site.pp into multiple files.
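For the "resolve symbolic names to URIs" part, Puppet's existing module layout convention already provides most of the mapping: `apache::vhost` lives at `<modulepath>/apache/manifests/vhost.pp`, and a bare `apache` at `apache/manifests/init.pp`. A resolver built on that convention needs no path-based import at all; a sketch (function name is illustrative):

```ruby
# Resolve a symbolic type/class name to a source path using the module
# layout convention: apache -> apache/manifests/init.pp,
# apache::vhost -> apache/manifests/vhost.pp. Returns nil on a miss.
def resolve_name(name, modulepath)
  parts = name.split('::')
  mod = parts.shift
  rel = parts.empty? ? 'init' : File.join(parts)
  modulepath.each do |dir|
    candidate = File.join(dir, mod, 'manifests', "#{rel}.pp")
    return candidate if File.exist?(candidate)
  end
  nil
end
```

With a resolver like this as the single loading mechanism, the entry point (site.pp or a set of files) remains the only place where a path is named explicitly.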
- henrik