On 2013-08-26 19:49, Andy Parker wrote:
Adrien put a lot of effort into tracking down what was happening in
#15106 (Missing site.pp can cause error 'Cannot find definition Class').
That exact issue, as described in that bug, has been fixed, but in the
investigation Adrien figured out that there are a lot of other problems
that can crop up (https://projects.puppetlabs.com/issues/15106#note-13).
Basically it comes down to the way puppet tracks what is loaded, what
can be loaded, and when things need to be reloaded. When compiling a
catalog from manifests, the autoloader (for puppet types, not for ruby
code) will be invoked at various times to parse the .pp files that it
thinks should contain the types that are needed. At the same time it
caches what it has already parsed in a Puppet::Resource::TypeCollection,
which throughout the code is known as known_resource_types. There are
also a few cases where the TypeCollection will be cleared, even part way
through a compile, which causes it to start reloading things.
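To make the shape of the problem concrete, here is a minimal sketch of the lazy path (illustrative Ruby only; these are not the actual Puppet classes, just the pattern):

    # Sketch of lazy type loading backed by a clearable known-types cache.
    class TypeCollection
      def initialize
        @types = {}
      end

      def add(name, type)
        @types[name] = type
      end

      def find(name)
        @types[name]
      end

      # Clearing mid-compile forces every type to be found and parsed again.
      def clear!
        @types.clear
      end
    end

    class LazyLoader
      def initialize(manifest_dir, known_types)
        @manifest_dir = manifest_dir
        @known_types  = known_types
      end

      # Cache hit: return what was already parsed. Cache miss: guess which
      # file should define the type, "parse" it, and remember the result.
      def load_type(name)
        cached = @known_types.find(name)
        return cached if cached

        path = File.join(@manifest_dir, "#{name}.pp")
        return nil unless File.exist?(path)

        parsed = { name: name, source: path } # stand-in for a parsed type
        @known_types.add(name, parsed)
        parsed
      end
    end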
Charlie Sharpsteen, Adrien, and I talked about this around a week ago,
before PuppetConf, and came to the conclusion that the current method of
autoloading puppet manifests and tracking known types is just untenable.
There are multiple points in the code where it loses track of the
environment that it is working with, and trying to pass that information
through (I tried it a few days ago) ends up uncovering more issues.
The conclusion that we came to was that the current lazy-loading of
puppet manifests needs to go away. Lazy loading makes all of the
information needed to correctly load types at the right time and from the
right place very difficult to keep track of (not intrinsically so, but in
our current state).
I think the system needs to change to eager loading of manifests (not
applying them all, but at least loading them all). For the development
case, this makes things maybe a little more expensive, but it should
make the stable, production case for manifests much faster, because it
will rarely, if ever, need to look at the filesystem to find a type.
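A hedged sketch of the eager alternative (again illustrative only; EagerLoader and its layout are made up here):

    require 'find'

    # Walk the manifest directory once at startup; later lookups are plain
    # hash reads and never touch the filesystem.
    class EagerLoader
      def initialize(manifest_dir)
        @known_types = {}
        Find.find(manifest_dir) do |path|
          next unless path.end_with?('.pp')
          name = File.basename(path, '.pp') # stand-in: a real file may define several types
          @known_types[name] = { name: name, source: path }
        end
      end

      def load_type(name)
        @known_types[name] # no filesystem access at lookup time
      end
    end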
Now the problem is that if we start going down this path, it becomes a
large change to the underlying architecture of the compiler. It will be
unnoticeable to most users from a manifest standpoint (unless somehow
they were able to rely on certain manifests never being loaded), but
we may need to make changes that will break code at the ruby level
(maybe the testing wrappers, maybe types and providers, probably some
functions).
I think something this large should be an ARM, but I wanted to put this
out here to get some feedback before working up an ARM. Maybe we are
missing something and we can salvage this without a larger change, but
at the moment I'm skeptical.
--
Andrew Parker
This is a very interesting discussion.
In Geppetto the approach is to parse everything up-front. This is not
the same as what is typically referred to as parsing in the puppet
community, which seems to also involve evaluation and linking. We often
talk about "parse order" when we actually mean "evaluation order".
Parsing is quite straightforward: simply turn the DSL source into an
AST and remember it. It is the linking (and evaluation) that is tricky,
especially when changes can take place to files mid-transaction.
In Geppetto it is not practical to keep all information about all files
in memory all the time, so a technique is used to populate an index with
references to the source positions where referable elements are located;
this index is used during linking. A graph of all dependencies is created
in a way that makes it possible to compute whether a change to a
"name => element" mapping will have an effect on other resolutions (it
was missing and now exists; it existed and was resolved, but should now
resolve differently).
The build system that kicks in on any change makes use of the dependency
information to link the resulting AST in the correct order.
Sometimes the builder is required to make an extra pass due to lack of
information (the dependencies were not yet computed), or when there is
circularity.
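As a very rough illustration (this is not Geppetto's code, just an assumed shape of the index plus a reverse-dependency lookup used to decide what needs relinking):

    # Maps referable names to the source positions that define them, and keeps
    # a reverse map so a change to a name invalidates only the files using it.
    class LinkIndex
      def initialize
        @definitions = Hash.new { |h, k| h[k] = [] } # name => [[file, line], ...]
        @referrers   = Hash.new { |h, k| h[k] = [] } # name => [files that reference it]
      end

      def define(name, file, line)
        @definitions[name] << [file, line]
      end

      def reference(name, file)
        @referrers[name] << file
      end

      # A change to `name` (added, removed, or now resolving differently)
      # queues only the files that referred to it for relinking.
      def files_to_relink(name)
        @referrers[name].uniq
      end
    end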
Puppet is a very tricky language to link since many of the links can be
dynamic (not known until evaluation takes place). You can see this in
Geppetto; sometimes you have to do a "build clean" as the interactive
state does not know that issues were resolved. Also, in many cases it
is not possible to validate/link these constructs at all.
In Geppetto, the majority of the computing time is spent validating the links.
With all of that said: in a Puppet master we could parse all files to
AST, but when we start evaluation we would start over from the beginning
every time (check if a file is stale, and if so reparse it). Holding more
state than that in memory is very problematic (especially when
environments are involved).
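A minimal sketch of that per-file staleness check, assuming mtime is a good enough change signal (illustrative only):

    # Keep parsed ASTs keyed by path; reuse an entry unless the file changed
    # on disk since it was last parsed.
    class AstCache
      Entry = Struct.new(:ast, :mtime)

      def initialize
        @entries = {}
      end

      def fetch(path)
        mtime = File.mtime(path)
        entry = @entries[path]
        if entry.nil? || entry.mtime != mtime
          ast = File.read(path) # stand-in for real parsing into an AST
          entry = Entry.new(ast, mtime)
          @entries[path] = entry
        end
        entry.ast
      end
    end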
I was talking with Eric Dalén, and he said they tried running the master
in a way where each catalog request got a fresh process. If I understood
him correctly, this did not have much (if any) negative impact on
performance! I can think of many reasons why: Ruby is not the best at
garbage collection; when things are done in a virgin state there are
fewer things to check (a poor cache implementation is sometimes worse
than no cache); and memory is better organized. I can also imagine that
speedups measured in the past may not be as relevant today, when disks
are much faster (even SSDs) and the Ruby runtime has improved. It is high
time to measure again where the bottlenecks are.
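Just to illustrate the fresh-process idea (not how the master is actually wired up; compile_catalog below is a made-up stand-in), the shape in Ruby would be roughly:

    def compile_catalog(request)
      # made-up stand-in for the real compilation work
      puts "compiling catalog for #{request}"
    end

    # Handle each catalog request in a forked child so every compile starts
    # from a clean Ruby heap and nothing cached can leak between requests.
    def handle_in_fresh_process(request)
      pid = fork do
        compile_catalog(request)
        exit!(0) # skip at_exit handlers in the child
      end
      Process.wait(pid)
    end

    handle_in_fresh_process('agent1.example.com')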
So, Andy - I am very much looking forward to hearing about the
measurements you are doing on parse time. Are you running both the old
and new parsers, and on both 1.8.7 and 1.9.3?
Regards
- henrik