On 05/27/2014 05:39 PM, Andy Parker wrote:
On Tuesday, May 27, 2014 5:53:29 AM UTC-7, Konrad Scherer wrote:

    On 14-05-22 03:21 PM, Daniele Sluijters wrote:
     > The environment caching is already there, use the environment_timeout
     > setting. Mine is set to unlimited and I reload at deploy time by
     > touching tmp/restart.txt. This so far seems to work really well.

    Thanks for the suggestion. I have also been dealing with high CPU load
    on my puppet masters since 3.5.0. Triggering the puppet master restart
    makes a lot of sense. I am using a git post commit hook to reload the
    puppet configs on my three puppet masters and I have added the code to
    restart the puppet rack app after changes have been detected. I will
    report back once I have had some time to analyze the results.

By "puppet configs" do you mean the puppet manifest files? Under rack the puppet
master doesn't watch nor reload the puppet.conf file.

That wasn't clear, sorry. I mean puppet manifest *.pp files, not the conf files.


    This seems like a major change from previous puppet versions. I have
    been using Puppet since 2.6 and any changes to puppet configs on the
    master were always picked up immediately. Is this because the puppet
    master was not doing any caching, or was the puppet master watching the
    puppet configs for changes? Has this behavior now changed? Will changes
    to puppet manifests on the master only be detected after the
    environment_timeout has expired?


The caching behavior for directory environments is a bit different from the
previous system. I've been working on a blog post about this, but haven't
finished it yet :(

First off, what is being cached? When we talk about caching environments we are
talking (mostly) about caching the parsed and validated form of the manifest
files. This saves the cost of disk access (stat calls to find files, reads to list
directory contents, reads to fetch manifest file contents) as well as a certain
amount of CPU work (lexing, parsing, building an AST, validating the AST). This
much has been part of the cache for quite a while now.

What has changed is the cache eviction mechanism that is used. The directory
environments employ a different eviction and caching system than the "legacy"
environments. The legacy environments were tracked by singleton instances, one per
environment, that the master would never get rid of. These environment instances
hold references to the AST objects as well as to WatchedFile objects, which are
used to track changes to the mtime of the manifest files. The WatchedFile
instances would stat the file that they are supposed to watch, but limit the
stat calls to happen no more often than the filetimeout setting specifies.
Before Puppet 3.4 (? 3.5? I lose track of what version had what change) the
WatchedFile instances would get interrogated throughout the compilation process.
In fact, every time it asked if one file had changed it ended up asking if *any*
files had changed. There were a lot of side effects of that, but I won't derail
the conversation to go into them. In 3.4 (or was it 3.5?) the legacy environment
system was changed to only check whether files had changed at the beginning of a
compile. This, however, meant that in the worst case it would still issue a stat
call for every manifest file; in the best case (depending on your viewpoint) it
would issue no stat calls because the filetimeout had not expired; or it would
issue some in-between number of stats. The in-between number is possible because
each WatchedFile instance had its own timer for the filetimeout, so they could
drift apart over time, which allowed changes to be detected for some files but
not others.
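
For reference, the throttle described above is just the ordinary filetimeout
setting in puppet.conf; a minimal sketch of where it lives (the 15-second value
is, as far as I remember, the shipped default rather than a recommendation):

    # puppet.conf on the master (legacy environments)
    [main]
        # WatchedFile stats a watched manifest at most once per
        # filetimeout interval when checking for changes.
        filetimeout = 15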

For the directory environments we chose a different system for managing the
caches. The watchword here was KISS. Under the new system there isn't any file
watching involved (right now, that is; there is a PR open to introduce a
'manual' environment_timeout system). Instead, once an environment has loaded a
file it simply holds onto the result. All of the caching now comes down to
holding onto just the environment instance, and cache eviction is just a matter
of when puppet should throw away that environment instance and re-create it.
There are a few options here (a sample puppet.conf sketch follows the list):

   * environment_timeout = 0 : Good to use in a development setup where you are
editing manifests and running an agent to see what happens. Nothing will be
cached, so the full lex, parse, and validate overhead is incurred on every agent
catalog request.
   * environment_timeout = <some number> : Good if you have "spiky" agent requests.
For instance, if you don't run agents continually and instead only trigger them
as needed with mco, and you know that a run takes 20 minutes from the first agent
checking in to the last, and you do this kind of on-demand run once a day, then
just set the timeout to 30m (20 minutes plus some extra time to deal with
variance). This way the cache will last through the whole run, but will have
expired by the next time you run.
   * environment_timeout = unlimited : Good when agents are checking in all of the
time, so there is no "down time" when the cache can reasonably go away. Anything
less than unlimited here will cause periodic spikes in CPU usage as each passenger
worker reparses everything. Even with unlimited you'll get some reparsing simply
because the passenger workers will be periodically killed, but that is out of my
control. When unlimited is in use, the question becomes, "when *should* the
manifests be reloaded?". Well, whenever you deploy some new manifests. Any more
often than that is just a waste (in fact, that is really the answer for all of
these cases). This is where graceful restarts come in. Whenever a manifest change
is deployed, trigger a graceful restart, which will cause the environment caches
to be lost (because the process dies) and recreated for the next request.
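
To make those options concrete, a minimal puppet.conf sketch for the "unlimited"
case with directory environments might look like this (the environmentpath value
is just the usual layout; your paths may differ). For the "spiky" case you would
use something like environment_timeout = 30m instead, and 0 for development:

    # puppet.conf on the master (directory environments)
    [main]
        environmentpath = $confdir/environments
        # hold parsed environments until the master process is restarted;
        # evict by triggering a graceful restart after a deploy
        environment_timeout = unlimited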

The reason we went with this system for cache eviction is that it puts a lot more
control over resource utilization in your hands. When puppet was watching files it
would end up issuing stat calls (which can be very slow) even when nothing was
changing. Since manifests only change when they are being actively changed by
something (either a person editing them or a deploy process laying down new
versions), this moves the decision about when to incur the manifest reload cost
back to the user.
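
As a rough illustration of tying the reload to the deploy, a git hook on the
master could look something like the following; the repository layout and the
paths are assumptions for the example, not anything puppet itself provides:

    #!/bin/sh
    # post-receive hook in a bare repo on the master (hypothetical layout):
    # check out the newly pushed manifests into the environment directory,
    # then touch Passenger's restart file so the rack workers restart
    # gracefully and the cached environments are rebuilt on the next request.
    GIT_WORK_TREE=/etc/puppet/environments/production git checkout -f master
    touch /etc/puppet/rack/tmp/restart.txt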

There has also been a question once or twice about why we didn't just go with
inotify or a similar system. Mostly it comes down to complexity and portability.
This system works anywhere that puppet can run and is immensely simpler.

    Thank you in advance for any insight.


I hope what I said above is of help.

Yes, thank you for taking the time to explain this. It helps me understand the behavior I was seeing and the change in behavior starting with 3.5.0. I agree that moving the manifest reload cost decision to the user is a good idea, especially in my environment where I can go days without making changes to the manifests. I am testing now with environment_timeout=unlimited and a 'touch /etc/puppet/rack/tmp/reload.txt' in the script that gets notified of changes to my git repo of puppet manifests. I will report back when I have some data.

--
Konrad Scherer, MTS, Linux Products Group, Wind River
