On Tuesday, May 27, 2014 5:53:29 AM UTC-7, Konrad Scherer wrote:
>
> On 14-05-22 03:21 PM, Daniele Sluijters wrote: 
> > The environment caching is already there, use the environment_timeout 
> > setting. Mine is set to unlimited and I reload at deploy time by 
> > touching tmp/restart.txt. This so far seems to work really well. 
>
> Thanks for the suggestion. I have also been dealing with high CPU load 
> on my puppet masters since 3.5.0. Triggering the puppet master restart 
> makes a lot of sense. I am using a git post commit hook to reload the 
> puppet configs on my three puppet masters and I have added the code to 
> restart the puppet rack app after changes have been detected. I will 
> report back once I have had some time to analyze the results. 
>
>
By "puppet configs" do you mean the puppet manifest files? Under rack the 
puppet master doesn't watch nor reload the puppet.conf file.
 

> This seems like a major change from previous puppet versions. I have 
> been using Puppet since 2.6 and any changes to puppet configs on the 
> master were always picked up immediately. Is this because the puppet 
> master was not doing any caching or is the puppet master watching the 
> puppet configs for changes? Has this behavior now changed? Will changes 
> to puppet manifests on the master only be detected after the 
> environment_timeout has expired? 
>
>
The caching behavior for directory environments is a bit different from the 
previous system. I've been working on a blog post about this, but haven't 
finished it yet :(

First off, what is being cached? When we talk about caching environments we 
are talking (mostly) about caching the parsed and validated form of the 
manifest files. This saves the cost of disk access (stat to find files, 
reads to list directory contents, reads to fetch manifest file contents) as 
well as a certain amount of CPU use (lexing, parsing, building an AST, 
validating the AST). This is what has been part of the cache for quite a 
while now.

What has changed is the cache eviction mechanism. The directory environments 
use a different eviction and caching system than the "legacy" environments. 
The legacy environments were tracked by singleton instances, one per 
environment, that the master would never get rid of. Those environments hold 
references to the AST objects as well as to WatchedFile objects, which track 
changes to the mtime of the manifest files. A WatchedFile instance stats the 
file it is supposed to watch, but limits the stat calls to happen no more 
often than the filetimeout setting specifies. Before Puppet 3.4 (? 3.5? I 
lose track of what version had what change) the WatchedFile instances would 
get interrogated throughout the compilation process. In fact, every time it 
asked whether one file had changed it ended up asking whether *any* files had 
changed. That had a lot of side effects, but I won't derail the conversation 
to go into them. In 3.4 (or was it 3.5) the legacy environment system was 
changed to only check for changed files at the beginning of a compile. Even 
so, in the worst case it would still issue a stat call for every manifest 
file, in the best case (depending on your viewpoint) it would issue no stat 
calls because the filetimeout had not expired, and usually it would be some 
in-between number of stats. The in-between number is possible because each 
WatchedFile instance has its own timer for the filetimeout, so the timers 
drift apart over time, which allowed changes to some files to be detected 
but not others.
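
For reference, the knob behind that legacy behavior is the filetimeout 
setting. A minimal puppet.conf sketch (the value shown is the shipped 
default of 15 seconds; section placement is up to you):

    [master]
        # Legacy environments: WatchedFile will not re-stat a watched
        # manifest more often than this interval.
        filetimeout = 15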

For the directory environments we chose a different system for managing the 
caches. The watchword here was KISS. Under the new system there is no file 
watching involved (right now, that is; there is a PR open to introduce a 
'manual' environment_timeout system). Instead, once an environment has 
loaded a file it simply holds onto the result. All of the caching now comes 
down to holding onto the environment instance itself, and cache eviction is 
just a question of when puppet should throw away that instance and re-create 
it. There are a few options here (sketched in puppet.conf form after the 
list):

  * environment_timeout = 0 : Good for a development setup where you are 
editing manifests and running an agent to see what happens. Nothing is 
cached, so the full lex/parse/validate overhead is incurred on every agent 
catalog request.
  * environment_timeout = <some number> : Good if you have "spiky" agent 
requests. For instance, say you don't run agents continually and instead 
only trigger them as needed with mco; you know that from the first agent 
checking in to the last it takes 20 minutes, and you do this kind of 
on-demand run once a day. Then just set the timeout to 30m (20 minutes plus 
some extra time to absorb variance). This way the cache lasts through the 
whole run but has expired by the next time you run.
  * environment_timeout = unlimited : Good when agents are checking in all 
of the time, so there is no "down time" during which the cache can 
reasonably go away. Anything less than unlimited here will cause periodic 
spikes in CPU usage as each passenger worker reparses everything. Even with 
unlimited you'll get some reparsing simply because the passenger workers are 
periodically killed, but that is out of my control. When unlimited is in 
use, the question becomes, "when *should* the manifests be reloaded?" Well, 
whenever you deploy new manifests; any more often than that is just a waste 
(in fact, that is really the answer for all of these cases). This is where 
graceful restarts come in: whenever a manifest change is deployed, trigger a 
graceful restart, which causes the environment caches to be lost (because 
the process dies) and recreated for the next request.
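
To make those options concrete, here is a rough puppet.conf sketch for a 
master using directory environments (the environmentpath value is only an 
example; pick the one environment_timeout line that matches your situation):

    [main]
        # Enable directory environments (example path).
        environmentpath = $confdir/environments

        # Pick one of the three cases described above:
        # environment_timeout = 0          # development: full reparse on every catalog request
        # environment_timeout = 30m        # "spiky" on-demand runs: covers one run, expired by the next
        environment_timeout = unlimited    # continuous check-ins: evict the cache via a graceful restart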

The reason we went with this system for cache eviction is that it puts a lot 
more control over resource utilization in your hands. When puppet was 
watching files it would end up issuing stat calls (which can be very slow) 
even when nothing was changing. Since manifests only change when something 
actively changes them (either a person editing them, or a deploy process 
laying down new versions), this moves the decision about when to incur the 
manifest reload cost back to the user.
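
In practice that decision point is usually the deploy step itself. As a 
purely illustrative sketch (the paths, branch name, and Rack application 
directory are all assumptions about your layout), a git post-receive hook on 
the master could deploy the manifests and then trigger the Passenger 
graceful restart mentioned earlier in the thread:

    #!/bin/sh
    # Hypothetical post-receive hook on a puppet master running under
    # Passenger. Adjust the paths and branch to your own setup.

    # Check out the newly pushed manifests into the deployed environment.
    GIT_WORK_TREE=/etc/puppet/environments/production git checkout -f master

    # Gracefully restart the Rack app; the old workers (and their cached
    # environments) go away and the next agent request triggers a reparse.
    touch /usr/share/puppet/rack/puppetmasterd/tmp/restart.txt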

There has also been a question once or twice about why we didn't just go 
with inotify or a similar system. Mostly it comes down to complexity and 
portability. This system works anywhere that puppet can run and is 
immensely simpler.

> Thank you in advance for any insight. 
>
>
I hope what I said above is of help.
 

> -- 
> Konrad Scherer, MTS, Linux Products Group, Wind River 
>
