On Mar 17, 2010, at 12:27 PM, Brice Figureau wrote:
On 17/03/10 20:09, Brice Figureau wrote:
On 16/03/10 19:55, Brice Figureau wrote:
On 15/03/10 21:44, Luke Kanies wrote:
On Mar 15, 2010, at 2:06 AM, Brice Figureau wrote:
On Sun, 2010-03-14 at 15:49 -0700, Luke Kanies wrote:
On Mar 14, 2010, at 1:42 AM, Brice Figureau wrote:
[snipped]
Frankly I'd prefer we fix the issue in 0.25.x with this patch
(granted
someone else can test it with success).
I don't really have lot of time until June for puppet dev
(except those
occasional fixes), but I can at least try to start something. If
you
want to do it, then go ahead, that will certainly be much more
effective
than my procrastination :-)
Ok. I'll get this patch tested, and if I have some airplane time
or
something see what a full checksum refactor would look like. The
'source' refactor ended up being relatively easy in the end and
it had a
huge impact on maintainability.
Peter tested my patch today and found that it was working as
intended
(there is no more md5 checksumming of all the files).
But he also discovered something that we didnt't even thought
about: the
first run of:
file {
"/deep/hierarchy/full/of/files":
recurse => true, checksum => none, mode => 0600
}
produces many changes in the transaction (assuming the files are not
0600 of course). This in turn generates a large consumption of
memory
and 100% of cpu use for a looong time (ie more than 30 min for 7.5k
files said Peter, several minutes for 2.5k on my fast computer).
Why?
After a quick search, we found that most of the time was taken
around
Puppet::SimpleGraph#matching_edges or around line 218 of
Puppet::Transaction (ie I'm not sure it's the call itselft or
iteration
on the result).
I didn't do a deep analysis of the problem, but to me we are
generating
tons of events that we are propagating up to the root file (that
spawned
all those sub-file-resources). I don't think in this case that it is
necessary, unless there is a notify/subscribe (which is not the
case).
One of the problem I think (no evidence to back this up, it's only
my
understanding) is that we dup the current edge in a fast paced
loop. The
poor ruby GC has really no time to collect the previous edge
instance so
we pile up new edge instances at a really fast pace, spending most
of
it's time in the allocator (the ruby one or the libgc one, which is
costly operation).
I'll do more tests later this week on this code (I first have to
understand it of course :-)).
Anyway commenting line 218 in transaction.rb, and the aforementioned
manifest becomes really fast (if not instantaneous).
Any comments?
OK, I identified the issue (patch coming soon):
Puppet::SimpleGraph#matching_edge is simply completely inefficient in
front of lots of events for a given resource (which is our case).
The problem is that matching_edge returns the list of edges for
which we
should trigger an event. In the aforementioned manifest, since
there is
no subscribe/notify, the list is empty.
Unfortunately with the current code, it takes about 22s to find this
result for about 1100 events. With the (soon to be published
patch), I
was able to reduce this time to 4s.
Note: 22s doesn't seem that many, but I also tested with about 5000
events and got about 2min (new code), and stopped it after more than
15min (old code).
Still... 2minutes just for the graph operations? That's pretty gross.
I suppose this wouldn't be mitigated at all if we switched to not
creating individual file resources per file, since you'd still need
the same number of events.
It seems the previous code was at least quadratic (damn ruby
allocator,
not the code itself).
Peter, if you have some spare time, I suggest you try this new patch
on
top of the previous one (although they're not related).
Warning: I didn't really test the new code, and it isn't covered with
tests (it shouldn't modify any behavior).
I think that code is pretty well covered, so it should be a straight
refactor.
--
'Tis better to be silent and be thought a fool, than to speak and
remove all doubt. --Abraham Lincoln
---------------------------------------------------------------------
Luke Kanies -|- http://reductivelabs.com -|- +1(615)594-8199
--
You received this message because you are subscribed to the Google Groups "Puppet
Developers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/puppet-dev?hl=en.