Hi,
As you may know a memory leak was found in 3.7 (PUP-3345) and it seems
like we found the cause of the problem. (YAY !!!)
In order to find the leak, I came up with some (rough) tools to help
detect leakage. Below are some tips if you want to use them. But first,
the cause.
Basically, the cause was a "faulty" cache implementation that made
several assumptions that were not correct. So, here are some tips on what
not to do.
Do not hold on to things in class variables (e.g. @@my_cache) unless the
cache only contains things that are in the loaded Ruby code and share
the same lifecycle as the class. Alternatively, you must have something
that evicts the cache content on some sort of transaction boundary. In
the case found, this did not happen, and for each environment, it added
a reference to a resource type instance (and since they get reloaded for
each environment, the cache kept on growing).
I would go so far as to say, almost never use the class level for
regular programming - create instances instead. That forces you to think
about the lifecycle - when is it created, when do the things it holds
on to get freed, etc.
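To make the difference concrete, here is a minimal sketch. All class and method names are made up for illustration; this is not the Puppet code that leaked, just the same shape of problem.

```ruby
# Hypothetical names throughout; not Puppet's actual code.

# Leaky: the class variable lives as long as the class itself, so every
# entry added on behalf of a (reloaded) environment stays reachable.
class LeakyTypeCache
  @@cache = {}

  def self.remember(env_name, type)
    (@@cache[env_name] ||= []) << type
  end

  def self.size
    @@cache.values.map(&:size).reduce(0, :+)
  end
end

# Safer: the cache is an ordinary instance. Whoever owns the transaction
# boundary creates it and drops (or clears) it when the work is done.
class TypeCache
  def initialize
    @cache = {}
  end

  def remember(env_name, type)
    (@cache[env_name] ||= []) << type
  end

  def size
    @cache.values.map(&:size).reduce(0, :+)
  end

  def clear
    @cache.clear
  end
end

# Simulate three "reloads" of the same environment.
3.times do
  LeakyTypeCache.remember('production', Object.new)
  per_run = TypeCache.new
  per_run.remember('production', Object.new)
end

puts LeakyTypeCache.size
```

The class-level cache still holds one object per simulated reload after the loop, while each per-run instance became garbage as soon as its iteration ended.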
When using an object as a hash key, that object typically must have a
hash method and an equality method (eql? in Ruby) that agree with each
other, or you will very likely end up with an ever growing set of
entries in the hash.
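In Ruby, Hash finds keys using the methods hash and eql?. A small sketch with two made-up key classes shows what happens with and without them:

```ruby
# Hypothetical key classes, for illustration only.

# Defines value-based #hash and #eql?, so two keys with the same name
# land on the same hash entry.
class TypeKey
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def hash
    name.hash
  end

  def eql?(other)
    other.is_a?(TypeKey) && name == other.name
  end
  alias_method :==, :eql?
end

# Same data, but it inherits the identity-based #hash and #eql? from
# Object, so every new instance is a new key.
class NaiveKey
  def initialize(name)
    @name = name
  end
end

good = {}
bad = {}
5.times do
  good[TypeKey.new('file')] = true
  bad[NaiveKey.new('file')] = true
end

puts "good: #{good.size}, bad: #{bad.size}"
```

The hash keyed by NaiveKey gains an entry on every iteration, which is exactly the ever growing behavior described above.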
If you are tempted to use the support for WeakRef in Ruby - then give up
immediately since it is horribly slow on Ruby 1.8, and does not work
correctly on Ruby 1.9 (seems to be based on Object Ids that can get
recycled). If it worked, however, a WeakRef would be ideal for cache
implementations since it only keeps the object bound as long as something
else is also referencing it. (Still plenty of opportunity to write a cache
that is incorrect though).
Before you implement a cache - measure if the cache is an actual speed
improvement! The overhead of a cache may eat the performance gain - or
it may even be worse!
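A quick way to check is to time the same work with and without the cache using Ruby's standard Benchmark module. This is only a sketch: compute is a made-up stand-in for whatever the cache is supposed to save, and for something this cheap the cached version may well come out slower.

```ruby
require 'benchmark'

# Hypothetical stand-in for the work a cache would save.
def compute(n)
  n * 2
end

# Times n calls with and without a Hash in front of compute and returns
# both wall-clock durations (in seconds) plus the final cache size.
def compare_cached(n, distinct_keys)
  cache = {}
  uncached = Benchmark.realtime { n.times { |i| compute(i % distinct_keys) } }
  cached = Benchmark.realtime do
    n.times { |i| cache[i % distinct_keys] ||= compute(i % distinct_keys) }
  end
  { uncached: uncached, cached: cached, entries: cache.size }
end

result = compare_cached(100_000, 100)
puts format('uncached %.4fs, cached %.4fs, %d entries',
            result[:uncached], result[:cached], result[:entries])
```

Run it with a realistic workload and key distribution before deciding that the cache is worth its lifecycle problems.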
Avoid binding lots of objects in the cache. Bind an identifier / name if
possible. You may think you are keeping track of a Banana, but attached
to that you may have a Gorilla, and it needs its jungle...
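As a small illustration (the Struct names here are invented, not Puppet classes), compare caching the object itself with caching only a name:

```ruby
# Hypothetical structures: a type points at its environment, and the
# environment points at everything it has loaded.
Environment  = Struct.new(:name, :loaded_types)
ResourceType = Struct.new(:name, :environment)

# Heavy: the cached value keeps the type, its environment, and everything
# the environment has loaded reachable for as long as the cache lives.
def remember_object(cache, type)
  cache[type.name] = type
end

# Light: only names are stored; the type is looked up again when needed.
def remember_name(cache, type)
  cache[type.name] = type.environment.name
end

env = Environment.new('production', Array.new(1000) { Object.new })
type = ResourceType.new('file', env)

heavy = {}
light = {}
remember_object(heavy, type)
remember_name(light, type)

puts heavy['file'].environment.loaded_types.size
puts light['file']
```

The heavy cache keeps the environment and its 1000 loaded objects alive through a single entry; the light one holds two short strings.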
The "Tools"
===========
A new "benchmark" was added to the code base called "catalog_memory" -
it is the same benchmark as "empty_catalog" (it contains a single "hello
world" notice in each catalog), but the "catalog_memory" is instrumented
to dump information about memory usage.
To run this, you must be using Ruby 2.1.0. Then (if running from source) do:
bundle exec rake benchmark:catalog_memory
This will print some stats about the first and last run (it does 10
runs). It then computes the set of objects in memory that were not bound
at the start, and it outputs two data files: "heap.json" with
information about all live objects in memory, and "diff.json" with
information about the diff between the start and end of the run.
It also outputs a list of source locations and methods being called
where the allocations of the "leaked" objects were made. This list is
typically not very helpful unless the leak is trivial.
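For those curious about the general idea, here is a rough sketch of how such a diff can be computed with the objspace extension (ObjectSpace.dump_all was added in Ruby 2.1). This is not the benchmark's code; the method name live_addresses is made up, and the sketch only compares addresses.

```ruby
require 'objspace'
require 'set'
require 'tempfile'

# Dumps the heap to a temporary file (one JSON document per line) and
# returns the set of "0x..." addresses of all objects in the dump.
# Lines without an "address" key (the ROOT entries) are skipped.
def live_addresses
  GC.start # drop garbage first so it does not show up as "live"
  Tempfile.create('heap') do |f|
    ObjectSpace.dump_all(output: f)
    f.flush
    File.foreach(f.path, mode: 'rb')
        .map { |line| line[/"address":\s*"(0x\h+)"/, 1] }
        .compact
        .to_set
  end
end

before = live_addresses
RETAINED = Array.new(3) { Object.new } # stands in for the leaked objects
after = live_addresses

puts "#{(after - before).size} addresses appeared between the two dumps"
```

If ObjectSpace.trace_object_allocations_start is called before the code under test runs, each dumped object also carries the file and line where it was allocated. Addresses can be reused after a garbage collection, so a diff like this is a starting point rather than proof.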
Once at this point, there is a rake task called "memwalk" that reads the
two files "heap.json" and "diff.json" and produces a graphviz .dot file
that can be rendered. The result is a graph of all objects in memory and
how they bind each other. (There is more to say about this...)
You run this task with:
bundle exec rake memwalk
Then you produce the graph with the command:
dot -Tsvg -omemwalk.svg memwalk.dot
You now have a "memwalk.svg" file that you can open in Chrome. Nice
features are that you can search the graph (like searching on any web
page), and you can zoom and pan.
The graph has a bubble per object, and it shows its address in hex.
Arrows point to referenced objects from objects that bind them.
The graph is pruned of all arrays, hashes and leaf data objects. For
arrays and hashes it skips over them, and instead shows the object that
ultimately holds on to the structure (without the intervening nested
structure). This keeps the graph readable (and at a size that is
possible to process and view).
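The "skip over arrays and hashes" idea can be sketched roughly like this. It is not the memwalk implementation; the input format (a Hash from address to a record with 'type' and 'references') and the method name are assumptions for the example, and leaf data pruning is left out.

```ruby
# Assumed input: { address => { 'type' => 'OBJECT' | 'ARRAY' | 'HASH' | ...,
#                               'references' => [address, ...] } }
CONTAINER_TYPES = %w[ARRAY HASH].freeze

# Returns [holder, held] pairs where containers have been skipped, so an
# object that holds an array of things is shown as holding the things.
def collapsed_edges(objects)
  edges = []
  objects.each do |addr, obj|
    next if CONTAINER_TYPES.include?(obj['type'])

    seen = {}
    stack = (obj['references'] || []).dup
    until stack.empty?
      ref = stack.pop
      next if seen[ref]

      seen[ref] = true
      target = objects[ref]
      next unless target # reference to something outside the dump

      if CONTAINER_TYPES.include?(target['type'])
        # Look through the container at what it holds.
        stack.concat(target['references'] || [])
      else
        edges << [addr, ref]
      end
    end
  end
  edges
end

sample = {
  '0x1' => { 'type' => 'OBJECT', 'references' => ['0x2'] },
  '0x2' => { 'type' => 'HASH',   'references' => ['0x3'] },
  '0x3' => { 'type' => 'ARRAY',  'references' => ['0x4'] },
  '0x4' => { 'type' => 'OBJECT', 'references' => [] }
}
p collapsed_edges(sample)
```

In the sample, the object at 0x1 ends up pointing directly at 0x4, with the hash and the array in between left out of the graph.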
The memwalk command prints out some information about what it rendered
(counts). If you see something like tens of thousands of objects then
the leak is massive and you may not be able to process it (nor be able
to read and navigate the huge graph).
To find a leak, browse the resulting graph, and find clusters that are
not supposed to be there. In the current case, there were 10
Puppet::Node::Environment objects and there was only supposed to be one.
Then copy the address of one of the objects that are not supposed to be
there in order to do a walk of only it and the objects that keep it
alive. Say 7f9afa20ba38.
Then run memwalk again, now for this object (you need to quote the
argument now):
bundle exec rake 'memwalk[7f9afa20ba38]'
This creates a file called memwalk-7f9afa20ba38.dot that you can now
render using the dot command.
View that and look at how it is bound. You may find that it is
indirectly bound, and you may need to repeat this with what now appears
to be a root holding on to a cluster of objects.
When you have gotten this far you know the class(es) involved. You may also want
to figure out where it was allocated, and you can do that by using grep
on heap.json - say:
grep 7f9afa20ba38 heap.json
which will print out the information about this allocation (among other
things it shows the file and line where it was allocated, a list of
objects it references, and the address (in hex) of the class object).
This allows you to manually grep / walk the heap to find more details.
(Or continue hacking on the memwalk rake script to make it do what you want.)
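If grep gets tedious, a few lines of Ruby can answer "who references this address?". This sketch assumes heap.json has one JSON document per line with "address" and "references" keys holding 0x-prefixed hex strings (the layout ObjectSpace.dump_all writes); the method name referrers is made up.

```ruby
require 'json'

# Returns the parsed records of all objects whose "references" list
# contains the given address. The address may be given with or without
# the "0x" prefix.
def referrers(heap_lines, address)
  target = address.start_with?('0x') ? address : "0x#{address}"
  heap_lines
    .map { |line| JSON.parse(line) }
    .select { |obj| (obj['references'] || []).include?(target) }
end

# Tiny inline stand-in for heap.json so the sketch runs on its own.
sample_heap = [
  '{"address":"0x10", "type":"OBJECT", "references":["0x20"], "file":"a.rb", "line":3}',
  '{"address":"0x20", "type":"OBJECT", "references":[], "file":"b.rb", "line":7}',
  '{"type":"ROOT", "root":"vm", "references":["0x10"]}'
]

referrers(sample_heap, '20').each do |obj|
  puts "#{obj['address']} (#{obj['file']}:#{obj['line']})"
end
```

Against a real dump you would pass File.foreach('heap.json') instead of the sample array.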
Hope the above is of help to someone having to track down a memory leak
in the future...
Regards
- henrik
--
Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/