Hi,
As you may know, a memory leak was found in 3.7 (PUP-3345), and it seems we have found the cause of the problem. (YAY !!!)

In order to find the leak, I came up with some (rough) tools to help detect leakage. Below are some tips if you want to use them. But first, the cause.

Basically, the cause was a "faulty" cache implementation that made several incorrect assumptions. So, here are some tips on what not to do.

Do not hold on to things in class variables (e.g. @@my_cache) unless the cache only contains things that are in the loaded ruby code and share the same lifecycle as the class. Alternatively, you must have something that evicts the cache content on some sort of transaction boundary. In the case found, this did not happen, and for each environment, it added a reference to a resource type instance (and since they get reloaded for each environment, the cache kept on growing).
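
A minimal sketch of the difference, assuming made-up class names (LeakyLookup and ScopedLookup are hypothetical and not the Puppet code that leaked). The class-level cache keeps one entry per reloaded environment for the life of the process, while the instance-level cache goes away with its owner:

```ruby
# Hypothetical names throughout; this only illustrates the pattern.
class LeakyLookup
  @@cache = {} # lives as long as the class, i.e. for the whole process

  def self.lookup(environment, type_name)
    # The key references the environment, so every reloaded environment
    # adds an entry that nothing ever evicts.
    @@cache[[environment, type_name]] ||= Object.new
  end

  def self.cache_size
    @@cache.size
  end
end

class ScopedLookup
  def initialize
    @cache = {} # freed together with this instance
  end

  def lookup(type_name)
    @cache[type_name] ||= Object.new
  end

  def cache_size
    @cache.size
  end
end

# Simulate ten environment reloads.
10.times do
  environment = Object.new # stands in for a freshly loaded environment
  LeakyLookup.lookup(environment, 'notify')

  scoped = ScopedLookup.new # one cache per environment / transaction
  scoped.lookup('notify')
  # 'scoped' and its cache become garbage when this iteration ends
end

puts "class-level cache entries: #{LeakyLookup.cache_size}"
```

The class-level cache ends up with one entry per simulated reload, and each of those entries keeps its environment object alive.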

I would go so far as to say: almost never use the Class level for regular programming - create instances instead. That forces you to think about the lifecycle - when is it created, when do the things it holds on to get freed, etc.

When using an object as a hash key, that object typically must have a hash method and an equality method (eql? in Ruby), or you will very likely end up with an ever growing set of entries in the hash.
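
A small illustration of the hash key point, using two hypothetical key classes (PlainKey and ValueKey are invented for this example). Without hash and eql?, every new instance is a new key even when it means the same thing:

```ruby
# Uses the default identity-based hash and eql? from Object.
class PlainKey
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Defines hash and eql? in terms of the value the key represents.
class ValueKey
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def hash
    name.hash
  end

  def eql?(other)
    other.is_a?(ValueKey) && name == other.name
  end
  alias_method :==, :eql?
end

plain = {}
value = {}
3.times do
  plain[PlainKey.new('notify')] = true # a new entry every time
  value[ValueKey.new('notify')] = true # overwrites the same entry
end

puts "PlainKey entries: #{plain.size}, ValueKey entries: #{value.size}"
```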

If you are tempted to use the support for WeakRef in Ruby - then give up immediately, since it is horribly slow on Ruby 1.8 and does not work correctly on Ruby 1.9 (it seems to be based on Object Ids that can get recycled). If it worked, however, a WeakRef would be ideal for cache implementations, since it only holds on to the object while something else is also referencing it. (Still plenty of opportunity to write a cache that is incorrect though.)

Before you implement a cache - measure if the cache is an actual speed improvement! The overhead of a cache may eat the performance gain - or it may even be worse!
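
One way to measure is Ruby's standard Benchmark module. The sketch below is only an illustration: compute is a hypothetical stand-in for the work the cache is supposed to save, and the numbers will differ per machine and Ruby version:

```ruby
require 'benchmark'

# Hypothetical stand-in for the work the cache is supposed to save.
def compute(name)
  name.to_s.upcase
end

# Times the given number of lookups with and without a cache and
# returns both timings in seconds.
def compare_lookups(names, iterations)
  uncached = Benchmark.realtime do
    iterations.times { |i| compute(names[i % names.size]) }
  end

  cache = {}
  cached = Benchmark.realtime do
    iterations.times do |i|
      name = names[i % names.size]
      cache[name] ||= compute(name)
    end
  end

  { uncached: uncached, cached: cached }
end

result = compare_lookups(%w[notify file package service], 100_000)
puts format('uncached: %.4fs  cached: %.4fs', result[:uncached], result[:cached])
```

When the cached work is as cheap as it is here, the hash lookup can cost about as much as the computation itself, which is exactly the case to watch out for.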

Avoid binding lots of objects in the cache. Bind an identifier / name if possible. You may think you are keeping track of a Banana, but attached to that you may have a Gorilla, and it needs its jungle...

The "Tools"
===========
A new "benchmark" was added to the code base called "catalog_memory" - it is the same benchmark as "empty_catalog" (it contains a single "hello world" notice in each catalog), but the "catalog_memory" is instrumented to dump information about memory usage.

To run this, you must be using Ruby 2.1.0. Then (if running from source) do:

bundle exec rake benchmark:catalog_memory

This will print some stats about the first and last run (it does 10 runs). It then computes the set of objects in memory that were not bound at the start, and it outputs two data files: "heap.json" with information about all live objects in memory, and "diff.json" with information about the diff between start and end of the run.

It also outputs a list of source locations and methods being called where the allocations of the "leaked" objects were made. This list is typically not very helpful unless the leak is trivial.
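
For those curious about the general technique, here is a rough sketch of a heap diff using the objspace library that ships with Ruby 2.1 and later. It is not the benchmark's actual code; the method names and the file names before.json and after.json are invented for the example. Note that an address can be reused after a garbage collection, so the diff is an approximation:

```ruby
require 'objspace'
require 'json'
require 'set'
require 'tmpdir'

# Writes one JSON document per live object to the given path.
def dump_heap(path)
  GC.start
  File.open(path, 'w') { |file| ObjectSpace.dump_all(output: file) }
  path
end

# Returns the address recorded on a dump line, or nil for root entries
# (which have no address) and for lines that are not valid JSON.
def address_of_line(line)
  JSON.parse(line)['address']
rescue JSON::ParserError
  nil
end

# Collects all addresses found in a dump file.
def addresses_in(path)
  addresses = Set.new
  File.foreach(path) do |line|
    address = address_of_line(line)
    addresses << address if address
  end
  addresses
end

# Addresses present in the second dump but not in the first.
def new_addresses(before_path, after_path)
  addresses_in(after_path) - addresses_in(before_path)
end

# Usage: record allocation sites, dump, run the suspect code, dump again.
ObjectSpace.trace_object_allocations_start
Dir.mktmpdir do |dir|
  before = dump_heap(File.join(dir, 'before.json'))
  retained = Array.new(1000) { Object.new } # stands in for the code under test
  after = dump_heap(File.join(dir, 'after.json'))
  appeared = new_addresses(before, after)
  puts "#{appeared.size} objects appeared (#{retained.size} retained on purpose)"
end
ObjectSpace.trace_object_allocations_stop
```

With allocation tracing switched on, each dumped entry also carries the file and line where the object was allocated.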

Once at this point, there is a rake task called "memwalk" that reads the two files "heap.json" and "diff.json" and produces a graphviz .dot file that can be rendered. The result is a graph of all objects in memory and how they bind each other. (There is more to say about this...)

You run this task with:

bundle exec rake memwalk

Then you produce the graph with the command:

dot -Tsvg -omemwalk.svg memwalk.dot

You now have a "memwalk.svg" file that you can open in Chrome. Nice features are that you can search the graph (like searching on any web page), and you can zoom and pan.

The graph has a bubble per object, and it shows its address in hex. Arrows point to referenced objects from objects that bind them.

All arrays, hashes and leaf data objects are pruned from the graph. For arrays and hashes it skips over them, and instead shows the Object that ultimately holds on to the structure (without the intervening nested structure). This makes the graph readable (and gives it a size that is possible to process and view).
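
The pruning idea can be sketched as follows. This is an illustration under assumptions, not the memwalk task itself: the method names are invented, and entries is assumed to be a Hash mapping each address to its parsed heap.json line (with the "type", "class", "name" and "references" fields that Ruby's heap dump writes):

```ruby
CONTAINER_TYPES = %w[ARRAY HASH].freeze

# Returns the addresses of plain objects reachable from the given address
# when looking straight through nested arrays and hashes. Strings and
# other leaf data are dropped.
def referenced_objects(entries, address, seen = {})
  targets = []
  (entries.fetch(address, {})['references'] || []).each do |ref|
    next if seen[ref]
    seen[ref] = true
    entry = entries[ref]
    next unless entry
    if CONTAINER_TYPES.include?(entry['type'])
      targets.concat(referenced_objects(entries, ref, seen))
    elsif entry['type'] == 'OBJECT'
      targets << ref
    end
  end
  targets
end

# Renders one bubble per plain object and one arrow per (pruned) reference.
def to_dot(entries)
  lines = ['digraph heap {']
  entries.each do |address, entry|
    next unless entry['type'] == 'OBJECT'
    class_entry = entries[entry['class']]
    name = (class_entry && class_entry['name']) || 'Object'
    lines << %(  "#{address}" [label="#{name}\\n#{address}"];)
    referenced_objects(entries, address).each do |target|
      lines << %(  "#{address}" -> "#{target}";)
    end
  end
  lines << '}'
  lines.join("\n")
end

# A tiny hand-made heap: an object holding another object via an array.
heap = {
  '0x1' => { 'type' => 'OBJECT', 'class' => '0xc', 'references' => ['0x2'] },
  '0x2' => { 'type' => 'ARRAY', 'references' => ['0x3', '0x4'] },
  '0x3' => { 'type' => 'OBJECT', 'class' => '0xc', 'references' => [] },
  '0x4' => { 'type' => 'STRING' },
  '0xc' => { 'type' => 'CLASS', 'name' => 'Banana' }
}
puts to_dot(heap)
```

In the hand-made heap, the array at 0x2 never shows up in the output; the arrow goes directly from 0x1 to 0x3.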

The memwalk command prints out some information about what it rendered (counts). If you see something like tens of thousands of objects then the leak is massive and you may not be able to process it (nor be able to read and navigate the huge graph).

To find a leak, browse the resulting graph, and find clusters that are not supposed to be there. In the current case, there were 10 Puppet::Node::Environment objects and there was only supposed to be one.

Then copy the address of one of the objects that are not supposed to be there, in order to do a walk of only it and the objects that keep it alive. Say 7f9afa20ba38.

Then run memwalk again, now for this object (you need to quote the argument now):

bundle exec rake 'memwalk[7f9afa20ba38]'

This creates a file called memwalk-7f9afa20ba38.dot that you can now render using the dot command.

View that and look at how it is bound. You may find that it is indirectly bound, and you may need to repeat this with what now appears to be a root holding on to a cluster of objects.
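
Conceptually, finding what keeps an object alive is a walk over the references in reverse. The sketch below shows that idea only; it is not claimed to be how the memwalk task does it. The method names are invented, and entries is assumed to be a Hash mapping each address to its parsed heap.json line:

```ruby
# Builds a reverse index: address => addresses of the objects that
# reference it.
def referrers_index(entries)
  index = Hash.new { |hash, key| hash[key] = [] }
  entries.each do |address, entry|
    (entry['references'] || []).each { |ref| index[ref] << address }
  end
  index
end

# Returns the given address plus every address that directly or
# indirectly holds on to it (breadth first).
def retainers_of(entries, address)
  index = referrers_index(entries)
  seen = { address => true }
  queue = [address]
  until queue.empty?
    current = queue.shift
    index.fetch(current, []).each do |holder|
      next if seen[holder]
      seen[holder] = true
      queue << holder
    end
  end
  seen.keys
end

# A tiny hand-made heap: cache -> env -> type, plus an unrelated object.
heap = {
  'cache' => { 'references' => ['env'] },
  'env' => { 'references' => ['type'] },
  'type' => { 'references' => [] },
  'other' => { 'references' => [] }
}
puts retainers_of(heap, 'type').inspect
```

The holders that nothing else in the result points to are the candidates for the "root" mentioned above.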

When you have got this far, you know the class(es) involved. You may also want to figure out where it was allocated, and you can do that by using grep on heap.json - say:

grep 7f9afa20ba38 heap.json

which will print out the information about this allocation (among other things it shows the file and line where it was allocated, a list of objects it references, and the address (in hex) of the class object).
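
If you would rather have the entry as a Ruby structure than as a raw line, something like the following sketch can be used. The helper names are invented; the keys ("type", "class", "file", "line", "references") are the ones Ruby's heap dump writes, and file and line are only present when allocation tracing was on:

```ruby
require 'json'

# Strips an optional 0x prefix and leading zeros so that addresses copied
# from the graph and addresses in the dump compare equal.
def normalize_address(address)
  address.to_s.downcase.sub(/\A0x/, '').sub(/\A0+/, '')
end

# Returns the parsed heap.json entry for the given address, or nil.
def find_entry(heap_path, address)
  wanted = normalize_address(address)
  File.foreach(heap_path) do |line|
    next unless line.include?(wanted) # cheap pre-filter, like grep
    entry = JSON.parse(line)
    return entry if normalize_address(entry['address']) == wanted
  end
  nil
end

# Only runs when a heap.json is present in the current directory.
if File.exist?('heap.json')
  entry = find_entry('heap.json', '7f9afa20ba38')
  if entry
    puts entry.values_at('type', 'class', 'file', 'line').inspect
    puts "references: #{(entry['references'] || []).size}"
  else
    puts 'not found'
  end
end
```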

This allows you to manually grep / walk the heap to find more details.
(Or continue hacking on the memwalk rake script to make it do what you want.)

Hope the above is of help to someone having to track down a memory leak in the future...

Regards
- henrik


--

Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/
