I have to admit that this email made me feel a little bit dumb. Could you provide a TL;DR summary that at least provides a little context for this? Is this something that people writing types, functions, hiera backends, or report processors need to concern themselves with?

On Mon, Oct 13, 2014 at 3:57 PM, Henrik Lindberg <[email protected]> wrote:

> Hi,
> As you may know a memory leak was found in 3.7 (PUP-3345) and it seems
> like we found the cause of the problem. (YAY !!!)
>
> In order to find the leak, I came up with some (rough) tools to help
> detect leakage. Below are some tips if you want to use them. But first,
> the cause.
>
> Basically, the cause was a "faulty" cache implementation that made several
> assumptions that were not correct. So, here are some tips on what not to do.
>
> Do not hold on to things in class variables (e.g. @@my_cache) unless the
> cache only contains things that are in the loaded ruby code and share the
> same lifecycle as the class. Alternatively you must have something that
> evicts the cache content on some sort of transaction boundary. In the case
> found, this did not happen, and for each environment, it added a reference
> to a resource type instance (and since they get reloaded for each
> environment, the cache kept on growing).
>
> I would go so far as to say, almost never use the Class level for regular
> programming - create instances instead. That forces you to think about the
> lifecycle - when is it created, when do the things it holds on to get
> freed, etc.
>
> When using an object as a hash key, that object typically must have a hash
> method and an equals method, or you will very likely end up with an ever
> growing set of entries in the hash.
>
> If you are tempted to use the support for WeakRef in Ruby - then give up
> immediately since it is horribly slow on Ruby 1.8, and does not work
> correctly on Ruby 1.9 (seems to be based on Object Ids that can get
> recycled). If they worked, however, a WeakRef would be ideal for cache
> implementations since it only binds the object if something else is also
> referencing it. (Still plenty of opportunity to write a cache
> that is incorrect though).
>
> Before you implement a cache - measure if the cache is an actual speed
> improvement!
> The overhead of a cache may eat the performance gain - or it
> may even be worse!
>
> Avoid binding lots of objects in the cache. Bind an identifier / name if
> possible. You may think you are keeping track of a Banana, but attached to
> that you may have a Gorilla, and it needs its jungle...
>
> The "Tools"
> ===========
> A new "benchmark" was added to the code base called "catalog_memory" - it
> is the same benchmark as "empty_catalog" (it contains a single "hello
> world" notice in each catalog), but the "catalog_memory" is instrumented to
> dump information about memory usage.
>
> To run this, you must be using Ruby 2.1.0. Then (if running from source)
> do:
>
>     bundle exec rake benchmark:catalog_memory
>
> This will print some stats about the first and last run (it does 10 runs).
> It then computes the set of objects in memory that were not bound at the
> start, and it outputs two data files: "heap.json" with information about
> all live objects in memory, and "diff.json" with information about the diff
> between start and end of the run.
>
> It also outputs a list of source locations and methods being called where
> the allocations of the "leaked" objects were made. This list is typically
> not very helpful unless the leak is trivial.
>
> Once at this point, there is a rake task called "memwalk" that reads the
> two files "heap.json" and "diff.json" and produces a graphviz .dot file
> that can be rendered. The result is a graph of all objects in memory and
> how they bind each other. (There is more to say about this...)
>
> You run this task with:
>
>     bundle exec rake memwalk
>
> Then you produce the graph with the command:
>
>     dot -Tsvg -omemwalk.svg memwalk.dot
>
> You now have a "memwalk.svg" file that you can open in Chrome. Nice
> features are that you can search the graph (like searching on any web
> page), and you can zoom and pan.
>
> The graph has a bubble per object, and it shows its address in hex.
> Arrows point to referenced objects from objects that bind them.
>
> The graph is pruned of all arrays, hashes and leaf data objects. For
> arrays and hashes it skips over them, and instead shows the Object that
> ultimately holds on to the structure (without the interleaving nested
> structure). This makes the graph readable (and gives it a size that is
> possible to process and view).
>
> The memwalk command prints out some information about what it rendered
> (counts). If you see something like tens of thousands of objects then the
> leak is massive and you may not be able to process it (nor be able to read
> and navigate the huge graph).
>
> To find a leak, browse the resulting graph, and find clusters that are not
> supposed to be there. In the current case, there were 10
> Puppet::Node::Environment objects and there was only supposed to be one.
>
> Then copy the address of one of the objects that are not supposed to be
> there in order to do a walk of only it and the objects that keep it alive.
> Say 7f9afa20ba38.
>
> Then run memwalk again, now for this object (you need to quote the
> argument now):
>
>     bundle exec rake 'memwalk[7f9afa20ba38]'
>
> This creates a file called memwalk-7f9afa20ba38.dot that you can now
> render using the dot command.
>
> View that and look at how it is bound. You may find that it is indirectly
> bound, and you may need to repeat this with what now appears to be a root
> holding on to a cluster of objects.
>
> When you have got this far you know the class(es) involved. You may also
> want to figure out where it was allocated, and you can do that by using
> grep in the heap.json - say:
>
>     grep 7f9afa20ba38 heap.json
>
> which will print out the information about this allocation (among other
> things it shows the file and line where it was allocated, a list of objects
> it references, and the address (in hex) of the class object).
>
> This allows you to manually grep / walk the heap to find more details.
> (Or continue hacking on the memwalk rake script to make it do what you
> want.)
>
> Hope the above is of help to someone having to track down a memory leak in
> the future...
>
> Regards
> - henrik
>
> --
>
> Visit my Blog "Puppet on the Edge"
> http://puppet-on-the-edge.blogspot.se/
>
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/puppet-dev/m1hld0%24mfc%241%40ger.gmane.org.
> For more options, visit https://groups.google.com/d/optout.

--
Ben Ford | Training Solutions Engineer
Puppet Labs, Inc.
926 NW 13th Ave, Suite #210
Portland, OR 97209
509.592.7291
[email protected]
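
For context on the cache advice in the quoted message, here is a minimal Ruby sketch of the pattern it warns about. It is not the actual Puppet code: every class name below (ResourceType, KeyedResourceType, LeakyCache, ScopedCache) is hypothetical and made up for illustration. It shows a class-level cache keyed by objects that lack hash/eql? growing on every simulated reload, next to an instance-level cache whose keys compare by name.

```ruby
# Hypothetical stand-in for a type that is re-created on each environment reload.
class ResourceType
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Same data, but usable as a hash key: equal names produce equal keys.
class KeyedResourceType < ResourceType
  def hash
    name.hash
  end

  def eql?(other)
    other.is_a?(KeyedResourceType) && name == other.name
  end
  alias_method :==, :eql?
end

# Leaky: class-level cache, keyed by object identity, never evicted.
class LeakyCache
  @@cache = {}

  def self.remember(type)
    @@cache[type] ||= "data for #{type.name}"
  end

  def self.size
    @@cache.size
  end
end

# Instance-level cache: it is freed together with whatever owns it.
class ScopedCache
  def initialize
    @cache = {}
  end

  def remember(type)
    @cache[type] ||= "data for #{type.name}"
  end

  def size
    @cache.size
  end
end

# Simulate ten reloads, each creating a fresh object for the same type name.
10.times { LeakyCache.remember(ResourceType.new("file")) }

scoped = ScopedCache.new
10.times { scoped.remember(KeyedResourceType.new("file")) }

puts LeakyCache.size # one entry per reload: 10
puts scoped.size     # a single entry: 1
```

Keying on the name string itself instead of the object would also avoid holding on to the object graph behind each key, which is the Banana and Gorilla point in the quoted message.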
