On Fri, Jan 25, 2013 at 1:02 AM, Erik Dalén <[email protected]> wrote:
>
> On Friday 25 January 2013 at 00:24, Matthaus Owens wrote:
>
>> Improved catalog storage performance
>>
>> Some improvements have been made to the way catalog hashes are
>> computed for deduplication, resulting in somewhat faster catalog
>> storage, and a significant decrease in the amount of time taken to
>> store the first catalog received after startup.
>
> Could you give a bit more explanation what this means?
> Is it only a change in how they are computed or are they also stored in a
> different way?
>

As background, we have a function resource-identity-hash* which computes the unique hash of a resource, and is called for every resource in the catalog. This is the core of our deduplication. Because this function involves stringifying and then hashing the entire content of the resource, it's somewhat expensive. So we take advantage of the fact that most resources are duplicates, and cache the results of that function in memory. This change is to the way we do that caching.

Previously we used a least-recently-used memoization function from the clojure/core.memoize library. That function does some extra work we don't really need, which makes cache misses extremely expensive. That means the first catalog or two, which contain only resources we haven't seen yet, could take around 15 seconds to be stored, because they were entirely cache misses. The problem got worse with multiple workers computing the same hashes, since they performed a lot of redundant work. With several workers, I saw this taking up to a minute or two.

Now we use our own simplified bounded-memoize function, which reduces this to about 1/4 of a second. The result is slightly faster catalog storage when we get a catalog with some new resources, and much faster storage for the first few catalogs.

The algorithm used to compute the hash and the way the hash is stored are the same. The change is only mentioned in the release notes because it's a performance improvement.
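
To make the caching idea above a bit more concrete, here is a rough sketch in Python. It is not the actual code (which is Clojure), just an illustration under some assumptions: the resource representation, the use of JSON plus SHA-1 for "stringify then hash", and the clear-everything-on-overflow policy are all hypothetical choices of mine. It also ignores coordination between multiple workers.

```python
import hashlib
import json


def resource_identity_hash(resource):
    """Stringify the whole resource, then hash the string.

    Assumption: JSON is the string form and SHA-1 is the digest;
    the real function may use something else entirely.
    """
    canonical = json.dumps(resource, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def bounded_memoize(fn, bound):
    """Memoize fn, keeping at most `bound` results in memory.

    Assumption: when the cache is full it is simply emptied, with no
    per-entry bookkeeping, so a cache miss costs little more than
    calling fn itself. Arguments must be hashable.
    """
    if bound < 1:
        raise ValueError("bound must be a positive integer")
    cache = {}

    def memoized(*args):
        try:
            return cache[args]          # hit: skip the expensive call
        except KeyError:
            pass
        value = fn(*args)               # miss: compute once
        if len(cache) >= bound:
            cache.clear()               # crude eviction: drop everything
        cache[args] = value
        return value

    memoized.cache = cache              # exposed only for inspection
    return memoized


# Hypothetical resource: (type, title, sorted parameter pairs).
motd = ("File", "/etc/motd", (("ensure", "present"), ("mode", "0644")))

cached_hash = bounded_memoize(resource_identity_hash, bound=1000)
first = cached_hash(motd)    # computed
second = cached_hash(motd)   # served from the cache
```

The trade-off in a scheme like this is that recently used entries can be thrown away along with everything else, in exchange for doing almost no bookkeeping on a miss.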

-- 
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/puppet-dev?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
