On Fri, Jan 25, 2013 at 1:02 AM, Erik Dalén <[email protected]> wrote:
>
>
> On Friday 25 January 2013 at 00:24, Matthaus Owens wrote:
>
>> Improved catalog storage performance
>>
>> Some improvements have been made to the way catalog hashes are
>> computed for deduplication, resulting in somewhat faster catalog
>> storage, and a significant decrease in the amount of time taken to
>> store the first catalog received after startup.
>
>
> Could you give a bit more explanation what this means?
> Is it only a change in how they are computed or are they also stored in a 
> different way?
>

As background, we have a function, resource-identity-hash*, which
computes the unique hash of a resource and is called for every
resource in the catalog. This is the core of our deduplication.
Because this function stringifies and then hashes the entire content
of the resource, it's somewhat expensive. So we take advantage of the
fact that most resources are duplicates of ones we've already seen,
and cache the results of that function in memory.
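
To make the idea concrete, here is a minimal Python sketch of hashing a
resource by its full content and using that hash for deduplication. It
is not the PuppetDB code (which is Clojure): the resource fields, the
JSON stringification, the choice of SHA-1, and the deduplicate helper
are all assumptions made for illustration.

```python
import hashlib
import json


def resource_identity_hash(resource):
    """Hash the entire content of a resource (hypothetical stand-in)."""
    # Stringify with sorted keys so that equal content always produces
    # the same string, regardless of key order.
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def deduplicate(resources):
    """Return a dict of hash -> resource, keeping one copy per hash."""
    stored = {}
    for resource in resources:
        # setdefault only stores the resource if its hash is new.
        stored.setdefault(resource_identity_hash(resource), resource)
    return stored
```

Every call stringifies and hashes the whole resource, which is why
caching the result for resources already seen pays off.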

This change is to the way we do that caching. Previously we used a
least-recently-used memoization function from the clojure/core.memoize
library. That function does some extra work we don't really need,
which makes cache misses extremely expensive. That means the first
catalog or two, which contain only resources we haven't seen before,
could take around 15 seconds to be stored, because they were entirely
cache misses. The problem got worse with multiple workers, which ended
up computing the same hashes and so performing a lot of redundant
work. With several workers, I saw this taking up to a minute or two.

Now we use our own simplified bounded-memoize function, which reduces
the time to store those first catalogs to about 1/4 of a second. The
result is slightly faster catalog storage when we get a catalog with
some new resources, and much faster storage for the first few
catalogs. Neither the algorithm used to compute the hash nor the way
the hash is stored has changed. The change is only mentioned in the
release notes because it's a performance improvement.
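
For illustration, a bounded memoize can be as simple as the Python
sketch below. The eviction policy shown here (discard the whole cache
once it reaches the bound) is an assumption about what a "simplified"
version could look like, not a description of the real bounded-memoize;
the names and the lock are likewise illustrative.

```python
import threading


def bounded_memoize(f, bound):
    """Memoize f, resetting the cache once it holds `bound` entries."""
    cache = {}
    lock = threading.Lock()

    def wrapper(*args):
        # Arguments must be hashable to serve as the cache key.
        with lock:
            if args in cache:
                return cache[args]
        # Compute outside the lock so a slow call doesn't block other
        # workers; two workers may still compute the same value.
        value = f(*args)
        with lock:
            if len(cache) >= bound:
                # No recency tracking: just start over when full.
                cache.clear()
            cache[args] = value
        return value

    return wrapper
```

The trade-off in a design like this is that hits and misses cost only
a dictionary lookup or insert, with no recency bookkeeping, at the
price of recomputing entries after each reset.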

> --
> Erik Dalén
>
>
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Puppet Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/puppet-dev?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
