On Dec 17, 2010, at 7:44 AM, Trevor Vaughan wrote:

> I've been looking at the usage of MD5 checksums by Puppet and I think
> that there may be room for quite a bit of optimization.
> 
> The clients seem to compute the MD5 checksum of all files and
> in-catalog content every time they compare two files. What if:
> 
> 1) The size of any known content is used as a first level comparison.
> Obviously, if the sizes differ, the files differ. I don't see this in
> 0.24.X, but I haven't checked 2.6.X.
> 
> 2) The *server* pre-computes checksums for all content items in File
> resources and passes those in the catalog, then only one MD5 sum needs
> to be calculated.
> 
> 3) When using the puppet server in a 'source' element, the server
> passes the checksum of the file on the server. If they differ, then
> the file is passed across to the client.
> 
> 4) For ultimate speed, a direct comparison should be an option as a
> checksum type. Directly comparing the content of the in-memory file
> and the target file appears to be twice as fast as an MD5 checksum.
> This would not be feasible for a 'source'.
> 
> These techniques will place more burden on the server, but may cut the
> CPU resources needed on the client by as much as half from some
> preliminary testing.
> 
>            user     system      total        real
> MD5:   0.810000   0.230000   1.040000 (  1.050886)
> MD52:  0.400000   0.120000   0.520000 (  0.525936)
> Hash:  0.550000   0.270000   0.820000 (  0.821033)
> Comp:  0.290000   0.120000   0.410000 (  0.407351)
> 
> MD5 -> MD5 comparison of two 100M files
> MD52 -> MD5 comparison where one file has been pre-computed
> Hash -> Using String.hash to do the comparison
> Comp -> Direct comparison of the files
> 
> If anyone can provide a quick and dirty hack to get these into Puppet,
> I'll be happy to test them.
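
For reference, points 1, 2 and 4 above could be sketched roughly as
follows. This is only an illustration, not Puppet code: the names
in_sync?, same_bytes?, expected_md5 and CHUNK are made up for the
example, and it assumes the catalog content is already in memory.

```ruby
require 'digest'
require 'digest/md5'

CHUNK = 64 * 1024 # read size in bytes; an arbitrary choice for this sketch

# Point 4: compare in-memory content with a file on disk, byte for byte,
# without computing any checksum. Expects `data` to be a binary string.
def same_bytes?(data, path)
  File.open(path, 'rb') do |f|
    offset = 0
    while (chunk = f.read(CHUNK))
      # Bail out at the first chunk that differs.
      return false unless data.byteslice(offset, chunk.bytesize) == chunk
      offset += chunk.bytesize
    end
    offset == data.bytesize
  end
end

# Hypothetical helper combining the proposals:
#   point 1: compare sizes first, since a mismatch is cheap to detect
#   point 2: if a precomputed MD5 came with the catalog (expected_md5),
#            hash only the local file
#   point 4: otherwise fall back to the direct comparison
def in_sync?(path, content, expected_md5 = nil)
  return false unless File.file?(path)

  data = content.b # binary copy, so encodings do not affect equality
  return false unless File.size(path) == data.bytesize
  return Digest::MD5.file(path).hexdigest == expected_md5 if expected_md5

  same_bytes?(data, path)
end
```

With a precomputed sum, in_sync?(path, content, sum) performs a single
MD5 pass over the local file; without one, no checksum is computed at
all. Point 3 (a 'source' on the server) is not covered here, since the
client has no content to compare against in that case.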

This seems like a good idea to me.  Interesting that the direct comparison is 
so much faster.  How would you log that, if they're different?

Could you open a ticket on this?

I can't promise that we'll spend dev time on it right now, but it'd be great to 
capture it to start.

-- 
Always read stuff that will make you look good if you die in the
middle of it.       -- P. J. O'Rourke
---------------------------------------------------------------------
Luke Kanies  -|-   http://puppetlabs.com   -|-   +1(615)594-8199



