-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This was just a comparison script that I wrote.

Checking through the Ruby source, it definitely looks like the digest
methods just pull the entire "string" into memory.

If this is a file, then we're already taking the memory hit that you
would take by just comparing the two files.

This makes complete sense since Digest doesn't know what you're passing.
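
For the direct comparison case, here is a rough sketch of what I mean. It
assumes nothing about the Puppet code; same_content? is a made-up helper
name. FileUtils.compare_file from the standard library reads both files
block by block, so neither one has to sit in memory in full:

```ruby
require 'fileutils'

# Hypothetical helper (not Puppet code): true when both files hold
# exactly the same bytes.
def same_content?(path_a, path_b)
  # Cheap early exit: files of different sizes cannot match.
  return false unless File.size(path_a) == File.size(path_b)

  # compare_file walks both files in blocks rather than slurping them.
  FileUtils.compare_file(path_a, path_b)
end
```

This only works when both sides are local files, so it still would not
help with a 'source'.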

I will note that it looks like chunking a file and checksumming it piece
by piece might take twice as long as checksumming the whole thing in one
go.
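
To make the two approaches concrete, this is roughly what I am comparing.
It is a sketch, not my actual script and not the Puppet implementation;
the helper names and the 64 KiB default chunk size are my own assumptions:

```ruby
require 'digest/md5'

# Hypothetical helper: feed the file to MD5 one chunk at a time, so only
# chunk_size bytes are held in memory at once.
def chunked_md5(path, chunk_size = 64 * 1024)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |f|
    # IO#read with a length returns nil at end of file.
    while (chunk = f.read(chunk_size))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end

# Hypothetical helper: read the whole file into one string, then digest it.
def slurped_md5(path)
  Digest::MD5.hexdigest(File.binread(path))
end
```

Both return the same hex digest; the only difference is how much of the
file is in memory at a time.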

I'm thinking that a size+mtime check (similar to what rsync does) might
be enough for most files on a system. There will be relatively few files
that you'll want to do a full checksum on.
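
As a minimal sketch of that quick check (probably_unchanged? is a
hypothetical name, and I am assuming the previously recorded size and
mtime are stored somewhere and passed in):

```ruby
# Hypothetical helper: rsync-style quick check. The file is treated as
# unchanged when both its size and its modification time match the
# values recorded earlier. No file content is read.
def probably_unchanged?(path, known_size, known_mtime)
  stat = File.stat(path)
  stat.size == known_size && stat.mtime == known_mtime
end
```

A file that fails this check, or one that you explicitly care about,
could then fall back to a full checksum.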

Thanks,

Trevor

On 12/26/2010 12:40 AM, Luke Kanies wrote:
> On Dec 23, 2010, at 4:58, Trevor Vaughan <[email protected]> wrote:
> 
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Brice,
>>
>> Thanks for the feedback, this is good stuff!
>>
>>>
>>> That's more or less what rsync does. For sourced files we could even use
>>> HTTP If-Modified-Since and/or If-None-Match to perform the check (and
>>> thus the check would be done server side).
>>
>> Yes, I briefly looked at the Rsync algorithm papers to see if I could
>> figure out how to re-implement it in Ruby but just using the native
>> Rsync libraries might be a better call. However, that would introduce an
>> external dependency.
>>
>>>
>>>> 4) For ultimate speed, a direct comparison should be an option as a
>>>> checksum type. Directly comparing the content of the in-memory file
>>>> and the target file appears to be twice as fast as an MD5 checksum.
>>>> This would not be feasible for a 'source'.
>>>
>>> That might be faster, but please don't reintroduce the
>>> slurp-the-whole-file-into-memory syndrome.
>>
>> It seems that MD5 might be doing it anyway. When I tried a block-wise
>> 'comp', it was *much* slower and I think it was even slower than MD5 (or
>> close anyway) which means that MD5 is reading the whole blob into memory
>> to work on it anyway! If we're going to take the memory hit, let's just
>> take it and compare the two items.
> 
> Is this an md5 script you wrote, or are you using the Puppet code?
> We've worked to add 'stream' checksum types that checksum the file a
> bit at a time.
> 
> I expect that most of those are actually a good bit slower than just
> reading the whole thing in and checksumming, but the whole-file
> approach is only faster because it is less RAM-efficient.
> 
>>> That's really something I'd like to work on. Unfortunately this is
>>> really complex stuff. The file type is one of the biggest types, and
>>> even though I have already worked on it, I'm not sure I have grasped
>>> enough to fully refactor it for a different inner working.
>>
>> Completely agreed. I'll do what I can to help, but my outside time is
>> severely limited.
> 
> 

- -- 
Trevor Vaughan
 Vice President, Onyx Point, Inc.
 email: [email protected]
 phone: 410-541-ONYX (6699)
 pgp: 0x6C701E94

- -- This account not approved for unencrypted sensitive information --
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJNGx5LAAoJECNCGV1OLcypUrQH/RjbY56VfBunWk5rV1cgMCSO
VMzXjqY0HhyAvOtYpcesYDpvPHNsnSBx3684TCX1+VYfy8vh9lFy6CxEqB3ohwN5
gHjIBs2c6ZpT8UloywkwbMwAkFnqFXMfQ2/ELOfGvKsHwWq+Z9uVxW/vPxmswPJ0
U6qiDnmk762OfRyD0/sBNsYljnUXwDBidWC9up9WO+hEz9bSr+NLSxMc+5PsVjyl
kRtGtnBNqnE8Sw8VEjGKjrHkuoCR9pqAiGU2KM4h827zkog5oy0ghPolnEJXMD82
ErEXo+Y6C7xZc7U62+0eS96Zb0LZi9B412c5PpB08TEP18lJwCwSWWtY47dSJgQ=
=FPbW
-----END PGP SIGNATURE-----

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-dev?hl=en.
