On Fri, Feb 08, 2008 at 09:41:09AM +0100, Erik Cederstrand wrote:
> Brooks Davis wrote:
>> On Wed, Jan 30, 2008 at 07:20:23PM +0100, Erik Cederstrand wrote:
>>> 
>>> I'd like a situation where I can very quickly set up a slave with a 
>>> specific version of FreeBSD to run additional tests or provide shell 
>>> access to a developer. This currently involves adding an entry to a 
>>> queue, rebooting and waiting 2 minutes. Quick and easy, but the archiving 
>>> strategy is obviously very inefficient.
>>> 
>>> I'm thinking of a couple of options:
>>> 1. Having one full install per month and archiving the rest as diffs
>>>    against that by recursively bsdiff'ing every file in the tree (I
>>>    could bsdiff a whole tarball, but bsdiff is very memory-intensive).
>>>    Quick test: 25 mins.
>>> 2. Make a hash of all files and only store the binaries where the hash
>>>    is different from the monthly tarball. Faster than 1., but less
>>>    effective. Quick test: 5 mins.
>>> 3. Use some kind of VCS. My experience with Subversion and binary files
>>>    is that it's very slow.
>>> 4. Throw hardware at the problem.
>>> 
>>> I'd say it should not take more than 10 mins to recreate an archived 
>>> version. Any thoughts?
>> It seems like you should be able to combine 1 and 2 with checksums to
>> decide if you need to run diffs.  I'd think that would be quite fast.
> 
> I finally got around to testing this, and with a combination of mtree 
> comparing md5 hashes, bsdiff compacting changed files and hardlinking 
> unchanged files I get a reduction in size from 256MB to 10MB. Pretty good, 
> and the whole operation only takes a few minutes.

Cool!
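
For anyone following along, here is a rough sketch of that scheme in Python,
not Erik's actual script. It uses hashlib in place of mtree for the md5
comparison, and it simply copies changed files where the real setup would
run bsdiff against the baseline copy. The names md5_of and archive_snapshot
and the directory layout are made up for illustration.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_snapshot(base_dir, snap_dir, out_dir):
    """Store snap_dir in out_dir, hardlinking files identical to base_dir.

    Returns the sorted relative paths of files that are new or differ from
    the baseline; those are the candidates for bsdiff.
    Assumes out_dir is empty and on the same filesystem as base_dir.
    """
    changed = []
    for root, _dirs, files in os.walk(snap_dir):
        rel_root = os.path.relpath(root, snap_dir)
        os.makedirs(os.path.join(out_dir, rel_root), exist_ok=True)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            snap_file = os.path.join(snap_dir, rel)
            base_file = os.path.join(base_dir, rel)
            out_file = os.path.join(out_dir, rel)
            if os.path.isfile(base_file) and md5_of(base_file) == md5_of(snap_file):
                # Unchanged: share the baseline's data blocks via a hardlink.
                os.link(base_file, out_file)
            else:
                # Changed or new: keep a full copy (stand-in for a bsdiff patch).
                shutil.copy2(snap_file, out_file)
                changed.append(rel)
    return sorted(changed)
```

Calling archive_snapshot("monthly", "build", "archive") would fill archive/
and return the list of files worth diffing. Hardlinks only save space when
the archive and the baseline live on the same filesystem.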

> I have one peculiarity, though. I install python2.5 into the directory 
> containing the build, and even though the python version has not changed, I 
> still get mismatching md5 sums on every .pyo and .pyc file. Any thoughts on 
> this?

I'm not a Python guru by any means, but I think .pyc files probably record
data about the .py file they were generated from, since Python regenerates
them automatically.  It may be possible to not store them at all and just
generate them before you use them, or to add some build flags that make them
store fixed values instead.  I'm not sure where the .pyo files come from.
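
One way to check this is to compare the files with their header stripped.
The sketch below assumes the Python 2.x layout: a 4-byte magic number, then
the source file's modification time as a 4-byte little-endian integer, then
the marshalled code. Newer Python versions use a longer header, so the
constant would need adjusting. As far as I know, .pyo files (written when
Python 2 runs with -O) use the same layout. split_pyc and same_compiled_code
are made-up names.

```python
import struct

# Assumption: Python 2.x header (4-byte magic + 4-byte source mtime).
PYC_HEADER_SIZE = 8


def split_pyc(data):
    """Split raw .pyc bytes into (magic, source_mtime, payload)."""
    if len(data) < PYC_HEADER_SIZE:
        raise ValueError("too short to be a .pyc file")
    magic, mtime = struct.unpack("<4sI", data[:PYC_HEADER_SIZE])
    return magic, mtime, data[PYC_HEADER_SIZE:]


def same_compiled_code(data_a, data_b):
    """True if two .pyc blobs differ at most in the recorded source mtime."""
    magic_a, _mtime_a, payload_a = split_pyc(data_a)
    magic_b, _mtime_b, payload_b = split_pyc(data_b)
    return magic_a == magic_b and payload_a == payload_b
```

If same_compiled_code() still reports a difference, the payload itself has
changed. One possible cause is that the compiled code also records the path
of the source file, so installing into a differently named build directory
would alter it.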

-- Brooks
