On Apr 22, 2012, at 15:51, Ryan Schmidt wrote:

> I'm trying to protect against the gzip compression of the tar archive varying 
> from generation to generation. gzip compression uses entropy -- random 
> numbers. If you have two identical tar archives, and gzip compress them with 
> the same settings, the resulting gzip files will not be byte for byte 
> identical, and thus they'll have different checksums:

> It seems unlikely that github has an infinite amount of disk space to forever 
> retain any tarball of any revision of any repository that some user may only 
> have requested one time and nobody will ever request again. So I would assume 
> they keep this generated tarball around for a period of time, maybe 24-48 
> hours, and then delete it if it hasn't been requested again.

On Apr 22, 2012, at 15:55, Ryan Schmidt wrote:

> On Apr 22, 2012, at 12:17, Sean Farley wrote:
>> You can see that they both generate the same checksums. For the above
>> link, the sha1sum reports:
>> 
>> $ sha1sum ~/Downloads/AndreaCrotti-yasnippet-snippets-1441728.tar.gz
>> 61df0e33e73940f720d5506520068533f8b28869
>> /Users/sean/Downloads/AndreaCrotti-yasnippet-snippets-1441728.tar.gz

> now let's wait 24-48 hours and try again and see if we still get the same 
> checksum.

I remembered that I downloaded a .tar.gz of a revision of some project from 
github in October 2011. I tried downloading the same revision now, and to my 
surprise, found both the old and the new .tar.gz archives to have the same 
checksums. So either github is using exorbitant amounts of disk space to keep 
these old archives around, or they have installed a custom version of gzip 
whose random seed can be controlled, or they ensure by some other means that 
the gzip output of repeated runs is identical. That's good news, so I suppose we can 
indeed fix the github portgroup now to fetch distfiles even when git.branch is 
specified.
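
To make the "some other means" case concrete, here is a minimal sketch in 
Python. It assumes the run-to-run variation comes from the modification 
timestamp that the gzip format stores in its header (the MTIME field), and 
that pinning that field is enough to get repeatable output. The payload and 
the helper name sha1_hex are made up for illustration; what github or 
bitbucket actually do on their servers is not known from this thread.

```python
import gzip
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a hex string (hypothetical helper)."""
    return hashlib.sha1(data).hexdigest()


# Stand-in for the bytes of a tar archive; not a real repository snapshot.
payload = b"identical tar archive contents\n" * 100

# Same input, same settings, but different header timestamps, as would
# happen if the archive were regenerated at two different times.
a = gzip.compress(payload, mtime=1)
b = gzip.compress(payload, mtime=2)

# Same input with the header timestamp pinned to a fixed value.
c = gzip.compress(payload, mtime=0)
d = gzip.compress(payload, mtime=0)

print("timestamps differ, checksums equal:", sha1_hex(a) == sha1_hex(b))
print("timestamp pinned,  checksums equal:", sha1_hex(c) == sha1_hex(d))
```

The mtime argument to gzip.compress needs Python 3.8 or later. The 
command-line gzip has a comparable -n option that leaves the original name 
and timestamp out of the header.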

We have most definitely observed the effect I described, however, with 
bitbucket:

https://trac.macports.org/ticket/30241

https://trac.macports.org/ticket/32833

https://trac.macports.org/ticket/32791

Bitbucket archives also seem to sometimes change for reasons other than entropy:

https://trac.macports.org/ticket/27843

We could perhaps open a dialog with the people at bitbucket and see if this can 
be changed.


_______________________________________________
macports-dev mailing list
macports-dev@lists.macosforge.org
http://lists.macosforge.org/mailman/listinfo.cgi/macports-dev
