On Apr 22, 2012, at 10:22, Craig Treleaven wrote:
> At 3:05 AM -0500 4/22/12, Ryan Schmidt wrote:
>> 
>> On Apr 21, 2012, at 20:30, Sean Farley wrote:
>>> Which then will use the zip / tarball download by default
>> 
>> I didn't think github had automated downloads available except for tags.
>> 
>> If github has automated downloads available for any tag/branch as well, then 
>> we would need to verify that they always have the same checksums, and are 
>> not generated on the fly. I'm pretty sure that bitbucket, for example, 
>> generates them on the fly, meaning different users requesting them at 
>> different times will get different checksums, which means they're not 
>> suitable for use as master_sites in MacPorts.
> 
> 
> I'm no Git expert, but wouldn't git archive help us?  Git archive will 
> retrieve from a remote repository and can format the result as a zip file.  
> From the man page:
>      <tree-ish>
>           The tree or commit to produce an archive for.

Yes, a port developer could use such a command to create a tarball, which would 
then by some manual process be uploaded to the distfiles.macports.org server, 
and then the portfile would be manually modified to reference that distfile 
instead of fetching from git. This is a process we have recommended before, but 
since it involves a lot of manual labor most people don't do it. It's not the 
process that's been discussed thus far in this mailing list thread.
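
For illustration, here is a minimal sketch of that manual tarball step. It 
assumes a throwaway local repository named "demo" and a made-up prefix 
"demo-1.0/" (neither comes from any real port), and it pipes the tar stream 
through gzip -n so that the gzip header carries no name or timestamp. It only 
checks that two runs on the same machine agree; it says nothing about what a 
hosting site does on its end.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Throwaway repository standing in for a real upstream project (hypothetical).
git init -q demo
cd demo
echo "hello" > README
git add README
git -c user.name=Example -c user.email=example@example.org commit -q -m "initial"

# Pin the archive to one commit hash rather than a moving branch name.
commit=$(git rev-parse HEAD)

# Archive the same commit twice. gzip -n leaves the original name and
# timestamp out of the gzip header.
git archive --format=tar --prefix=demo-1.0/ "$commit" | gzip -n > "$work/a.tar.gz"
git archive --format=tar --prefix=demo-1.0/ "$commit" | gzip -n > "$work/b.tar.gz"

if cmp -s "$work/a.tar.gz" "$work/b.tar.gz"; then
    echo "identical"
else
    echo "different"
fi
```

The resulting file would still have to be uploaded and referenced from the 
portfile by hand, which is the labor-intensive part.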


> I don't know whether the checksums would always be the same.  As I understand 
> Git, a commit hash uniquely identifies a particular state of the repository 
> so if we specify a hash, we'll always get precisely the same result.  The 
> only exception would be, I guess, if someone has hacked the repository.  I 
> have no idea if that could be done without detection by the repository site.  
> I take it that is what we're trying to protect against?

No, I'm trying to protect against the gzip compression of the tar archive 
varying from generation to generation. By default gzip records the original 
file name and modification time in its header, and the compressed stream can 
also vary with the gzip version and compression level. So if you have two 
identical tar archives and gzip them separately, the resulting gzip files will 
usually not be byte for byte identical, and thus they'll have different 
checksums:

$ sha1sum *.tar
92bfe8b02b49b977a18c9f8e8d301a0ef159fe51  1.tar
92bfe8b02b49b977a18c9f8e8d301a0ef159fe51  2.tar
$ gzip 1.tar
$ gzip 2.tar
$ sha1sum *.tar.gz
39c6beda6851d98295f770a11b8ea122647ae4c8  1.tar.gz
7a95ea746e698d367ec155e4387972051e1a2e38  2.tar.gz
$ 
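
A small sketch of where that difference comes from, assuming a gzip that 
stores the input name and mtime in the header by default (GNU gzip does) and 
supports -n (--no-name). The file contents and the temporary directory are 
made up for the example; the inputs are not real tar archives, just two 
identical files with the same names as above.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Two byte-identical inputs under different names.
printf 'same payload\n' > 1.tar
cp 1.tar 2.tar

# Default: the header records each input's name and modification time.
gzip -c 1.tar > 1.default.gz
gzip -c 2.tar > 2.default.gz

# With -n, both header fields are left out.
gzip -n -c 1.tar > 1.noname.gz
gzip -n -c 2.tar > 2.noname.gz

cmp -s 1.default.gz 2.default.gz && echo "default: identical" || echo "default: different"
cmp -s 1.noname.gz 2.noname.gz && echo "-n: identical" || echo "-n: different"
```

With GNU gzip I would expect the default pair to differ and the -n pair to 
match. That does not help with a site that generates archives on the fly, 
since we don't control which options or gzip version it uses.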

It seems unlikely that github has an infinite amount of disk space to forever 
retain any tarball of any revision of any repository that some user may only 
have requested one time and nobody will ever request again. So I would assume 
they keep this generated tarball around for a period of time, maybe 24-48 
hours, and then delete it if it hasn't been requested again.

Tarballs of tags, on the other hand, I believe they do keep forever, since it's 
reasonable to expect tarballs of tags to be downloaded often, and it's 
desirable for them to have the same checksums, so they can be verified, as 
MacPorts does.



_______________________________________________
macports-dev mailing list
macports-dev@lists.macosforge.org
http://lists.macosforge.org/mailman/listinfo.cgi/macports-dev
