On Tuesday, August 14, 2012 1:58:22 PM UTC-7, Bram Neijt wrote:
> Hi Jack,
>
> I once created a similar thing, but it required the "owner" of the
> file to host the MD5 he/she thinks it should be. It then generates a
> metalink based on all the md5/sha1/sha256 hashes in the database.
>
> The idea is that anybody can step up and start a mirror by hosting the
> files and the MD5SUMS and have the service spider the MD5SUMS file.
>
> You can find the service at: http://www.dynmirror.net/
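Neat approach. If I understand the MD5SUMS spidering right, each checksum file could be parsed with something like this (a rough sketch on my part, assuming the conventional md5sum(1) output format of one "<hex digest>  <filename>" per line -- the actual dynmirror parser may well differ):

```python
def parse_md5sums(text):
    """Parse md5sum(1) output: one '<32-hex-digest>  <filename>' per line."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, _, name = line.partition(" ")
        name = name.lstrip(" *")  # md5sum marks binary mode with a leading '*'
        if len(digest) == 32:     # ignore lines that aren't MD5 entries
            entries[name] = digest.lower()
    return entries
```

The service would then emit one metalink hash entry per filename it finds.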
Cool! The design of this site is impressive. I like how it shows
analytics, like recent downloads, on the front page.

> It might be a good idea to join up the databases or do some
> collaboration somewhere. Let's see what we can do. For instance, I
> could add a Mintiply URL collection or something like that? Or maybe I
> could have dynmirror register the hash/link combinations at Mintiply?

Great idea, thanks for suggesting it. The first thing that comes to
mind is: how would you like to get data out of Mintiply (and into
Dynmirror)? Is there an API that Mintiply could provide that would make
this as easy as possible? Let me know what you think.

> Currently, I think I'm the only user of dynmirror.net (at
> http://www.logfish.net/pr/ccbuild/downloads/ ).
>
> I'd also be happy to dig up and publish the code somewhere if I
> haven't already.
>
> Greets,
>
> Bram

Thanks very much for inviting me to collaborate.

> On Tue, Aug 14, 2012 at 8:30 AM, Jack Bates <[email protected]> wrote:
> > Hi, what do you think about a Google App Engine app that generates
> > Metalinks for URLs? Maybe something like this already exists?
> >
> > The first time you visit, e.g.
> > http://mintiply.appspot.com/http://apache.osuosl.org/trafficserver/trafficserver-3.2.0.tar.bz2
> > it downloads the content and computes a digest. App Engine has *lots*
> > of bandwidth, so this is snappy. Then it sends a response with
> > "Digest: SHA-256=..." and "Location: ..." headers, similar to
> > MirrorBrain.
> >
> > It also records the digest with Google's Datastore, so on subsequent
> > visits, it doesn't download or recompute the digest.
> >
> > Finally, it also checks the Datastore for other URLs with a matching
> > digest, and sends "Link: <...>; rel=duplicate" headers for each of
> > these. So if you visit, e.g.
> > http://mintiply.appspot.com/http://mirror.nexcess.net/apache/trafficserver/trafficserver-3.2.0.tar.bz2
> > it sends "Link:
> > <http://apache.osuosl.org/trafficserver/trafficserver-3.2.0.tar.bz2>;
> > rel=duplicate".
> >
> > The idea is that this could be useful for sites that don't yet
> > generate Metalinks, like SourceForge. You could always prefix a URL
> > that you pass to a Metalink client with "http://mintiply.appspot.com/"
> > to get a Metalink. Alternatively, if a Metalink client noticed that
> > it was downloading a large file without mirror or hash metadata, it
> > could try to get more mirrors from this app while it continued
> > downloading the file. As long as someone else had previously tried
> > the same URL, or App Engine can download the file faster than the
> > client, it should get more mirrors in time to help finish the
> > download. Popular downloads should have the most complete list of
> > mirrors, since those URLs should have been tried the most.
> >
> > Right now it only downloads a URL once and remembers the digest
> > forever, which assumes that the content at the URL never changes.
> > This is true for many downloads, but in future it could respect
> > cache control headers.
> >
> > Also, right now it only generates HTTP Metalinks with a whole-file
> > digest. But in future it could conceivably generate XML Metalinks
> > with partial digests.
> >
> > A major limitation with this proof of concept is that I ran into
> > some App Engine errors with downloads of any significant size, like
> > Ubuntu ISOs. The App Engine maximum response size is 32 MB. The app
> > overcomes this with byte ranges, downloading files in 32 MB
> > segments. This works on my local machine with the App Engine dev
> > server, but in production Google apparently kills the process after
> > downloading just a few segments, because it uses too much memory.
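(For the record, the segment loop is roughly this shape -- a simplified sketch rather than the actual mintiply.py code; `fetch_range` here is a hypothetical stand-in for App Engine's URL Fetch issued with a "Range: bytes=start-end" header:)

```python
import hashlib

SEGMENT = 32 * 1024 * 1024  # App Engine's 32 MB response-size limit

def whole_file_digest(fetch_range, size=SEGMENT):
    """Digest a remote file one segment at a time.

    fetch_range(start, end) is assumed to return the bytes of an
    inclusive byte range, or b"" when start is past the end of the file.
    """
    sha256 = hashlib.sha256()
    offset = 0
    while True:
        chunk = fetch_range(offset, offset + size - 1)
        n = len(chunk)
        if n == 0:
            break
        sha256.update(chunk)  # fold the segment into the digest...
        del chunk             # ...then discard it, so peak memory
                              # stays at roughly one segment
        offset += n
        if n < size:          # short read means end of file
            break
    return sha256.hexdigest()
```

Since each segment is released before the next fetch, memory use should stay flat, which is why the out-of-memory kills in production are so puzzling.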
> > This seems wrong, since the app throws away each segment after
> > adding it to the digest. So if it has enough memory to download one
> > segment, it shouldn't require any more memory for additional
> > segments. Maybe this could be worked around by manually calling the
> > Python garbage collector, or by shrinking the segment size...
> >
> > Also, I ran into a second bug with App Engine URL Fetch and
> > downloads of any significant size:
> > http://code.google.com/p/googleappengine/issues/detail?id=7732#c6
> >
> > Another thought is whether any web crawlers already maintain a
> > database of digests that an app like this could exploit?
> >
> > Here is the code:
> > https://github.com/jablko/mintiply/blob/master/mintiply.py
> >
> > What are your thoughts? Maybe something like this already exists, or
> > was already tried in the past...

--
You received this message because you are subscribed to the Google Groups "Metalink Discussion" group.
To view this discussion on the web visit https://groups.google.com/d/msg/metalink-discussion/-/f3tIWNdiy2kJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/metalink-discussion?hl=en.
