On Sunday, August 19, 2012 2:15:46 PM UTC-7, Bram Neijt wrote:
> A single page export will not work, for sure, but as for that I was thinking about moving data out of dynmirror to mintiply.
>
> For example, if you don't want to download the complete file before you have a metalink, you could check at http://www.dynmirror.net/metalink/?url=http://example.com to see if dynmirror has any metalink information. You could use dynmirror as a kind of caching backend for downloads.
>
> Another thing I could do is have dynmirror redirect to mintiply if there is no hash information available; maybe that would be a good approach...
>
> I'm not really sure it would add anything, but technically it should be possible, and I think it might be good to get some code commits on dynmirror anyway ;)
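From the client side, the cache check described above could look something like this (a hypothetical sketch; the endpoint is the one quoted above, but the helper name and the percent-encoding of the target URL are my own assumptions):

```python
from urllib.parse import urlencode

def metalink_lookup_url(target, base="http://www.dynmirror.net/metalink/"):
    # Build the cache-check URL described above: ask dynmirror whether it
    # already has metalink information for `target` before downloading it.
    # The percent-encoding is an assumption; the thread only shows the
    # bare form ?url=http://example.com
    return base + "?" + urlencode({"url": target})
```

A client could fetch this URL first and fall back to a plain download if dynmirror has nothing cached.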
That sounds like a good idea. Please let me know if there's anything I can do to help with this.

Cheers

> Greets,
>
> Bram
>
> On Sun, Aug 19, 2012 at 9:58 AM, Jack Bates <[email protected]> wrote:
>> On Thursday, August 16, 2012 10:44:19 PM UTC-7, Jack Bates wrote:
>>> On Tuesday, August 14, 2012 1:58:22 PM UTC-7, Bram Neijt wrote:
>>>> Hi Jack,
>>>>
>>>> I once created a similar thing, but it required the "owner" of the file to host the MD5 he/she thinks it should be. It then generates a metalink based on all the md5/sha1/sha256 hashes in the database.
>>>>
>>>> The idea is that anybody can step up and start a mirror by hosting the files and the MD5SUMS and have the service spider the MD5SUMS file.
>>>>
>>>> You can find the service at: http://www.dynmirror.net/
>>>
>>> Cool! The design of this site is impressive. I like how it shows analytics, like recent downloads, on the front page.
>>>
>>>> It might be a good idea to join up the databases or do some collaboration somewhere. Let's see what we can do. For instance, I could add a mintiply url collection or something like that? Or maybe I could have dynmirror register the hash/link combinations at mintiply?
>>>
>>> Great idea, thanks for suggesting it. The first thing that comes to mind is, how would you like to get data out of Mintiply (and into Dynmirror)? Is there an API that Mintiply could provide that would make this as easy as possible?
>>
>> Hi Bram, and thanks again for inviting me to collaborate,
>>
>> As an experiment, I just added a page to export all of the data from Mintiply, in Metalink format. Let me know what you think. Could this be useful to a project like Dynmirror? Or would you prefer a different format, or different data?
>>
>> There isn't much data in the app yet, so dumping everything in one Metalink response works fine.
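The MD5SUMS spidering mentioned above comes down to parsing md5sum's output format. A minimal sketch (my own illustration, assuming the conventional layout, not dynmirror's actual code):

```python
def parse_md5sums(text):
    # Parse the conventional md5sum output: one "<hex digest>  <filename>"
    # entry per line; a leading "*" on the filename marks binary mode.
    # Returns a {filename: digest} mapping.
    entries = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        entries[name.lstrip("*")] = digest.lower()
    return entries
```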
>> If the amount of data ever gets large, we may need to rethink this.
>>
>> Here is the page: http://mintiply.appspot.com/export
>>
>>>> Let me know what you think. Currently, I think I'm the only user of dynmirror.net (at http://www.logfish.net/pr/ccbuild/downloads/).
>>>>
>>>> I'd also be happy to dig up and publish the code somewhere if I haven't already.
>>>>
>>>> Greets,
>>>>
>>>> Bram
>>>
>>> Thanks very much for inviting me to collaborate.
>>>
>>>> On Tue, Aug 14, 2012 at 8:30 AM, Jack Bates <[email protected]> wrote:
>>>>> Hi, what do you think about a Google App Engine app that generates Metalinks for URLs? Maybe something like this already exists?
>>>>>
>>>>> The first time you visit, e.g. http://mintiply.appspot.com/http://apache.osuosl.org/trafficserver/trafficserver-3.2.0.tar.bz2 it downloads the content and computes a digest. App Engine has *lots* of bandwidth, so this is snappy. Then it sends a response with "Digest: SHA-256=..." and "Location: ..." headers, similar to MirrorBrain.
>>>>>
>>>>> It also records the digest with Google's Datastore, so on subsequent visits it doesn't download or recompute the digest.
>>>>>
>>>>> Finally, it also checks the Datastore for other URLs with matching digests, and sends "Link: <...>; rel=duplicate" headers for each of these. So if you visit, e.g. http://mintiply.appspot.com/http://mirror.nexcess.net/apache/trafficserver/trafficserver-3.2.0.tar.bz2 it sends "Link: <http://apache.osuosl.org/trafficserver/trafficserver-3.2.0.tar.bz2>; rel=duplicate".
>>>>>
>>>>> The idea is that this could be useful for sites that don't yet generate Metalinks, like SourceForge.
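The header scheme described above can be sketched in a few lines (a hypothetical illustration, not mintiply's actual code; the base64 encoding of the Digest value is an assumption, following RFC 3230 as MirrorBrain uses it):

```python
import base64
import hashlib

def metalink_headers(content, duplicate_urls):
    # Build the response headers the message above describes: a
    # "Digest: SHA-256=..." header (base64 per RFC 3230 is assumed)
    # plus one "Link: <...>; rel=duplicate" header per known mirror.
    sha256 = base64.b64encode(hashlib.sha256(content).digest()).decode("ascii")
    headers = [("Digest", "SHA-256=" + sha256)]
    for url in duplicate_urls:
        headers.append(("Link", "<%s>; rel=duplicate" % url))
    return headers
```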
>>>>> You could always prefix a URL that you pass to a Metalink client with "http://mintiply.appspot.com/" to get a Metalink. Alternatively, if a Metalink client noticed that it was downloading a large file without mirror or hash metadata, it could try to get more mirrors from this app while it continued downloading the file. As long as someone else had previously tried the same URL, or App Engine can download the file faster than the client, it should get more mirrors in time to help finish the download. Popular downloads should have the most complete lists of mirrors, since those URLs should have been tried the most.
>>>>>
>>>>> Right now it only downloads a URL once, and remembers the digest forever, which assumes that the content at the URL never changes. This is true for many downloads, but in the future it could respect cache control headers.
>>>>>
>>>>> Also, right now it only generates HTTP Metalinks with a whole-file digest. But in the future it could conceivably generate XML Metalinks with partial digests.
>>>>>
>>>>> A major limitation with this proof of concept is that I ran into some App Engine errors with downloads of any significant size, like Ubuntu ISOs. The App Engine maximum response size is 32 MB. The app overcomes this with byte ranges, downloading files in 32 MB segments. This works on my local machine with the App Engine dev server, but in production Google apparently kills the process after downloading just a few segments, because it uses too much memory. This seems wrong, since the app throws away each segment after adding it to the digest.
>>>>> So if it has enough memory to download one segment, it shouldn't require any more memory for additional segments. Maybe this could be worked around by manually calling the Python garbage collector, or by shrinking the segment size...
>>>>>
>>>>> Also, I ran into a second bug with App Engine URL Fetch and downloads of any significant size: http://code.google.com/p/googleappengine/issues/detail?id=7732#c6
>>>>>
>>>>> Another thought is whether any web crawlers already maintain a database of digests that an app like this could exploit?
>>>>>
>>>>> Here is the code: https://github.com/jablko/mintiply/blob/master/mintiply.py
>>>>>
>>>>> What are your thoughts? Maybe something like this already exists, or was already tried in the past...
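The segmented-download approach in the last few paragraphs can be sketched as follows (a hypothetical illustration, not mintiply's actual code; `fetch_range` stands in for a URL Fetch call with a "Range: bytes=start-end" header):

```python
import hashlib

SEGMENT = 32 * 1024 * 1024  # App Engine's 32 MB response cap

def digest_in_segments(fetch_range, total_size, segment=SEGMENT):
    # Hash the file one segment at a time: each chunk is fed to the
    # digest and then dropped, so peak memory stays at one segment
    # regardless of the total file size.
    h = hashlib.sha256()
    offset = 0
    while offset < total_size:
        chunk = fetch_range(offset, min(offset + segment, total_size) - 1)
        h.update(chunk)
        del chunk  # release the segment before fetching the next one
        offset += segment
    return h.hexdigest()
```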
--
You received this message because you are subscribed to the Google Groups "Metalink Discussion" group.
To view this discussion on the web visit https://groups.google.com/d/msg/metalink-discussion/-/zkL9SJJaRssJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/metalink-discussion?hl=en.
