Meanwhile, I wrote a simple proof of concept (parallel dummy downloads using threads, dummy downloading of chunks, etc.). I am now at the point where I want to implement Metalink over HTTP headers (RFC 6249). I just can't find any servers to test with... maybe you can help me out?
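Until a test server turns up, canned header strings are enough to exercise a parser, since RFC 6249 carries the mirror list in ordinary Link headers (rel=duplicate, optionally with a pri parameter). Below is a minimal Python sketch of that idea, not wget code. The function names are made up for illustration, the example.com URLs are placeholders, and the parser is deliberately simplified: it assumes no commas or semicolons inside quoted parameter values. The sort treats a lower pri value as higher priority, which is how the RFC appears to define it.

```python
import re

# One link-value: <URI> followed by zero or more ";param" or ";param=value".
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def parse_link_header(value):
    """Parse one Link header value into a list of (uri, params) tuples.

    Simplified: does not handle ',' or ';' inside quoted parameter values.
    """
    links = []
    for uri, raw_params in _LINK_RE.findall(value):
        params = {}
        for part in raw_params.split(';'):
            part = part.strip()
            if not part:
                continue
            name, _, val = part.partition('=')
            params[name.strip().lower()] = val.strip().strip('"')
        links.append((uri, params))
    return links


def mirrors_by_priority(link_values):
    """Return all rel=duplicate URIs, best priority (lowest pri) first."""
    mirrors = []
    for value in link_values:
        for uri, params in parse_link_header(value):
            # rel may hold several space-separated relation types.
            if 'duplicate' in params.get('rel', '').split():
                # Assumption: mirrors without pri sort last.
                pri = int(params.get('pri', 999999))
                mirrors.append((pri, uri))
    return [uri for _, uri in sorted(mirrors)]


# Canned headers standing in for a real server response.
headers = [
    '<http://www2.example.com/example.ext>; rel=duplicate; pri=2, '
    '<ftp://ftp.example.com/example.ext>; rel=duplicate; pri=1',
    '<http://example.com/example.ext.meta4>; rel=describedby; '
    'type="application/metalink4+xml"',
]
print(mirrors_by_priority(headers))
```

The same canned-header approach would also work for the Digest header, which is left out here to keep the sketch short.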
Well, since there was no response to my previous post: is there any interest in getting that done anyway?

Tim

On Tuesday 03 April 2012, Tim Ruehsen wrote:
> Hi Giuseppe, hi Micah,
>
> while I couldn't sleep last night, I thought about wget and concurrency...
>
> I had the idea of using a top-down approach to outline what wget is doing,
> just to have an overview without struggling with the details of
> implementation. As a side effect one would have a (textual? graphical?)
> starting point for contributors to rush into the project. A chance to have
> a clear and well-documented design.
>
> Since maintenance of a flowchart is time-consuming and requires some extra
> skills and tools, plain text in the form of a "programming language" seems
> to fit.
>
> Here is just a beginning, let's say a basis for discussion.
> If you don't mind, I would like to take part in ongoing development.
>
> Basic wget functionality (download given URI/IRI):
>
> main (URI) {
>     put <URI> into <queue>
>
>     while <queue> is not empty {
>         download_and_analyse(next <queue> entry)
>     }
> }
>
> download_and_analyse (URI) {
>     download URI to FILE
>     add URI to <downloaded>
>     remove URI from <queue>
>     scan FILE and add URIs to <queue> if not already in <downloaded>
> }
>
>
> Extended for simple multitasking (threaded, multi-process or even
> distributed).
> This is just one possible design for concurrent downloads.
> Maybe you have a more elegant idea.
>
> main (URI) {
>     create <N> downloaders
>     put <URI> into <queue>
>
>     wait for status message from downloader {
>         print status
>         if <queue> is empty {
>             stop downloaders
>             we are done
>         }
>     }
> }
>
> downloader {
>     wait for and allocate entry in <queue> {
>         download_and_analyse(entry)
>     }
> }
>
> download_and_analyse (URI) {
>     download URI to FILE
>     add URI to <downloaded>
>     remove URI from <queue>
>     scan FILE and add URIs to <queue> if not already in <downloaded>
> }
>
>
> Extended to download a URI from several sources in parallel.
> main and downloader stay the same, just download_and_analyse() is extended.
>
> download_and_analyse (URI) {
>     /* download URI to FILE */
>     put <X> chunk entries into <chunk_queue>
>     create <X> chunk_loaders
>     wait for status message from chunk_loader {
>         send modified status message to main
>         if <chunk_queue> is empty {
>             stop chunk_loaders
>             end loop
>         }
>     }
>
>     add URI to <downloaded>
>     remove URI from <queue>
>     scan FILE and add URIs to <queue> if not already in <downloaded>
> }
>
> chunk_loader {
>     wait for and allocate entry in <chunk_queue> {
>         download(entry)
>         remove entry from <chunk_queue>
>     }
> }
>
> After some iterations we should come to a point where we can make further
> decisions:
> - how to implement concurrency (threads, processes, distributed processes,
>   (cloud))
> - how to implement communication between tasks
> - is a wget rewrite reasonable?
> - which existing code to recycle?
> - creating libraries from existing code (e.g. libwget) or using external
>   libraries (e.g. for network stuff, parsing and creating URI/IRIs, etc.)
> - create a list of test code, especially for the library code
> - ... etc etc ...
>
>
> Tim
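
The second design in the quoted outline (N downloaders sharing one queue) could be sketched roughly as follows. This is a minimal Python sketch using threads, one of the options listed above, and not a proposal for the actual wget code. The names crawl, fetch and extract_links are hypothetical; fetch and extract_links stand in for the real HTTP and parsing code and are passed in by the caller. Instead of status messages, the sketch relies on the queue's own bookkeeping to detect that all work is done, which is a swapped-in simplification.

```python
import queue
import threading


def crawl(start_uri, fetch, extract_links, num_downloaders=4):
    """Fetch start_uri and everything reachable from it, concurrently.

    fetch(uri) returns the content, extract_links(content) returns an
    iterable of URIs. Returns the URIs in completion order.
    """
    todo = queue.Queue()
    seen = {start_uri}      # queued or downloaded; guarded by lock
    downloaded = []         # guarded by lock
    lock = threading.Lock()

    def downloader():
        while True:
            uri = todo.get()
            if uri is None:             # sentinel: stop this downloader
                todo.task_done()
                return
            try:
                content = fetch(uri)
                for link in extract_links(content):
                    with lock:
                        if link in seen:
                            continue
                        seen.add(link)
                    # Queued before task_done(), so join() cannot
                    # return while new work is still appearing.
                    todo.put(link)
                with lock:
                    downloaded.append(uri)
            except Exception:
                pass    # a real client would retry or report the failure
            finally:
                todo.task_done()

    workers = [threading.Thread(target=downloader)
               for _ in range(num_downloaders)]
    for w in workers:
        w.start()
    todo.put(start_uri)
    todo.join()             # blocks until every queued URI is processed
    for _ in workers:
        todo.put(None)      # stop downloaders
    for w in workers:
        w.join()
    return downloaded


# Fake "site": each URI maps directly to the list of URIs it links to.
site = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
result = crawl("a", fetch=lambda uri: site[uri],
               extract_links=lambda links: links)
print(sorted(result))       # ['a', 'b', 'c', 'd']
```

The <downloaded> check from the outline becomes the seen set, which also covers URIs that are queued but not yet fetched, so no URI is downloaded twice. A chunk_loader pool for a single URI could follow the same pattern with a second queue of byte ranges.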