Hello,

for my GSoC project I will do the following:

1. implement downloading one file through a mirror-list
2. implement downloading multiple files from multiple servers
3. fix Metalink support.

I'd like to get your opinions on the implementation of the first one, although I will soon send an RFC for the second one as well.

1. Single file through a mirror-list

a) Backend

A user would specify a number of threads N and a list of mirror servers. A flowchart would look like this:

1) Go through the mirrors and find the first available server (available - responds in < MAX_RETRIES retries).

2) Try to figure out the file size from the Content-Length header. If the size is unknown, fall back to a single-thread download. Would it be sensible to allow the user to specify the file size with some switch?

3) The main thread maintains a pool of available servers. It spawns at most min(N, M) threads, where M is the number of available mirrors. Every thread downloads its own chunk from its own mirror using the current implementation of concurrent download for Metalink.

If some mirror becomes unavailable during the download in the i-th thread, that thread terminates and notifies the main thread. The main thread spawns a new thread using one of the available mirrors; if none is available at the moment, it waits until some mirror becomes available (whenever some other thread finishes downloading its chunk).

It might occur that a mirror that was unavailable becomes available during the download. Such mirrors should be added to the pool of available mirrors. I was thinking about creating another thread that would occasionally "poke" unavailable servers and add them to the pool if they respond.

It might occur that when M < N, and therefore M threads were spawned, a fresh mirror is added to the pool (see previous paragraph). In this case it's probably best to divide the file into N pieces no matter what - but only M threads will be active at the beginning. The newly added server can then be used to spawn another thread.

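
To make step 3 more concrete, here is a minimal sketch of the bookkeeping in Python (illustration only, not wget's C code). The names split_into_chunks, initial_workers and schedule are hypothetical, and fetch stands in for the real per-chunk download. The loop is deliberately sequential: it only models how chunks and mirrors move between the pending and available pools, not real threads, and it does not model re-adding mirrors that recover later.

```python
from collections import deque


def split_into_chunks(file_size, n):
    """Split bytes 0..file_size-1 into at most n (start, end) ranges, end inclusive."""
    base, extra = divmod(file_size, n)
    ranges = []
    start = 0
    for i in range(n):
        # The first `extra` chunks take one additional byte each.
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # file has fewer bytes than requested chunks
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def initial_workers(n, m):
    """Number of threads to start: never more than N threads or M mirrors."""
    return min(n, m)


def schedule(chunks, mirrors, fetch):
    """Assign every chunk to a mirror; fetch(mirror, chunk) returns True on success.

    A failing mirror is dropped from the pool and its chunk is retried
    on the next available mirror. Returns a {chunk: mirror} mapping.
    """
    pending = deque(chunks)
    available = deque(mirrors)
    done = {}
    while pending:
        if not available:
            raise RuntimeError("no available mirrors left")
        mirror = available.popleft()
        chunk = pending.popleft()
        if fetch(mirror, chunk):
            done[chunk] = mirror
            available.append(mirror)   # mirror is free for the next chunk
        else:
            pending.appendleft(chunk)  # chunk goes back; mirror stays out of the pool
    return done
```

Note that the file is always split into N chunks, even when only M < N mirrors are up at the start, so a mirror that joins the pool later can simply take the next pending chunk.
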
4) A file would be downloaded to a single temporary file as described here:
http://lists.gnu.org/archive/html/bug-wget/2014-05/msg00025.html
I'm still fixing the patch: at least one memory corruption bug is still lurking in it and has yet to be found.

b) Front end

What would be a good way to specify a mirror list? Specifying a switch and listing all mirrors could be quite awkward. Should we introduce some sort of simple file format?

I believe we should also take project number 2 into consideration here: downloading multiple files from multiple servers. Do we want to apply different switches (options) to different files? What if we want to combine 1. and 2.: multiple files from multiple mirror lists? The simplest way would be to use a Metalink file for this purpose, but is it the most elegant?

All your suggestions are greatly appreciated.

Best Regards,
Jure Grabnar
