Proxying downloads
Hello,

This is more of a recipe question. I'm working on a proxy that downloads a file on behalf of a client.

The case that causes no problems is this:

Alice (client), Bob (client), Sam (server)

1 Alice asks Sam for foobar.iso
2 Sam can't find foobar.iso in cachedir
3 Sam requests foobar.iso from the uplink
4 Sam saves each chunk received to cachedir/foobar.iso
5 At the same time, Sam forwards each chunk to Alice

But I can't figure out how to solve the following:

1 Alice asks Sam for foobar.iso
2 Sam can't find foobar.iso in cachedir
3 Sam requests foobar.iso from the uplink
4 Sam saves and forwards to Alice
5 At about 30 % of the download, Bob asks Sam for foobar.iso
6 How do I serve Bob now?

Because the internal link is _a lot_ faster than the uplink, Bob will probably reach the end of the (local) foobar.iso before Sam has received all of foobar.iso from the uplink. So Bob will end up with an incomplete file. How do I solve that? The data already downloaded should of course be served internally.

The solutions I can think of are:

* Some kind of subscriber list for the file in question
  * That is, serve internally, and if the state of foobar.iso is "in progress", switch to receiving chunks directly from Sam as they come down the link
  * How would I realize this switch from internal serving to pass-through of chunks?
* Send an acknowledgement (lie to the client that we have the file in the cache), wait until the download is finished, and then serve the file from the internal cache
  * This could lead to timeouts for very large files, at least I think so
* Forget about all of it and just pass through from the uplink, with a new request, as long as the file is in progress. In the worst case this would download the file n times, where n is the number of clients.
  * I guess that's the easiest one, but also the least desirable solution.

I hope I explained my problem understandably.
Any hints are welcome.

Thanks,
martin
--
http://noneisyours.marcher.name
http://feeds.feedburner.com/NoneIsYours
--
http://mail.python.org/mailman/listinfo/python-list
Re: Proxying downloads
Use a temp directory to store the file while it is downloading, then move it into the cache, so that adding the complete file is atomic. Check the name of the temp file to make sure you don't overwrite another process' download.

URLs that are currently being downloaded should be registered with the server process (a simple list or set would work). New requests should be checked against that registry; if there is a matching URL in it, the request must wait until that download is finished, and the resulting file should be delivered to both Alice and Bob.

You need to store both the local file path and the URL it was downloaded from, and check against that when a request is made; there might be two foobar.iso files on the Internet or the network, and they may be different (such as in differently versioned directories).
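The scheme above could be sketched roughly as follows. This is only a minimal illustration, not a finished proxy: the class name DownloadRegistry, the fetch() method, and the download(url, fileobj) callback are hypothetical names, and naming cache files after the SHA-256 of the URL is just one assumed way to keep two different foobar.iso files apart. The temp file is created in the cache directory itself so that the final rename stays on one filesystem.

```python
import hashlib
import os
import tempfile
import threading


class DownloadRegistry:
    """Tracks in-progress downloads by URL so each URL is fetched only once."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self._lock = threading.Lock()
        self._in_progress = {}  # url -> Event that is set when the download ends

    def cache_path(self, url):
        # Key on the full URL, not the basename, so that two different
        # foobar.iso files do not collide (hypothetical naming scheme).
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def fetch(self, url, download):
        """Return the local path for url.

        download(url, fileobj) is a caller-supplied function that writes
        the response body to fileobj.
        """
        final = self.cache_path(url)
        with self._lock:
            if os.path.exists(final):
                return final  # already cached
            done = self._in_progress.get(url)
            owner = done is None
            if owner:
                done = self._in_progress[url] = threading.Event()

        if not owner:
            # Someone else is downloading this URL: wait for them to finish.
            done.wait()
            if os.path.exists(final):
                return final
            raise OSError("download of %s failed" % url)

        try:
            # mkstemp picks a unique name, so no other download is overwritten.
            fd, tmp = tempfile.mkstemp(dir=self.cache_dir, suffix=".part")
            try:
                with os.fdopen(fd, "wb") as f:
                    download(url, f)
                os.replace(tmp, final)  # atomic on the same filesystem
            except BaseException:
                os.unlink(tmp)  # never leave a half file behind
                raise
        finally:
            with self._lock:
                del self._in_progress[url]
            done.set()
        return final
```

With this approach Bob only starts receiving data once Alice's download has completed, so the timeout concern for very large files still applies.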
Re: Proxying downloads
> But I can't figure out how I would solve the following:
>
> 1 Alice asks Sam for foobar.iso
> 2 Sam can't find foobar.iso in cachedir
> 3 Sam requests foobar.iso from uplink
> 4 Sam saves and forwards to Alice
> 5 At about 30 % of the download Bob asks Sam for foobar.iso
> 6 How do I serve Bob now?

Let every file in your download cache be represented by a Python object. Instead of streaming the file directly to the clients, you can stream the objects. The object will know whether the file it represents has finished downloading, where the file is located, and so on.

This way you can also, for the sake of persistence, keep partially downloaded files separate from the completely downloaded files, as per a previous suggestion, so that you won't start serving half files after a crash, and it'll be completely transparent in all code except for your proxy file objects.

Martin
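One possible shape for such a proxy file object is sketched below. It assumes a threaded server; the class name CachedDownload and its write(), finish(), and stream() methods are made up for illustration, and error handling for a failed uplink transfer is left out. A reader serves whatever is already on disk and then blocks until more bytes arrive, so no explicit switch from "internal serving" to "pass-through" is needed.

```python
import threading


class CachedDownload:
    """One cache entry; readers can follow the file while it is still growing."""

    def __init__(self, path):
        self.path = path
        self._cond = threading.Condition()
        self._written = 0       # bytes safely written to disk so far
        self._finished = False  # True once the uplink transfer is complete
        self._out = open(path, "wb")

    def write(self, chunk):
        """Called by the uplink handler for every chunk received."""
        self._out.write(chunk)
        self._out.flush()  # make the bytes visible to readers of the file
        with self._cond:
            self._written += len(chunk)
            self._cond.notify_all()

    def finish(self):
        """Called by the uplink handler when the download is complete."""
        self._out.close()
        with self._cond:
            self._finished = True
            self._cond.notify_all()

    def stream(self, chunk_size=64 * 1024):
        """Yield the file's contents, waiting for new data while in progress."""
        sent = 0
        # Unbuffered, so every read sees the current contents of the file.
        with open(self.path, "rb", buffering=0) as f:
            while True:
                with self._cond:
                    while sent == self._written and not self._finished:
                        self._cond.wait()
                    available = self._written - sent
                if available == 0:
                    return  # download finished and everything was sent
                data = f.read(min(chunk_size, available))
                sent += len(data)
                yield data
```

Sam would call write() for each chunk from the uplink and finish() at the end, while the handlers for Alice and Bob each iterate over stream() and forward what it yields. Bob catches up quickly on the first 30 % and then proceeds at uplink speed.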