Proxying downloads

2007-10-30 Thread Martin Marcher
Hello,

This is more of a recipe question. I'm working on a proxy that will
download a file for a client. The case that causes no problems is:

Alice (Client)
Bob (Client)
Sam (Server)

1 Alice asks Sam for foobar.iso
2 Sam can't find foobar.iso in cachedir
3 Sam requests foobar.iso from the uplink
4 Sam now saves each chunk received to cachedir/foobar.iso
5 At the same time Sam forwards each chunk to Alice.
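
Steps 4 and 5 of this simple case can be sketched as a generator that writes each chunk to the cache before handing it on. This is only a minimal illustration: `tee_chunks` is a hypothetical name, and the in-memory `io.BytesIO` objects stand in for the uplink response and the file in cachedir.

```python
import io


def tee_chunks(source, cache_file, chunk_size=64 * 1024):
    """Yield chunks read from `source`, writing each one to `cache_file` first."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # the uplink has no more data
        cache_file.write(chunk)  # step 4: save the chunk
        yield chunk              # step 5: forward the chunk to the client


# In-memory stand-ins for the uplink response and cachedir/foobar.iso.
uplink = io.BytesIO(b"x" * 200_000)
cache = io.BytesIO()
sent_to_alice = b"".join(tee_chunks(uplink, cache))
```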

But I can't figure out how I would solve the following:

1 Alice asks Sam for foobar.iso
2 Sam can't find foobar.iso in cachedir
3 Sam requests foobar.iso from uplink
4 Sam saves and forwards to Alice
5 At about 30 % of the download Bob asks Sam for foobar.iso
6 How do I serve Bob now?

Because the internal link is _a lot_ faster than the uplink, Bob
will probably reach the end of the local foobar.iso before Sam has
received all of foobar.iso from the uplink. So Bob will end up with an
incomplete file...

How do I solve that? The data that has already been downloaded should
of course be served internally.

The solutions I can think of are:
 * Some kind of subscriber list for the file in question
  * That is, serve from the local cache and, if foobar.iso is still in
    progress, switch to passing each chunk through as Sam receives it
    from the uplink
  * How would I implement this switch from serving internally to
    passing chunks through?

 * Send an acknowledgement (lie to the client that we have this file in
   the cache), wait until the download has finished, and then serve the
   file from the internal cache
  * This could lead to timeouts for very large files, at least I think so

 * Forget about all of it and just pass through from the uplink, with a
   new request, as long as the file is in progress. In the worst case
   this would download the file n times, where n is the number of clients.
  * I guess that's the easiest one but also the least desirable solution.

I hope I explained my problem understandably.

any hints are welcome
thanks
martin

-- 
http://noneisyours.marcher.name
http://feeds.feedburner.com/NoneIsYours
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Proxying downloads

2007-10-30 Thread Jeff
You use a temp directory to store the file while downloading, then
move it to the cache so the addition of the complete file is atomic.
The file name of the temp file should be checked to validate that you
don't overwrite another process' download.

Currently downloading urls should be registered with the server
process (a simple list or set would work).  New requests should be
checked against that; if there is a matching url in there, the process
must wait until that download is finished and that file should be
delivered to both Alice and Bob.

You need to store the local file path together with the url it was
downloaded from, and check against that when a request is made; there
might be two foobar.iso files on the Internet or the network, and they
may be different (such as in differently versioned directories).
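
A minimal sketch of these three points, not a definitive implementation: `DownloadRegistry`, `fetch` and the `download` callback are hypothetical names. It assumes a threaded server, and it creates the temp file inside the cache directory so that `os.replace` stays on one filesystem and the final move is atomic. Cache entries are keyed on a hash of the full url, so two different foobar.iso files cannot collide.

```python
import hashlib
import os
import tempfile
import threading


class DownloadRegistry:
    """Tracks urls that are currently being downloaded (hypothetical helper)."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self._lock = threading.Lock()
        self._in_progress = {}  # url -> Event that is set when the download ends

    def cache_path(self, url):
        # Key on the whole url, not the base name of the file.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def fetch(self, url, download):
        """Return the cached path for `url`, downloading it at most once.

        `download(url, fileobj)` is supplied by the caller and writes the
        body to `fileobj`.
        """
        final_path = self.cache_path(url)
        while True:
            with self._lock:
                if os.path.exists(final_path):
                    return final_path
                done = self._in_progress.get(url)
                if done is None:
                    done = self._in_progress[url] = threading.Event()
                    break  # this caller becomes the downloader
            done.wait()  # another caller is downloading; wait, then re-check
        try:
            fd, tmp_path = tempfile.mkstemp(dir=self.cache_dir, suffix=".part")
            try:
                with os.fdopen(fd, "wb") as tmp:
                    download(url, tmp)
                os.replace(tmp_path, final_path)  # atomic move into the cache
            except BaseException:
                os.unlink(tmp_path)  # never leave a partial file behind
                raise
        finally:
            with self._lock:
                del self._in_progress[url]
            done.set()  # wake every request that was waiting on this url
        return final_path


calls = []


def fake_download(url, fileobj):
    # Stand-in for the real uplink request.
    calls.append(url)
    fileobj.write(b"payload for " + url.encode("utf-8"))


registry = DownloadRegistry(tempfile.mkdtemp())
path_a = registry.fetch("http://example.invalid/v1/foobar.iso", fake_download)
path_b = registry.fetch("http://example.invalid/v1/foobar.iso", fake_download)
path_c = registry.fetch("http://example.invalid/v2/foobar.iso", fake_download)
```

With this approach Bob receives nothing until the download is complete, so the timeout concern for very large files still applies.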



Re: Proxying downloads

2007-10-30 Thread Martin Sand Christensen
> But I can't figure out how I would solve the following:
>
> 1 Alice asks Sam for foobar.iso
> 2 Sam can't find foobar.iso in cachedir
> 3 Sam requests foobar.iso from uplink
> 4 Sam saves and forwards to Alice
> 5 At about 30 % of the download Bob asks Sam for foobar.iso
> 6 How do I serve Bob now?

Let every file in your download cache be represented by a Python object.
Instead of streaming the file directly to the clients, you can stream
the objects. The object will know if the file it represents has finished
downloading or not, where the file is located etc. This way you can
also, for the sake of persistence, keep partially downloaded files
separate from the completely downloaded files, as per a previous
suggestion, so that you won't start serving half files after a crash,
and it'll be completely transparent in all code except for your proxy
file objects.
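
One way such a proxy file object might look, as a sketch under stated assumptions: `CacheEntry` and its methods are hypothetical names, the server is assumed to be threaded, and error handling for a failed download is left out. The downloader appends chunks to the object; each client iterates over `stream()`, which reads from the file on disk and blocks whenever the client has caught up with the download.

```python
import os
import tempfile
import threading


class CacheEntry:
    """One cached file that may still be downloading (hypothetical class)."""

    def __init__(self, path):
        self.path = path
        self._cond = threading.Condition()
        self._size = 0          # bytes written to disk so far
        self._complete = False
        self._out = open(path, "wb")

    def append(self, chunk):
        """Called by the downloader for every chunk from the uplink."""
        self._out.write(chunk)
        self._out.flush()  # make the bytes visible to the readers' handles
        with self._cond:
            self._size += len(chunk)
            self._cond.notify_all()

    def finish(self):
        """Called by the downloader once the uplink transfer is complete."""
        self._out.close()
        with self._cond:
            self._complete = True
            self._cond.notify_all()

    def stream(self, chunk_size=64 * 1024):
        """Yield the whole file, waiting whenever the reader catches up."""
        offset = 0
        with open(self.path, "rb", buffering=0) as src:
            while True:
                with self._cond:
                    while offset >= self._size and not self._complete:
                        self._cond.wait()
                    available = self._size - offset
                if available == 0:
                    return  # download complete and fully read
                data = src.read(min(chunk_size, available))
                offset += len(data)
                yield data


entry = CacheEntry(os.path.join(tempfile.mkdtemp(), "foobar.iso"))
payload = [b"a" * 1000, b"b" * 1000, b"c" * 1000]

entry.append(payload[0])  # about a third is on disk when Bob arrives
received = []
bob = threading.Thread(
    target=lambda: received.extend(entry.stream(chunk_size=512))
)
bob.start()
entry.append(payload[1])
entry.append(payload[2])
entry.finish()
bob.join()
bob_data = b"".join(received)
```

Because every client reads through the same object, the switch from serving cached data to waiting on the uplink happens inside `stream()` and stays invisible to the rest of the proxy.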

Martin