sendfile is much more efficient than that.  At the most basic level,
sendfile allows a file to be streamed directly from the block device (or
OS cache) to the network, all in kernel-space (see sendfile(2)).

What you describe below is less efficient, since you have to ask the
kernel to read the data chunk by chunk, copy each chunk into userspace,
and then copy it from userspace back into kernel space to be sent out
to the net.

Beyond that, the Apache output filter stack also spends time examining
your data, possibly buffering it differently than you do (for example,
to apply HTTP chunked encoding).  By using sendfile, you bypass the
output filter chain (for the request, at least; connection/protocol
filters, such as HTTPS encryption, will still get in the way, but you
probably want that to happen :)), further optimizing the output.

If you're manipulating the data, you need to stream it yourself, but if
you have the data on disk and can serve it as-is, sendfile will almost
always perform much, much, much better.
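
For example, a minimal mod_perl 2 handler along these lines (the file
path and content type are placeholders):

    use Apache2::RequestRec ();
    use Apache2::RequestIO ();    # makes sendfile() available on $r
    use Apache2::Const -compile => qw(OK NOT_FOUND);

    sub handler {
        my $r = shift;

        my $file = '/data/large-file.bin';    # hypothetical path
        return Apache2::Const::NOT_FOUND unless -r $file;

        $r->content_type('application/octet-stream');
        $r->sendfile($file);    # the kernel does the read and send
        return Apache2::Const::OK;
    }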

  Issac

On 3/28/2015 7:40 PM, Dr James Smith wrote:
> You can effectively stream a file byte by byte - you just need to print
> a chunk at a time, and mod_perl and Apache will handle it
> appropriately... I do this all the time to handle large data downloads
> (the systems I manage are backed by petabytes of data)...
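>
> A bare-bones version of that pattern ($r is the request object passed
> to the handler; get_next_chunk() is a stand-in for however you produce
> the data):
>
>     use Apache2::RequestRec ();
>     use Apache2::RequestIO ();    # adds print() to $r
>
>     $r->content_type('application/octet-stream');
>     while (defined(my $chunk = get_next_chunk())) {
>         $r->print($chunk);    # only one chunk in memory at a time
>     }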
> 
> The art is often not in the output, but in the way you get and process
> the data before sending it - I have code that will upload/download
> arbitrarily large files (using HTML5's File objects) without using
> excessive amounts of memory... (all the data is stored in chunks in a
> MySQL database)
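>
> Roughly, the download side of that might look like this (the table and
> column names are made up):
>
>     use DBI;
>
>     my $dbh = DBI->connect('dbi:mysql:files', 'user', 'secret',
>                            { RaiseError => 1 });
>     my $sth = $dbh->prepare(
>         'SELECT data FROM file_chunks WHERE file_id = ? ORDER BY seq');
>     $sth->execute($file_id);    # $file_id identifies the download
>     while (my ($chunk) = $sth->fetchrow_array) {
>         $r->print($chunk);    # stream each chunk as it arrives
>     }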
> 
> Streaming has other advantages with large data - if you wait until you
> have generated all the data, you will often find that you get a
> timeout. I have a script which can take up to 2 hours to generate all
> the output, but it never times out because it sends a line of data at
> a time... so data is sent every 5-10 seconds... and the memory
> footprint is trivial, as only the data for one line of output is in
> memory at a time...
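>
> The long-running case is the same loop with an explicit flush
> (compute_next_line() is again hypothetical):
>
>     while (defined(my $line = compute_next_line())) {
>         $r->print($line);
>         $r->rflush;    # push the line out now; keeps the client alive
>     }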
> 
> 
> On 28/03/2015 16:25, John Dunlap wrote:
>> sendfile sounds like it's exactly what I'm looking for. I see it in the
>> API documentation for Apache2::RequestIO, but how do I get a reference
>> to it from the reference to Apache2::RequestRec which is passed to my
>> handler?
>>
>> On Sat, Mar 28, 2015 at 9:54 AM, Perrin Harkins <phark...@gmail.com
>> <mailto:phark...@gmail.com>> wrote:
>>
>>     Yeah, sendfile() is how I've done this in the past, although I was
>>     using mod_perl 1.x for it.
>>
>>     On Sat, Mar 28, 2015 at 5:55 AM, André Warnier <a...@ice-sa.com
>>     <mailto:a...@ice-sa.com>> wrote:
>>
>>         Randolf Richardson wrote:
>>
>>                 I know that it's possible(and arguably best practice)
>>                 to use Apache to
>>                 download large files efficiently and quickly, without
>>                 passing them through
>>                 mod_perl. However, the data I need to download from my
>>                 application is both
>>                 dynamically generated and sensitive so I cannot expose
>>                 it to the internet
>>                 for anonymous download via Apache. So, I'm wondering
>>                 if mod_perl has a
>>                 capability similar to the output stream of a java
>>                 servlet. Specifically, I
>>                 want to return bits and pieces of the file at a time
>>                 over the wire so that
>>                 I can avoid loading the entire file into memory prior
>>                 to sending it to the
>>                 browser. Currently, I'm loading the entire file into
>>                 memory before sending
>>                 it and
>>
>>                 Is this possible with mod_perl and, if so, how should
>>                 I go about
>>                 implementing it?
>>
>>
>>                     Yes, it is possible -- instead of loading the
>>             entire contents of a file into RAM, just read blocks in a
>>             loop and keep sending them until you reach EoF (End of File).
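>>
>>             A sketch of such a loop (the 8KB block size is arbitrary):
>>
>>                 open my $fh, '<:raw', $file or die "open: $!";
>>                 while (read($fh, my $buf, 8192)) {
>>                     $r->print($buf);
>>                 }
>>                 close $fh;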
>>
>>                     You can also use $r->flush along the way if you
>>             like, but as I understand it this isn't necessary because
>>             Apache HTTPd will send the data as soon as its internal
>>             buffers contain enough data.  Of course, if you can tune
>>             your block size in your loop to match Apache's output
>>             buffer size, then that will probably help.  (I don't know
>>             much about the details of Apache's output buffers because
>>             I've not read up too much on them, so I hope my
>>             assumptions about this are correct.)
>>
>>                     One of the added benefits of using a loop is that
>>             you can also implement rate limiting if that becomes
>>             useful.  You can implement access controls as well, by
>>             first cross-checking the file being sent against whatever
>>             internal database queries you'd normally use to ensure
>>             it's okay to send the file.
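>>
>>             Rate limiting can be as crude as capping the number of
>>             bytes sent per second (the numbers are arbitrary):
>>
>>                 my $bytes_per_sec = 512 * 1024;
>>                 while (read($fh, my $buf, $bytes_per_sec)) {
>>                     $r->print($buf);
>>                     $r->rflush;
>>                     sleep 1;    # at most $bytes_per_sec per second
>>                 }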
>>
>>
>>         You can also:
>>         1) write the data to a file
>>         2) $r->sendfile(...);
>>         3) add a cleanup handler, to delete the file when the request
>>         has been served.
>>         See here for details:
>> http://perl.apache.org/docs/2.0/api/Apache2/RequestIO.html#C_sendfile_
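>>
>>         A sketch of steps 2 and 3 (the temporary file name is made
>>         up):
>>
>>             use Apache2::RequestRec ();
>>             use Apache2::RequestIO ();
>>             use APR::Pool ();
>>
>>             my $tmp = '/tmp/report-12345.csv';   # written in step 1
>>             $r->sendfile($tmp);
>>             # step 3: delete the file when the request pool is
>>             # destroyed, i.e. after the response has been sent
>>             $r->pool->cleanup_register(sub { unlink $tmp });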
>>
>>         For this to work, there is an Apache configuration directive
>>         which must be set to "On"; it is called "EnableSendfile".
>>         Essentially, what sendfile() does is delegate the actual
>>         reading and sending of the file to Apache httpd and the
>>         underlying OS, using code which is specifically optimised for
>>         this purpose.  It is much more efficient than doing this in a
>>         read/write loop yourself, at the cost of having less fine
>>         control over the operation.
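>>
>>         In httpd.conf:
>>
>>             EnableSendfile On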
>>
>>
>>
>>
>>
>> -- 
>> John Dunlap
>> CTO | Lariat
>>
>> Direct:
>> j...@lariat.co <mailto:j...@lariat.co>
>>
>> Customer Service:
>> 877.268.6667
>> supp...@lariat.co <mailto:supp...@lariat.co>
> 
> -- The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
