You can effectively stream a file byte by byte - you just need to print a chunk at a time and mod_perl and Apache will handle it appropriately... I do this all the time to handle large data downloads (the systems I manage are backed by petabytes of data)...

The art is often not in the output - but in the way you get and process the data before sending it - I have code that will upload/download arbitrarily large files (using HTML5's file objects) without using excessive amounts of memory... (all data is stored in chunks in a MySQL database)

Streaming has other advantages with large data - if you wait until you have generated all the data, you will often hit a timeout. I have a script which can take up to 2 hours to generate all the output, but it never times out because it sends a line of data at a time... so data goes out every 5-10 seconds, and the memory footprint is trivial, as only the data for one line of output is in memory at a time.
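Roughly, the pattern looks like this (an untested sketch - the package name and the row generator are just stand-ins for the real code):

    # Untested sketch of the "print a chunk at a time" pattern under mod_perl 2.
    # Only the current row is ever held in memory; rflush() pushes it to the client.
    package My::Streamer;                       # hypothetical package name
    use strict;
    use warnings;
    use Apache2::RequestRec ();
    use Apache2::RequestIO  ();                 # adds print()/rflush() to $r
    use Apache2::Const -compile => qw(OK);

    my @demo_rows = map { "row $_" } 1 .. 5;    # stand-in for the real data source
    sub next_row { shift @demo_rows }

    sub handler {
        my $r = shift;
        $r->content_type('text/plain');
        while (defined(my $row = next_row())) {
            $r->print($row, "\n");              # send one line...
            $r->rflush;                         # ...and flush it straight away
        }
        return Apache2::Const::OK;
    }
    1;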


On 28/03/2015 16:25, John Dunlap wrote:
sendfile sounds like it's exactly what I'm looking for. I see it in the API documentation for Apache2::RequestIO but how do I get a reference to it from the Apache2::RequestRec reference which is passed to my handler?
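From the docs it looks like the IO methods are simply added to the request object once Apache2::RequestIO is loaded, so presumably something like this untested sketch (the path is a placeholder) would do:

    use Apache2::RequestRec ();
    use Apache2::RequestIO  ();   # makes $r->sendfile(), $r->print() etc. available
    use Apache2::Const -compile => qw(OK);

    sub handler {
        my $r = shift;                             # the Apache2::RequestRec passed in
        $r->sendfile('/path/to/generated/file');   # placeholder path
        return Apache2::Const::OK;
    }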

On Sat, Mar 28, 2015 at 9:54 AM, Perrin Harkins <phark...@gmail.com> wrote:

    Yeah, sendfile() is how I've done this in the past, although I was
    using mod_perl 1.x for it.

    On Sat, Mar 28, 2015 at 5:55 AM, André Warnier <a...@ice-sa.com> wrote:

        Randolf Richardson wrote:

                I know that it's possible (and arguably best practice)
                to use Apache to
                download large files efficiently and quickly, without
                passing them through
                mod_perl. However, the data I need to download from my
                application is both
                dynamically generated and sensitive so I cannot expose
                it to the internet
                for anonymous download via Apache. So, I'm wondering
                if mod_perl has a
                capability similar to the output stream of a java
                servlet. Specifically, I
                want to return bits and pieces of the file at a time
                over the wire so that
                I can avoid loading the entire file into memory prior
                to sending it to the
                browser. Currently, I'm loading the entire file into
                memory before sending
                it and

                Is this possible with mod_perl and, if so, how should
                I go about
                implementing it?


                    Yes, it is possible -- instead of loading the
            entire contents of a file into RAM, just read blocks in a
            loop and keep sending them until you reach EoF (End of File).

                    You can also use $r->flush along the way if you
            like, but as I understand it this isn't necessary because
            Apache HTTPd will send the data as soon as its internal
            buffers contain enough data.  Of course, if you can tune
            your block size in your loop to match Apache's output
            buffer size, then that will probably help.  (I don't know
            much about the details of Apache's output buffers because
            I've not read up too much on them, so I hope my
            assumptions about this are correct.)

                    One of the added benefits you get from using a
            loop is that you can also implement rate limiting if that
            becomes useful.  And you can implement access controls by
            cross-checking the file being sent against whatever
            internal database queries you'd normally use to confirm
            it's okay to send the file first.
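
                    As a rough, untested sketch of such a loop (the
            path, the block size, and the commented-out throttle are
            just placeholders):

            # Stream a file in fixed-size blocks instead of slurping it into RAM.
            use Apache2::RequestRec ();
            use Apache2::RequestIO  ();
            use Apache2::Const -compile => qw(OK SERVER_ERROR);

            sub handler {
                my $r = shift;
                my $path = '/tmp/large-file.dat';           # placeholder
                open my $fh, '<:raw', $path
                    or return Apache2::Const::SERVER_ERROR;

                $r->content_type('application/octet-stream');
                while (read($fh, my $block, 64 * 1024)) {   # 64 KB at a time
                    $r->print($block);
                    $r->rflush;                             # optional, as noted above
                    # select(undef, undef, undef, 0.05);    # crude rate limiting
                }
                close $fh;
                return Apache2::Const::OK;
            }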


        You can also:
        1) write the data to a file
        2) $r->sendfile(...);
        3) add a cleanup handler to delete the file when the request
        has been served.
        See here for details:
        http://perl.apache.org/docs/2.0/api/Apache2/RequestIO.html#C_sendfile_

        For this to work, there is an Apache configuration directive
        which must be set to "On"; I believe it is called "EnableSendfile".
        Essentially what sendfile() does is delegate the actual
        reading and sending of the file to Apache httpd and the
        underlying OS, using code which is specifically optimised for
        this purpose. It is much more efficient than doing this in a
        read/write loop yourself, at the cost of having less fine
        control over the operation.
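
        A rough, untested sketch of those three steps (the data
        generator is a stand-in, and error handling is omitted):

        use Apache2::RequestRec ();
        use Apache2::RequestIO  ();            # provides $r->sendfile()
        use Apache2::Const -compile => qw(OK);
        use APR::Pool ();                      # for cleanup_register()
        use File::Temp ();

        sub generate_data { "dynamically generated payload\n" }   # stand-in

        sub handler {
            my $r = shift;

            # 1) write the generated data to a temporary file
            my ($fh, $tmpfile) = File::Temp::tempfile(UNLINK => 0);
            print {$fh} generate_data();
            close $fh;

            # 2) let httpd/the OS send it (needs "EnableSendfile On" in the config)
            $r->content_type('application/octet-stream');
            $r->sendfile($tmpfile);

            # 3) delete the file once the request has been served
            $r->pool->cleanup_register(sub { unlink $tmpfile });

            return Apache2::Const::OK;
        }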





--
John Dunlap
CTO | Lariat

Direct:
j...@lariat.co

Customer Service:
877.268.6667
supp...@lariat.co






