mod_perl memory

2010-03-15 Thread Pavel Georgiev
Hi,

I have a perl script running in mod_perl that needs to write a large amount of
data to the client, possibly over a long period. The behavior that I observe is
that once I print and flush something, the buffer memory is not reclaimed even
though I rflush() (I know the memory cannot be returned to the OS).

Is that how mod_perl operates, and is there a way I can force it to
periodically free the buffer memory so it can be reused for new buffers
instead of taking more from the OS?

Re: mod_perl memory

2010-03-16 Thread Pavel Georgiev
Thank you both for the quick replies!

Arthur,

Apache2::SizeLimit is no solution for my problem, as I'm looking for a way to
limit the size each request takes. The fact that I can scrub the process after
the request is done (or drop requests if the process reaches some limit,
although my understanding is that Apache2::SizeLimit does its job only after
the request is done) does not help me.
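(I.e., even with something like the usual setup below - the sizes are made up -
the check only runs in the cleanup phase, after the request has already
ballooned the process:)

# startup.pl or a <Perl> section; httpd.conf also needs:
#   PerlCleanupHandler Apache2::SizeLimit
use Apache2::SizeLimit;
Apache2::SizeLimit->set_max_process_size(256 * 1024);   # in KB, i.e. ~256 MB
$Apache2::SizeLimit::CHECK_EVERY_N_REQUESTS = 5;        # only check every 5th request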

William,
Let me make sure I'm understanding this right: I'm not using any buffers myself;
all I do is sysread() from a unix socket and print(). It's just that I need to
print a large amount of data for each request. Are you saying that there is no
way to free the memory after I've done print() and rflush()?
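To be concrete, the loop is essentially this (simplified; $sock stands for the
unix socket and $request for the request object):

while (sysread($sock, my $buf, 65536)) {   # read a fixed-size chunk from the socket
    $request->print($buf);                 # write it to the client
    $request->rflush;                      # push it out immediately
}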

BTW, thanks for the other suggestions; switching to CGI seems like the only
reasonable option for me. I just want to make sure that this is how mod_perl
operates and that it is not me doing something wrong.

Thanks,
Pavel

On Mar 16, 2010, at 11:18 AM, ARTHUR GOLDBERG wrote:

> You could use Apache2::SizeLimit ("because size does matter") which evaluates 
> the size of Apache httpd processes when they complete HTTP Requests, and 
> kills those that grow too large. (Note that Apache2::SizeLimit can only be 
> used for non-threaded MPMs, such as prefork.) Since it operates at the end of 
> a Request, SizeLimit has the advantage that it doesn't interrupt Request 
> processing and the disadvantage that it won't prevent a process from becoming 
> oversized while processing a Request. To reduce the regular load of 
> Apache2::SizeLimit it can be configured to check the size intermittently by 
> setting the parameter CHECK_EVERY_N_REQUESTS. These parameters can be 
> configured in a <Perl> section in httpd.conf, or in a Perl start-up file.
> 
> That way, if your script allocates too much memory the process will be killed 
> when it finishes handling the request. The MPM will eventually start another 
> process if necessary.
> 
> BR
> A
> 
> On Mar 16, 2010, at 9:30 AM, William T wrote:
> 
>> On Mon, Mar 15, 2010 at 11:26 PM, Pavel Georgiev  wrote:
>>> I have a perl script running in mod_perl that needs to write a large amount 
>>> of data to the client, possibly over a long period. The behavior that I 
>>> observe is that once I print and flush something, the buffer memory is not 
>>> reclaimed even though I rflush() (I know the memory cannot be returned to 
>>> the OS).
>>> 
>>> Is that how mod_perl operates, and is there a way I can force it to 
>>> periodically free the buffer memory so it can be reused for new buffers 
>>> instead of taking more from the OS?
>> 
>> That is how Perl operates.  Mod_Perl is just Perl embedded in the
>> Apache Process.
>> 
>> You have a few options:
>>  * Buy more memory. :)
>>  * Delegate resource-intensive work to a different process (I would
>> NOT suggest forking a child in Apache).
>>  * Tie the buffer to a file on disk, or a DB object, that can be
>> explicitly reclaimed.
>>  * Create a buffer object of a fixed size and loop.
>>  * Use compression on the data stream that you read into a buffer.
>> 
>> You could also architect your system to mitigate resource usage if the
>> large data serve is not a common operation:
>>  * Proxy those requests to a different server which is optimized to
>> handle large data serves.
>>  * Execute the large data serves with CGI rather than Mod_Perl.
>> 
>> I'm sure there are probably other options as well.
>> 
>> -wjt
>> 
> 
> Arthur P. Goldberg, PhD
> 
> Research Scientist in Bioinformatics
> Plant Systems Biology Laboratory
> www.virtualplant.org
> 
> Visiting Academic
> Computer Science Department
> Courant Institute of Mathematical Sciences
> www.cs.nyu.edu/artg
> 
> a...@cs.nyu.edu
> New York University
> 212 995-4918
> Coruzzi Lab
> 8th Floor Silver Building
> 1009 Silver Center
> 100 Washington Sq East
> New York NY 10003-6688



Re: mod_perl memory

2010-03-16 Thread Pavel Georgiev
Andre,

That is what I'm currently doing:
$request->content_type("multipart/x-mixed-replace;boundary=\"$this->{boundary}\";");

and then each chunk of output is printed like this (no length specified):

for (;;) {
$request->print("--$this->{boundary}\n");
$request->print("Content-type: text/html; charset=utf-8;\n\n");
$request->print("$data\n\n");
$request->rflush;
}

And the result is endless memory growth in the apache process. Is that what you 
had in mind?

On Mar 16, 2010, at 12:50 PM, André Warnier wrote:

> Pavel Georgiev wrote:
> ...
>> Let me make sure I'm understanding this right: I'm not using any buffers myself;
>> all I do is sysread() from a unix socket and print(). It's just that I need to
>> print a large amount of data for each request.
>> 
> ...
> Taking the issue at the source: can you not arrange to sysread() and/or 
> print() in smaller chunks?
> There exists something in HTTP named "chunked response encoding" 
> (forgive me for not remembering the precise technical name).  It 
> consists of sending the response to the browser without an overall 
> Content-Length response header, but indicating that the response is 
> chunked.  Then each "chunk" is sent with its own length, and the 
> sequence ends with (if I remember correctly) a last chunk of size zero.
> The browser receives each chunk in turn, and re-assembles them.
> I have never had the problem myself, so I never looked deeply into it.
> But it just seems to me that before going off in more complicated 
> solutions, it might be worth investigating.
> 
> 
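(For reference, what André describes is HTTP chunked transfer encoding: each chunk
is prefixed with its size in hex, and a zero-size chunk terminates the body. Under
mod_perl, Apache's core output filters apply it automatically when no Content-Length
is set; the snippet below, with a made-up @chunks list, only illustrates the wire
format:)

# Illustration only: the wire format of a chunked response body.
for my $data (@chunks) {
    printf "%x\r\n%s\r\n", length($data), $data;   # <size in hex> CRLF <data> CRLF
}
print "0\r\n\r\n";                                 # zero-size chunk ends the body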



Re: mod_perl memory

2010-03-17 Thread Pavel Georgiev

On Mar 17, 2010, at 11:27 AM, Torsten Förtsch wrote:

> On Wednesday 17 March 2010 12:15:15 Torsten Förtsch wrote:
>> On Tuesday 16 March 2010 21:09:33 Pavel Georgiev wrote:
>>> for (;;) {
>>>$request->print("--$this->{boundary}\n");
>>>$request->print("Content-type: text/html; charset=utf-8;\n\n");
>>>$request->print("$data\n\n");
>>>$request->rflush;
>>> }
>>> 
>>> And the result is endless memory growth in the apache process. Is that
>>> what you had in mind?
>>> 
>> 
>> I can confirm this. I have tried this little handler:
>> 
>> sub {
>>  my $r=shift;
>> 
>>  until( -e "/tmp/stop" ) {
>>$r->print(("x"x70)."\n");
>>$r->rflush;
>>  }
>> 
>>  return Apache2::Const::OK;
>> }
>> 
>> The httpd process grows slowly but without bound. Without the rflush() it
>> grows more slowly, but it still grows.
>> 
> Here is a bit more stuff on the bug. It is the pool that grows.
> 
> To show it I use a handler that prints an empty document. I think an empty 
> file shipped by the default handler will do as well.
> 
> Then I add the following filter to the request:
> 
> $r->add_output_filter(sub {
>  my ($f, $bb)=@_;
> 
>  unless( $f->ctx ) {
>$f->r->headers_out->unset('Content-Length');
>$f->ctx(1);
>  }
> 
>  my $eos=0;
>  while( my $b=$bb->first ) {
>$eos++ if( $b->is_eos );
>$b->delete;
>  }
>  return 0 unless $eos;
> 
>  my $ba=$f->c->bucket_alloc;
>  until( -e '/tmp/stop' ) {
>my $bb2=APR::Brigade->new($f->c->pool, $ba);
>$bb2->insert_tail(APR::Bucket->new($ba, ("x"x70)."\n"));
>$bb2->insert_tail(APR::Bucket::flush_create $ba);
>$f->next->pass_brigade($bb2);
>  }
> 
>  my $bb2=APR::Brigade->new($f->c->pool, $ba);
>  $bb2->insert_tail(APR::Bucket::eos_create $ba);
>  $f->next->pass_brigade($bb2);
> 
>  return 0;
> });
> 
> The filter drops the empty document and emulates our infinite output. With 
> this filter the httpd process still grows. Now I add a subpool to the loop:
> 
> [...]
>  until( -e '/tmp/stop' ) {
>my $pool=$f->c->pool->new;   # create a subpool
>my $bb2=APR::Brigade->new($pool, $ba);   # use the subpool
>$bb2->insert_tail(APR::Bucket->new($ba, ("x"x70)."\n"));
>$bb2->insert_tail(APR::Bucket::flush_create $ba);
>$f->next->pass_brigade($bb2);
>$pool->destroy;  # and destroy it
>  }
> [...]
> 
> Now it does not grow.
> 
> Torsten Förtsch
> 
> -- 
> Need professional modperl support? Hire me! (http://foertsch.name)
> 
> Like fantasy? http://kabatinte.net


How would that logic (adding subpools and using them) be applied to my 
simplified example:

for (;;) {
   $request->print("--$this->{boundary}\n");
   $request->print("Content-type: text/html; charset=utf-8;\n\n");
   $request->print("$data\n\n");
   $request->rflush;
}

Do I need to add an output filter?

Re: mod_perl memory

2010-03-18 Thread Pavel Georgiev
Thanks, that did the job. I'm currently testing for side effects but it all 
looks good so far.
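In case it helps anyone who finds this thread later: the loop now follows Mårten's
pattern, roughly like this (simplified and untested; it needs APR::Brigade,
APR::Bucket and Apache2::Connection loaded):

my $ba = $request->connection->bucket_alloc;
my $bb = APR::Brigade->new($request->pool, $ba);
for (;;) {
    my $part = "--$this->{boundary}\n"
             . "Content-type: text/html; charset=utf-8;\n\n"
             . "$data\n\n";
    $bb->insert_tail(APR::Bucket->new($ba, $part));
    $bb->insert_tail(APR::Bucket::flush_create $ba);
    $request->output_filters->pass_brigade($bb);
    $bb->cleanup();   # reuse the brigade so the request pool stops growing
}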

On Mar 18, 2010, at 4:09 AM, Torsten Förtsch wrote:

> On Thursday 18 March 2010 11:54:53 Mårten Svantesson wrote:
>> I have never worked directly with the APR API but in the example above
>> couldn't you prevent the request pool from growing by explicitly reusing
>> the  bucket brigade?
>> 
>> Something like (not tested):
>> 
>> sub {
>>   my ($r)=@_;
>> 
>>   my $ba=$r->connection->bucket_alloc;
>>   my $bb2=APR::Brigade->new($r->pool, $ba);
>>   until( -e '/tmp/stop' ) {
>> $bb2->insert_tail(APR::Bucket->new($ba, ("x"x70)."\n"));
>> $bb2->insert_tail(APR::Bucket::flush_create $ba);
>> $r->output_filters->pass_brigade($bb2);
>> $bb2->cleanup();
>>   }
>> 
>>   $bb2->insert_tail(APR::Bucket::eos_create $ba);
>>   $r->output_filters->pass_brigade($bb2);
>> 
>>   return Apache2::Const::OK;
>> }
>> 
> Thanks for pointing out the obvious. This doesn't grow either.
> 
> Torsten Förtsch
> 
> -- 
> Need professional modperl support? Hire me! (http://foertsch.name)
> 
> Like fantasy? http://kabatinte.net