Re: close connection for request, but continue

2016-04-21 Thread Frotz Fa'Atuai (ffaatuai)
Iosif:
You will need:

[] a background state storage location (a database table with a unique row
ID, or a directory with a unique ID which points to the state file).
[] Your user-facing request page accepts the request, schedules the work,
and responds with a page which auto-refreshes against the GUID, reporting
the status of the background request.
[] Your user-facing status page auto-refreshes to itself while the job is
in motion.
[] Your user-facing status page auto-refreshes to a "you're done; your
results are here / have been mailed to you" page.
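
The checklist above can be sketched in miniature. This is an illustrative
in-memory stand-in for the database table or state-file directory; all names
here (new_job, job_status, finish_job, %JOBS) are made up for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory stand-in for the background state storage; a real system
# would use a database table (or a state-file directory) and a GUID.
my %JOBS;
my $next_id = 0;

sub new_job {                      # the request page records queued work
    my ($payload) = @_;
    my $id = ++$next_id;
    $JOBS{$id} = { status => 'queued', payload => $payload };
    return $id;
}

sub job_status {                   # what the auto-refreshing page asks for
    my ($id) = @_;
    return $JOBS{$id} ? $JOBS{$id}{status} : 'unknown';
}

sub finish_job {                   # the background worker calls this
    my ($id, $result) = @_;
    $JOBS{$id}{status} = 'completed';
    $JOBS{$id}{result} = $result;
}
```

The status page then only ever reads `job_status($id)` and refreshes itself
until the answer is "completed".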

Observations:
[] My corporate business users can follow along with a 15 second
auto-refresh (as long as the page clearly indicates an auto-refresh in 15
seconds).  Count-down timers are probably better.
[] My technical users close the pop-up tab after the first request (not
caring for the intermediate status pages and knowing that the result has
been accomplished or mailed to them).

Asides:
[] Some of our backend jobs take a long time (lots of data to grind
through); these tend towards email status.
[] The database-based queue view (assuming you're internally facing only)
allows your support teams to observe queued jobs (things which will
happen in the future), active jobs (things running right now on some
machine), completed jobs (jobs which succeeded), and failed jobs (jobs
which did not succeed).

Hopefully these implementation specifics and operational observations
assist you as you take André's excellent summary and put it all to work.
--
Frotz
EMAN
Cisco Systems, Inc.

On 2016/04/21, 07:36, "André Warnier (tomcat)" wrote:

>On 21.04.2016 11:20, Iosif Fettich wrote:
>[...]



Re: close connection for request, but continue

2016-04-21 Thread tomcat

On 21.04.2016 11:20, Iosif Fettich wrote:

Dear mod_perl list,

please consider my gratefulness for any hints/insight :)

I'm trying to achieve the following: when there is an incoming request, I want
to set a time limit in which an answer should be delivered to the client, no
matter what.

However, since the work triggered by the initial request (there is another
request to another site involved) might take much longer than that time limit,
I want that work to properly finish, despite the fact that the initial request
was 'served' already.



[...]

In agreement with Perrin, and to expand a bit :

To go back some 20 years: the original design of HTTP and of webservers was
not really meant for client requests that take a long time to process.
A browser, when it makes a request to a server, will wait for a response for a
maximum of about 5 minutes; if by then it has not received one, it will close
the connection and display an error like "this server appears to be busy, and
does not respond". And since the connection is now closed, when the server
eventually tries to send back a response, it finds no connection to send it
on, aborts the request processing at that point, and writes an error message
to the error log.


But you seem to already know all that, which is probably why you are sending a response to 
the browser no matter what, before this timeout occurs.


However, the way in which you are doing this currently is kind of a "perversion"
of the protocol, because
- you are sending a response to the browser saying that everything is OK (so
for the browser this request is terminated, and it can go on with the next one
and/or close the connection)
- but on the other hand, the request-processing process under Apache is still
running for this request and this client.
And if that request-processing process now, for whatever reason, had something
to send to the client (for example, some error), it would find the connection
gone and be unable to do so.


(And because what you are doing is in fact not a natural thing to do, that is
also why you are not finding any standard module, interface, or API for it.)


The "canonical" way to do this, would be something like
- the client sends the request to the server
- the server allocates a process (or thread or whatever) to process this request
- this request-processing process "delegates" this browser request to some other, 
independent-of-the-webserver process, which can take as long as necessary to fulfill the 
(background part of) the request
- the request-processing process does not wait for the response or the 
exit of that independent process, but returns a response right away to the 
client browser (such as "Thank you for your request. It is being handled by 
our back-office. You will receive an email when it's done.")
- and then, as far as the webserver is concerned, this client request is finished 
(cleanly), and the request-processing process can be re-allocated to some other incoming 
request


Optionally, you could provide a way for the client to periodically enquire as to the 
advancement status of his request.
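
In outline, the "canonical" flow above might be sketched like this: a toy,
in-process model where @QUEUE, %STATUS, and the function names are assumptions
for illustration, not a real API; in practice the queue and status would live
in a database or a proper job-queue system:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy model of delegate-and-respond: the handler enqueues the slow work
# and answers right away; a separate status call serves the polling page.
my (@QUEUE, %STATUS);
my $next_id = 0;

sub handle_request {               # runs inside the web server process
    my ($params) = @_;
    my $id = ++$next_id;
    push @QUEUE, { id => $id, params => $params };   # delegate the work
    $STATUS{$id} = 'pending';
    # return immediately; the client connection can be closed cleanly
    return "Thank you for your request $id. It is being handled by our "
         . "back-office; check /status/$id or wait for the email.";
}

sub handle_status {                # the optional polling endpoint
    my ($id) = @_;
    return $STATUS{$id} // 'no such job';
}
```

An independent worker process would then drain @QUEUE and update %STATUS as
jobs complete.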


The tricky bit is what the Apache request-processing process you start out in 
should do:

- either itself start a totally independent secondary process that will go off 
and fulfill the long-running part of the request. This is tricky to do right, 
and it is easy to overwhelm your server.


- or (probably simpler), just pass this request to an already-running independent server 
process which will do this long-running part.

This is what Perrin refers to as a "job queue" system.
You can develop such a "job queue" system yourself, or you can use a 
ready-made one. There are such things within the Apache projects, or, if you 
prefer Perl, you may find some on CPAN (see POE, for example).
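
Reduced to its essentials, a hand-rolled job-queue worker might look something
like the sketch below. It is sequential and in-memory purely for illustration:
run_job and the $MAX_PARALLEL batching stand in for real forking or threading
with a concurrency cap.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative worker loop for a hand-rolled job queue: drain the queue,
# never taking more than $MAX_PARALLEL jobs per round. A real worker would
# fork (or use threads) for each batch and block until a slot frees up.
my $MAX_PARALLEL = 2;

sub drain_queue {
    my ($queue, $run_job) = @_;
    my @results;
    while (@$queue) {
        my @batch = splice @$queue, 0, $MAX_PARALLEL;
        push @results, map { $run_job->($_) } @batch;
    }
    return @results;
}
```

The same loop shape works whether the queue is an in-memory array, a database
table, or a message broker; only the "claim a batch" step changes.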


I would guess that this is all a bit more complicated than what you envisioned 
initially, but that is the case with many such things.




Re: close connection for request, but continue

2016-04-21 Thread James Smith
A job queue is also better because it prevents uncontrolled forking and 
excessive numbers of "dead" web connections hanging around: it simply queues 
requests until resources are available. You may find that handling several of 
these jobs in parallel eats up all your processor/memory resources, whereas 
with queuing you can limit the number of processes running in parallel. (And 
if your site gets bigger, you may be able to hand off some of this work to a 
cluster of machines that handle the long-running processes.)



On 4/21/2016 3:25 PM, Perrin Harkins wrote:
On Thu, Apr 21, 2016 at 9:48 AM, Iosif Fettich wrote:
[...]





--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 

Re: close connection for request, but continue

2016-04-21 Thread Perrin Harkins
On Thu, Apr 21, 2016 at 9:48 AM, Iosif Fettich wrote:

> I'm afraid that won't fit, actually. It's not a typical Cleanup I'm after
> - I actually want to not abandon the request I've started, just for closing
> the incoming original request. The cleanup handler could relaunch the slow
> back request - but doing so I'd pay twice for it.


You don't have to. You can just return immediately, and do all the work in
the cleanup (or a job queue) while you let the client poll for status. It's
a little extra work for simple requests, but it means all requests are
handled the same and you never make extra requests to your expensive
backend.

If you're determined not to do polling from the client, your best bet is
probably to fork immediately and do the work in the fork, while you poll to
check if it's done in your original process. You'd have to write the
response to a database or something that the original process can pick it
up from. But forking from mod_perl is a pain and easy to mess up, so I
recommend doing one of the other approaches.

- Perrin


Re: close connection for request, but continue

2016-04-21 Thread Iosif Fettich

Hi Perrin,


I'm trying to achieve the following: when there is an incoming request, I
want to set a time limit in which an answer should be delivered to the
client, no matter what.

However, since the work triggered by the initial request (there is another
request to another site involved) might take much longer than that time
limit, I want that work to properly finish, despite the fact that the
initial request was 'served' already.



TMTOWTDI, but the common way to do this is to add the long-running job to a
job queue, and then redirect the user to a page that periodically checks if
the job is done by using JavaScript requests.


It's not such a typical long-running job that I'm doing. It rather goes 
like this: most of the time I can answer with what I have within the 
acceptable answer time, but sometimes I have to make another request in 
the background. That too is usually served within acceptable time; 
_sometimes_ it isn't, so only occasionally does it take longer.


The catch: let's say the backend service is pay-per-use, so I definitely 
don't want to throw away a started request. If I have launched a request 
in the back, I want to get the results, even if the initial requester 
was turned down in the meantime.



If you don't have a job queue and don't want to add one just for this, you
could use a cleanup handler to run the slow stuff after disconnecting:
http://perl.apache.org/docs/2.0/user/handlers/http.html#PerlCleanupHandler


I'm afraid that won't fit, actually. It's not a typical Cleanup I'm after 
- I actually want to not abandon the request I've started, just for 
closing the incoming original request. The cleanup handler could relaunch 
the slow back request - but doing so I'd pay twice for it.



That will tie up a mod_perl process though, so it's not a good way to go
for large sites.


I'm aware of that, but that's less of a concern for now.

Many thanks,

Iosif Fettich


Re: close connection for request, but continue

2016-04-21 Thread Perrin Harkins
On Thu, Apr 21, 2016 at 5:20 AM, Iosif Fettich wrote:

>
> I'm trying to achieve the following: when there is an incoming request, I
> want to set a time limit in which an answer should be delivered to the
> client, no matter what.
>
> However, since the work triggered by the initial request (there is another
> request to another site involved) might take much longer than that time
> limit, I want that work to properly finish, despite the fact that the
> initial request was 'served' already.


TMTOWTDI, but the common way to do this is to add the long-running job to a
job queue, and then redirect the user to a page that periodically checks if
the job is done by using JavaScript requests.

If you don't have a job queue and don't want to add one just for this, you
could use a cleanup handler to run the slow stuff after disconnecting:
http://perl.apache.org/docs/2.0/user/handlers/http.html#PerlCleanupHandler
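
For completeness, a hedged sketch of the cleanup-handler variant that link
describes, written as a ModPerl::Registry-style script. This is untested here,
and slow_backend_call() is a hypothetical stand-in for the real work:

```perl
# Sketch of the cleanup-handler approach in a ModPerl::Registry script.
use strict;
use warnings;

use Apache2::RequestRec ();
use Apache2::RequestIO ();
use Apache2::RequestUtil ();
use Apache2::Const -compile => 'OK';

my $r = shift;

$r->content_type('text/plain');
$r->print("Your request was accepted; results will be mailed to you.\n");

# The cleanup handler runs after the response has been sent and the
# connection released -- but it still occupies this mod_perl process
# until the slow work finishes.
$r->push_handlers(PerlCleanupHandler => sub {
    slow_backend_call();          # hypothetical long-running job
    return Apache2::Const::OK;
});

return Apache2::Const::OK;

sub slow_backend_call { }         # placeholder for illustration only
```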

That will tie up a mod_perl process though, so it's not a good way to go
for large sites.

- Perrin


close connection for request, but continue

2016-04-21 Thread Iosif Fettich

Dear mod_perl list,

please consider my gratefulness for any hints/insight :)

I'm trying to achieve the following: when there is an incoming request, I 
want to set a time limit in which an answer should be delivered to the 
client, no matter what.


However, since the work triggered by the initial request (there is another 
request to another site involved) might take much longer than that time 
limit, I want that work to properly finish, despite the fact that the 
initial request was 'served' already.


I first thought that using alarm and closing the connection would just 
work, so my initial code was somewhat like



---

#!/usr/bin/perl

use strict;
use warnings;

use Apache2::RequestUtil;
use Apache2::Connection;
use Apache2::Const -compile => qw(:common :http );

local our $r = shift;

local our $t_start = time();
local $SIG{ALRM} = sub { _force_early_response( $t_start ) };

local our $it_took_too_long = 0;

our $alarm_time = 5; # seconds we allow to process a response

alarm $alarm_time;

# start working for the request
#
# work done

alarm 0;
return Apache2::Const::OK;


sub _force_early_response {
my ($t1) = @_;

$it_took_too_long = 1;

my $t2 = time();

$r->assbackwards(1);
my $response_text = _sorry_but_that_will_be_ready_only_later();

$r->print( "HTTP/1.1 200 OK\n"
  ."Date: Wed, 20 Apr 2016 10:55:08 GMT\n"
  ."Server: Apache/2.2.31 (Amazon)\n"
  ."Content-Type: text/plain; charset=UTF-8\n");

$r->print( "\n$response_text" );

my $c = $r->connection();
my $socket = $c->client_socket;

$socket->close();
return;
}
---

That didn't work - the connection just was not closed.

A second attempt succeeded, where I changed the alarm trap to be

---
sub _force_early_response {
my ($t1) = @_;

$it_took_too_long = 1;

my $t2 = time();

$r->assbackwards(1);

my $response_text = _sorry_but_that_will_be_ready_only_later();
my $content_length = length( $response_text );

$r->print( "HTTP/1.1 200 OK\n"
  ."Date: Wed, 20 Apr 2016 10:55:08 GMT\n"
  ."Server: Apache/2.2.31 (Amazon)\n"
  ."Connection: close\n"
  ."Content-Type: text/plain; charset=UTF-8\n"
  ."Content-Length: $content_length\n" );

$r->print( "\n$response_text" );
return;
}
---

So while I have now something that seems to work OK and does exactly what 
I wanted, I'm a bit unhappy with the code.


I did look around, but I wasn't able to find any mod_perl/library function 
that would make the 'close connection and go on' code easier.


Especially for setting the status line, which I now do with

   $r->print( "HTTP/1.1 200 OK\n" )

I thought there would be some ready-made tools available.

I actually hoped to get away without having to explicitly write out 
'hand-crafted' headers, but so far I wasn't able to find anything for 
that.
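
One possible alternative, sketched and untested: skip assbackwards(1) entirely
and let Apache generate the status line from the request record, via the usual
Apache2::RequestRec accessors (status, content_type, headers_out, rflush).
This is a drop-in rewrite of _force_early_response; whether it behaves well
inside the alarm handler is an assumption to verify:

```perl
# Untested sketch: let Apache emit the status line and headers itself,
# instead of hand-crafting them in assbackwards mode.
use Apache2::RequestRec ();
use Apache2::RequestIO ();
use APR::Table ();

sub _force_early_response {
    $it_took_too_long = 1;

    my $response_text = _sorry_but_that_will_be_ready_only_later();

    $r->status(200);               # Apache itself emits "HTTP/1.1 200 OK"
    $r->content_type('text/plain; charset=UTF-8');
    $r->headers_out->set('Connection'     => 'close');
    $r->headers_out->set('Content-Length' => length $response_text);

    $r->print($response_text);
    $r->rflush;                    # push the response out to the client now
    return;
}
```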


Maybe I just didn't look into the right places?

I'm running this on an updated Amazon Linux machine, with vanilla httpd 
and mod_perl:


$ rpm -qi httpd
Name: httpd
Version : 2.2.31
Release : 1.7.amzn1

$ rpm -qi mod_perl
Name: mod_perl
Version : 2.0.7
Release : 7.27.amzn1

Many thanks in advance for any ideas, hints, comments.

Iosif Fettich