Stephen Adkins writes:
> Yes, I like Apache/mod_perl for everything that can be satisfied
> within a short period of time (i.e. synchronous calls).  
> However, when I initiate tasks from a browser which take a long time, 
> I think the work needs to be offloaded to another server... one that 
> can stick around and do the work.

We have long-lived transactions.  One time it took 2 hours for one of
these transactions to complete (due to a performance bug, but it still
worked).  We "detach" the job from the browser in a PerlCleanupHandler.
Check out:

http://petshop.bivio.biz/src?s=Bivio::Agent::Job::Dispatcher

and how it is called in:

http://petshop.bivio.biz/src?s=Bivio::Agent::HTTP::Dispatcher

Note that the job is a part of the transaction.  If for some reason
the transaction doesn't complete, the job will not be executed.
(Well, we don't have a two-phase commit in our txn_resource code,
because currently the txn_resources wouldn't fail in a "prep" stage.
This is clearly a weak point, which I believe could be easily
remedied should the need arise.)
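
For what it's worth, here's a minimal sketch of the "detach" idea
using plain mod_perl 1.x register_cleanup (not the Bivio dispatcher
itself; _enqueue_job and _run_queued_jobs are hypothetical stand-ins
for what our Job::Dispatcher does):

    use strict;
    use Apache::Constants qw(OK);

    sub handler {
        my($r) = @_;
        # ... do the normal request work and commit the transaction;
        # the job is queued as part of that commit, so an aborted
        # transaction never runs the job ...
        _enqueue_job($r);

        # The cleanup phase runs after the response has been sent,
        # so the browser isn't kept waiting while the job grinds on.
        $r->register_cleanup(sub { _run_queued_jobs() });

        $r->send_http_header('text/plain');
        $r->print("long job queued\n");
        return OK;
    }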

> requests need to be queued.  This is not a good use for a web server.
> Thus the need for another server.

You have to consider resource management when allowing long jobs.  You
might have a separate machine or not.  That is a question for the
design of your peak load configuration.  Either you have the resources
to handle the requests or not.  This is independent of whether the
long-running jobs are under the control of a server package other
than Apache.

By introducing a new type of server on the same machine, you split
the available resources between Apache and the other server.  It's
like keeping two different systems of measurement, metric and US: you
need twice the storage space for the parts and two sets of tools, and
there is no technical reason why you need both.  (Let's not get into
electrical and phone connectors.)

When building a distributed system, you need to consider whether or
not you need to distribute (granularity), how to find what is
distributed (naming), how you're going to distribute (protocols),
where you're going to distribute (processes), and if you have enough
resources to distribute (hardware: CPU, memory, bandwidth).  The only
question which affects the choice of RPC mechanism is protocols.  Any
protocol which has been used for a while allows for the other
variables (granularity, naming, processes, and hardware).  Apache
solves these problems as well as any other RPC mechanism out there.
You can't go "too" fine-grained with Apache, but I don't think that's
the problem we're talking about here.
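
To make that concrete: a sketch of Apache-as-RPC, assuming
LWP::UserAgent on the calling side and a hypothetical /job/run
handler on whichever box has the spare resources:

    use strict;
    use LWP::UserAgent ();

    # Plain HTTP POST to another Apache instance; the protocol,
    # naming (URL), and process model all come for free.
    my $ua = LWP::UserAgent->new(timeout => 10);
    my $res = $ua->post('http://worker.example.com/job/run',
        { task => 'rebuild_reports', user_id => 42 });
    die 'job dispatch failed: ' . $res->status_line . "\n"
        unless $res->is_success;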

If you look at J2EE, you'll see a cacophony of protocols.  This
indicates immaturity, IMHO.  When you haven't answered the above
questions, you end up solving the problem outside the protocol,
either in the application or in other protocols.

Rob

