Hi Jerome,

Thanks for this - I'm glad that you're interested in supporting this scenario.

>  In your case, will you be able to produce your dynamic data as an
>  InputStream or as a ReadableChannel? If so, a limited set of HTTP writer
>  threads could pull the data from processed Restlet requests and
>  asynchronously write it to the response socket.

Yes, I'm sure we could present our data in any form, either as a
stream or a channel.  We have pretty much complete control over the
system.  The approach of a limited set of HTTP writer threads sounds
good.  If this approach could be completely hidden from me (the
developer) that would be fabulous.  I guess I would just present my
resource (i.e. file) as a stream or channel Representation and Restlet
would do the rest (in collaboration with the underlying connector)?
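
To make sure I understand, here's a rough sketch of the kind of wrapper I have in mind (class and method names are hypothetical, and this doesn't touch the Restlet API at all; it's just a plain NIO ReadableByteChannel that defers EOF until our server-side logic marks the file complete):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: wraps the channel of a file that is still being
// appended to by the simulation code.  read() never reports end-of-stream
// until markComplete() is called, so a connector pulling from this channel
// sees "no data yet" (0) rather than a premature EOF (-1).
public class GrowingFileChannel implements ReadableByteChannel {

    private final ReadableByteChannel delegate; // e.g. a FileChannel on the output file
    private final AtomicBoolean complete = new AtomicBoolean(false);

    public GrowingFileChannel(ReadableByteChannel delegate) {
        this.delegate = delegate;
    }

    // Called by our server-side logic when the simulation has finished the file.
    public void markComplete() {
        complete.set(true);
    }

    @Override
    public int read(ByteBuffer dst) throws IOException {
        int n = delegate.read(dst);
        if (n >= 0) {
            return n; // bytes were available (possibly zero)
        }
        // The underlying channel is at its current end: report a real EOF
        // only if the producer has finished writing; otherwise "no data yet".
        return complete.get() ? -1 : 0;
    }

    @Override
    public boolean isOpen() {
        return delegate.isOpen();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}
```

A small pool of writer threads could then poll such channels in a loop without blocking on any one of them, which I think matches the "limited set of HTTP writer threads" you describe.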

By the way, we used to use Apache MINA in another project with some
success.  I believe it does asynchronous I/O of this kind.  I wonder
whether it could be the basis of a suitable HTTP connector?  Just a
thought.

Cheers, Jon

On Mon, Mar 10, 2008 at 12:00 PM, Jerome Louvel <[EMAIL PROTECTED]> wrote:
>
>  Hi Jon,
>
>  This is a very interesting use case. We should aim at supporting it. Thanks
>  for detailing it so clearly.
>
>  The Restlet API has the potential to support this scenario, limiting the
>  number of threads to the minimum, but no HTTP server connector currently
>  supports this. Even the Grizzly connector currently blocks a thread when
>  writing the output.
>
>  In your case, will you be able to produce your dynamic data as an
>  InputStream or as a ReadableChannel? If so, a limited set of HTTP writer
>  threads could pull the data from processed Restlet requests and
>  asynchronously write it to the response socket.
>
>  Regarding HTTP ranges, they are not supported yet, but this would be a great
>  addition indeed. This is currently planned for 1.2:
>  http://restlet.tigris.org/issues/show_bug.cgi?id=115
>
>
>  Best regards,
>  Jerome
>
>  > -----Original Message-----
>  > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Jon Blower
>  > Sent: Friday, 7 March 2008 16:42
>  > To: discuss@restlet.tigris.org
>  > Cc: Dan Bretherton
>  > Subject: Re: Streaming multiple large data files through Restlet
>  >
>  > Hi John (et al),
>  >
>  > Thanks very much to everyone for the very helpful responses on this.
>  > Perhaps I should go into a bit more detail about our application.  We
>  > are writing an application for climate scientists that allows them to
>  > run climate simulation codes on remote compute clusters.  The codes
>  > produce large amounts of data (100s of gigabytes as a typical example)
>  > and we want the client to be able to download the output files from
>  > the cluster as the simulation progresses (so that the user can monitor
>  > what's going on and also reduce the disk footprint on the remote
>  > cluster).  The size of each file is of the order of gigabytes.
>  >
>  > A client will typically be downloading tens of output files
>  > simultaneously, maybe more.  We do not expect more than a handful of
>  > users to be connected to our server at any one time.  Nevertheless we
>  > don't want to spawn a new thread for each file that is downloaded (we
>  > could end up with hundreds of threads), which is essentially what we
>  > are forced to do in our current servlet-based implementation.  Another
>  > disadvantage of our current system is that if we exhaust the thread
>  > pool, new clients won't get any data at all until a thread is
>  > released.  I would rather have every client see a slow trickle than
>  > have a single client monopolise the server.
>  >
>  > There will be minimal re-use of files (if all goes well a given file
>  > will be downloaded exactly once) so caching won't help unfortunately.
>  > We do have control over the clients generally but part of the point of
>  > our design is that people can use their browser to download files if
>  > they wish so we can't assume that this is always true.
>  >
>  > We can't simply use a straight web server (e.g. Apache) for this
>  > because there is some other logic that goes along with the downloading
>  > of files.  For example, the files are generally append-only which
>  > means that we can start the process of downloading an output file
>  > before the file is completely written by the simulation code on the
>  > cluster.  The logic on the server side detects when a file is finished
>  > and hence we can control when the client sees EOF.  Apart from this
>  > there isn't much state associated with the downloading of each file.
>  >
>  > I'm thinking of implementing this by wrapping some simple code around
>  > an NIO FileChannel object that defers to this object for most
>  > operations but the wrapping code will control the detection of EOF.
>  >
>  > A related question: can I support HTTP range headers in Restlet?  If
>  > so then we can support resumable downloads and also HTTP download
>  > accelerators that open multiple streams and download different blocks
>  > of data (the latter would of course increase the number of clients).
>  >
>  > Thanks, Jon
>  >
>  > On Fri, Mar 7, 2008 at 2:07 PM, John D. Mitchell
>  > <[EMAIL PROTECTED]> wrote:
>  > > On Thu, Mar 6, 2008 at 7:14 AM, Jon Blower
>  > <[EMAIL PROTECTED]> wrote:
>  > >  [...]
>  > >
>  > >  >  We have an existing RESTful web application that involves clients
>  > >  >  downloading multiple streams of data simultaneously.  Our current
>  > >  >  implementation is based on servlets and we are experiencing
>  > >  >  scalability problems with the number of threads involved in serving
>  > >  >  multiple large data streams simultaneously.  I recently came across
>  > >  >  Restlet and was attracted by the potential to use NIO under the hood
>  > >  >  to enable more scalable large file transfers.
>  > >
>  > >  Cool.
>  > >
>  > >
>  > >  >  In our case we are not necessarily serving large files that already
>  > >  >  exist on disk: we are essentially creating the files ourselves on the
>  > >  >  fly (so they are of unknown length when the file transfer starts).  I
>  > >  >  was wondering if anyone could offer advice on how to support the
>  > >  >  serving of such data streams through Restlet in a scalable manner
>  > >  >  (ideally without creating a new thread on the server for each file
>  > >  >  transfer)?
>  > >
>  > >  What do you mean by "large files"?  I.e., are you talking about
>  > >  generating content that is merely large relative to a web page (i.e.,
>  > >  measured in megabytes) or are you talking about something like complete
>  > >  hi-def video (GBs in size) or something both large and nominally
>  > >  endless like live video streams?
>  > >
>  > >  For the first case, if they are small enough I'd start by just fully
>  > >  rendering the contents to a Representation as usual and profile how
>  > >  well you can use the existing Jetty connector (with tuning, etc.).  As
>  > >  you add more simultaneous clients, add more servers.  Also, run your
>  > >  experiments with the new Grizzly connector and track that as it and
>  > >  v1.1+ stabilize.
>  > >
>  > >  For the second case (or where you have content sizes in the first case
>  > >  but lots of slow clients), I'd actually have that part of my origin
>  > >  servers either be fronted by a reverse-caching-proxy (e.g., squid) or
>  > >  generate and dump the contents from the origin server into a local
>  > >  file and redirect the client to get that content from e.g., lighttpd
>  > >  (+mod_secdownload).  Depending on the nature of your client
>  > >  applications, the potential reuse of the generated content, etc. you
>  > >  can tune how you clean up the caches.
>  > >
>  > >  For the last case, if I controlled the clients then I'd probably have
>  > >  the clients request good-sized chunks of the data in a loop and
>  > >  devolve to the appropriate combination of the first two approaches.  Of
>  > >  course, that's more or less presuming that you can generate those
>  > >  chunks more or less independently (i.e., with minimal state
>  > >  information needed to keep the continuity from chunk to chunk).  If
>  > >  you have heavy amounts of state and/or if you don't control the
>  > >  clients then I'd want to know a good bit more before making any
>  > >  recommendation.
>  > >
>  > >  Hope this helps,
>  > >  John
>  > >
>  >
>  >
>  >
>  > --
>  > --------------------------------------------------------------
>  > Dr Jon Blower              Tel: +44 118 378 5213 (direct line)
>  > Technical Director         Tel: +44 118 378 8741 (ESSC)
>  > Reading e-Science Centre   Fax: +44 118 378 6413
>  > ESSC                       Email: [EMAIL PROTECTED]
>  > University of Reading
>  > 3 Earley Gate
>  > Reading RG6 6AL, UK
>  > --------------------------------------------------------------
>
>



-- 
--------------------------------------------------------------
Dr Jon Blower              Tel: +44 118 378 5213 (direct line)
Technical Director         Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre   Fax: +44 118 378 6413
ESSC                       Email: [EMAIL PROTECTED]
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------
