Hi,

On Tue, Sep 9, 2014 at 12:59 PM, Ron's Yahoo! <zlgonza...@yahoo.com> wrote:
>
>  I want to create a synchronous REST API that will process some data that
> is passed in as some request.
>  I would imagine that the Spark Streaming Job on YARN is a long
> running job that waits on requests from something. What that something is
> is still not clear to me, but I would imagine that it’s some queue.
> The goal is to be able to push a message onto a queue with some id, and
> then  get the processed results back from Spark Streaming.
>

That is not exactly a Spark Streaming use case, I think. Spark Streaming
pulls data from some source (like a queue), then processes all data
collected in a certain interval in a mini-batch, and stores that data
somewhere. It is not well suited for handling request-response cycles in a
synchronous way; you might consider using plain Spark (without Streaming)
for that.
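
Roughly, a Streaming program looks like the following sketch (just for
illustration; I am assuming a socket source and a 5-second batch interval,
both of which are placeholders for whatever source/interval you would use):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("streaming-sketch").setMaster("local[2]")
        // Everything that arrives within a 5-second interval is processed as one mini-batch.
        val ssc = new StreamingContext(conf, Seconds(5))
        // Placeholder source; in practice this would be Kafka, Flume, a custom receiver, etc.
        val lines = ssc.socketTextStream("localhost", 9999)
        // Stand-in for "store the results somewhere": just print the per-batch counts.
        lines.count().print()
        ssc.start()
        ssc.awaitTermination()
      }
    }

Note there is no place in this model where a single request could wait for
"its" result.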

For the plain-Spark route, you could for example use the Unfiltered library
(http://unfiltered.databinder.net/Unfiltered.html) and, within the request
handler, run some RDD operation and return the output as the HTTP response.
This works fine because multiple threads can submit Spark jobs concurrently
(https://spark.apache.org/docs/latest/job-scheduling.html). You could also
check https://github.com/adobe-research/spindle -- that seems to be similar
to what you are doing.
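
To sketch the idea (I am using Java's built-in com.sun.net.httpserver here
instead of Unfiltered only to keep the example self-contained; the dataset
and the /count endpoint are made up):

    import java.net.InetSocketAddress
    import java.util.concurrent.Executors
    import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
    import org.apache.spark.{SparkConf, SparkContext}

    object RddOverHttp {
      def main(args: Array[String]): Unit = {
        // One long-lived SparkContext shared by all request threads.
        val sc = new SparkContext(
          new SparkConf().setAppName("rdd-over-http").setMaster("local[*]"))
        // Placeholder dataset; keep it cached so the per-request jobs stay cheap.
        val data = sc.parallelize(Seq("spark", "streaming", "rest", "api")).cache()

        val server = HttpServer.create(new InetSocketAddress(8080), 0)
        server.createContext("/count", new HttpHandler {
          def handle(exchange: HttpExchange): Unit = {
            // Each request triggers a Spark job; jobs submitted from different
            // threads run concurrently under Spark's scheduler.
            val term = Option(exchange.getRequestURI.getQuery).getOrElse("")
            val n = data.filter(_.contains(term)).count()
            val body = n.toString.getBytes("UTF-8")
            exchange.sendResponseHeaders(200, body.length)
            exchange.getResponseBody.write(body)
            exchange.close()
          }
        })
        // Thread pool so multiple requests are handled (and submitted) concurrently.
        server.setExecutor(Executors.newFixedThreadPool(8))
        server.start()
      }
    }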

>  The goal is for the REST API to be able to respond to lots of calls with
> low latency.
>  Hope that clarifies things...
>

Note that "low latency" for "lots of calls" is maybe not something that
Spark was built for. Even if you do close to nothing data processing, you
may not get below 200ms or so due to the overhead of submitting jobs etc.,
from my experience.

Tobias
