+1 for re-publishing to pubsub if there is only transient value in the data. If 
you need to query the intermediate representation then you will need to use a 
database.

Sharing RDDs in memory should be possible with projects like Spark Job Server, 
but I think that's overkill in this scenario.

Lastly, if there is no strong requirement to have different jobs, you might 
consider collapsing the two jobs into one and simply having multiple stages 
that execute in the same job.
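
A minimal sketch of the single-job idea, in Python. It assumes the stream 
object exposes the DStream-style methods map, filter, cache and foreachRDD; 
parse, is_alert, store and notify are hypothetical placeholders for whatever 
the two jobs actually do.

```python
def attach_both_stages(stream, parse, is_alert, store, notify):
    """Wire what used to be two separate jobs onto one stream.

    `stream` is assumed to behave like a Spark Streaming DStream
    (map / filter / cache / foreachRDD). All other names are
    hypothetical callables supplied by the caller.
    """
    parsed = stream.map(parse)
    # Keep each parsed batch around so both outputs reuse it
    # instead of recomputing it from the source.
    parsed.cache()
    # Stage that used to be job 1: persist every parsed record.
    parsed.foreachRDD(store)
    # Stage that used to be job 2: act only on a filtered subset.
    parsed.filter(is_alert).foreachRDD(notify)
    return parsed
```

Both outputs then run per batch inside the same streaming application, so 
nothing has to leave the process between the two stages.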

-adrian

From: Ewan Leith
Date: Monday, October 19, 2015 at 12:34 PM
To: Oded Maimon, user
Subject: RE: Spark Streaming - use the data in different jobs

Storing the data in HBase, Cassandra, or similar is possibly the right answer. 
The other option that can work well is re-publishing the data into a second 
queue on RabbitMQ, to be read again by the next job.
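
A rough sketch of that re-publishing step, in Python. The broker client is 
deliberately left out: publish is a hypothetical callable standing in for 
whatever sends a message to the second queue, and a local queue.Queue is 
used here only as a stand-in so the example runs on its own.

```python
import json
import queue


def republish(batch, transform, publish):
    """Transform each record of a batch and hand it to `publish`.

    `publish` is a hypothetical callable that accepts one serialized
    message; in a real setup it would wrap the broker client's
    publish call for the second queue.
    """
    count = 0
    for record in batch:
        publish(json.dumps(transform(record)))
        count += 1
    return count


# Stand-in for the second queue that the next job would consume from.
second_queue = queue.Queue()
sent = republish(
    [{"user": "a", "clicks": 2}],
    lambda r: {**r, "seen": True},  # hypothetical enrichment step
    second_queue.put,
)
```

The next job then only needs to know the message format on the second queue, 
not anything about the first job's internals.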

Thanks,
Ewan

From: Oded Maimon [mailto:o...@scene53.com]
Sent: 18 October 2015 12:49
To: user <user@spark.apache.org>
Subject: Spark Streaming - use the data in different jobs

Hi,
We've built a Spark Streaming process that gets data from a pub/sub system 
(RabbitMQ in our case).

Now we want the streamed data to be used in different Spark jobs (also in 
real time if possible).

What options do we have for doing that?


  *   Can the streaming process and different Spark jobs share/access the same 
RDDs?
  *   Can the streaming process create a Spark SQL table that other jobs 
read/use?
  *   Can a Spark Streaming process trigger other Spark jobs and send them the 
data (in memory)?
  *   Can a Spark Streaming process cache the data in memory so that other 
scheduled jobs access the same RDDs?
  *   Should we write the data to HBase and read it from other jobs?
  *   Are there other ways?

I believe the answer will be to use an external DB/storage, but I'm hoping 
for a different solution :)

Thanks.


Regards,
Oded Maimon
Scene53.


