Any information on the question below would be much appreciated.

I read about Alluxio and Ignite. Has anybody used either of them? Do they work
well with multiple apps doing lookups simultaneously? Are there better options?
Thank you.

From: roshan joe <impdocs2...@gmail.com>
Date: Monday, October 30, 2017 at 7:53 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: share datasets across multiple spark-streaming applications for lookup

Hi,

What is the recommended way to share datasets across multiple spark-streaming 
applications, so that the incoming data can be looked up against this shared 
dataset?

The shared dataset is also incrementally refreshed and stored on S3. Below is 
the scenario.

Streaming App-1 consumes data from Source-1 and writes to DS-1 in S3.
Streaming App-2 consumes data from Source-2 and writes to DS-2 in S3.

Streaming App-3 consumes data from Source-3, needs to look up against DS-1 and
DS-2 and write to DS-3 in S3.
Streaming App-4 consumes data from Source-4, needs to look up against DS-1 and
DS-2 and write to DS-3 in S3.
Streaming App-n consumes data from Source-n, needs to look up against DS-1 and
DS-2 and write to DS-n in S3.

So DS-1 and DS-2 ideally should be shared for lookup across multiple streaming 
apps. Any input is appreciated. Thank you!
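
To make the lookup side of the scenario concrete, here is a minimal sketch in
plain Python of one possible pattern: each consuming app keeps its own
in-memory snapshot of DS-1 and DS-2 and reloads it once it is older than a
chosen interval. This is only an illustration, not a recommended or confirmed
approach. The names RefreshingLookup, enrich_batch, loader and ttl_seconds are
all hypothetical, and the loader stands in for whatever would actually read the
refreshed dataset from S3.

```python
import time
from typing import Callable, Dict, List, Optional


class RefreshingLookup:
    """In-memory snapshot of a shared dataset, reloaded once it is stale."""

    def __init__(self, loader: Callable[[], Dict[str, str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        # loader is a hypothetical callable; in a real job it would read
        # the current contents of DS-1 or DS-2 from S3.
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._snapshot: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        # Load on first use, then again whenever the snapshot is older than the TTL.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._snapshot = self._loader()
            self._loaded_at = now

    def get(self, key: str) -> Optional[str]:
        self._refresh_if_stale()
        return self._snapshot.get(key)


def enrich_batch(records: List[dict], ds1: RefreshingLookup,
                 ds2: RefreshingLookup) -> List[dict]:
    """Look up each incoming record's key in both shared datasets."""
    return [
        {"key": r["key"], "value": r["value"],
         "ds1": ds1.get(r["key"]), "ds2": ds2.get(r["key"])}
        for r in records
    ]
```

In this sketch every app reloads independently, so nothing is truly shared
between processes; it only trades some staleness (up to ttl_seconds) for not
re-reading the dataset on every batch. A shared in-memory layer such as the
ones asked about above would replace the loader, not the lookup logic.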
