Hi,

What is the recommended way to share datasets across multiple Spark
Streaming applications, so that the incoming data in each application can
be looked up against the shared datasets?

The shared datasets are stored on S3 and are themselves refreshed
incrementally. The scenario is below.

Streaming App-1 consumes data from Source-1 and writes to DS-1 in S3.
Streaming App-2 consumes data from Source-2 and writes to DS-2 in S3.


Streaming App-3 consumes data from Source-3, *needs to look up against DS-1
and DS-2* and writes to DS-3 in S3.
Streaming App-4 consumes data from Source-4, *needs to look up against DS-1
and DS-2* and writes to DS-3 in S3.
Streaming App-n consumes data from Source-n, *needs to look up against DS-1
and DS-2* and writes to DS-n in S3.
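
For concreteness, the lookup step in App-3 could be sketched roughly as
below. This is a plain-Python stand-in and not Spark code: the record
layout, the "key" and "value" fields, and the names enrich_batch, load_ds1
and load_ds2 are all hypothetical. The only point it illustrates is that
DS-1 and DS-2 are re-read for every micro-batch, so rows written by App-1
and App-2 in the meantime become visible. In Structured Streaming the same
shape might be a function passed to foreachBatch that re-reads the two
datasets from S3, but that is an assumption about the setup.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, Optional[str]]
Lookup = Dict[str, Record]


def enrich_batch(
    batch: Iterable[Record],
    load_ds1: Callable[[], Lookup],
    load_ds2: Callable[[], Lookup],
) -> List[Record]:
    """Enrich one micro-batch against fresh snapshots of DS-1 and DS-2."""
    # Reload both lookups on every batch so that rows added by App-1 and
    # App-2 since the previous batch are picked up.
    ds1 = load_ds1()
    ds2 = load_ds2()
    enriched: List[Record] = []
    for rec in batch:
        key = rec["key"]
        out = dict(rec)
        # Left-join semantics: a key missing from a lookup yields None.
        out["ds1_value"] = ds1.get(key, {}).get("value")
        out["ds2_value"] = ds2.get(key, {}).get("value")
        enriched.append(out)
    return enriched


# Hypothetical in-memory stand-ins for the datasets stored on S3.
ds1_store: Lookup = {"a": {"value": "from-ds1"}}
ds2_store: Lookup = {"a": {"value": "from-ds2"}}

first = enrich_batch(
    [{"key": "a"}, {"key": "b"}],
    lambda: dict(ds1_store),
    lambda: dict(ds2_store),
)

# Simulate App-1 appending to DS-1 between two micro-batches.
ds1_store["b"] = {"value": "new-in-ds1"}

second = enrich_batch(
    [{"key": "b"}],
    lambda: dict(ds1_store),
    lambda: dict(ds2_store),
)
```

Here the second batch sees the row for "b" that was added to DS-1 after the
first batch ran. The open question is how to get this behaviour
efficiently when the loaders read from S3 and several apps need the same
data.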

So DS-1 and DS-2 should ideally be shared for lookups across multiple
streaming apps. Any input is appreciated. Thank you!
