Hi,

Alluxio enables sharing DataFrames across different applications. This blog post <https://www.alluxio.com/blog/effective-spark-dataframes-with-alluxio> discusses DataFrames and Alluxio, and this Spark Summit presentation <https://spark-summit.org/2017/events/best-practices-for-using-alluxio-with-apache-spark/> has additional information.
Thanks,
Gene

On Tue, Oct 31, 2017 at 6:04 PM, Revin Chalil <rcha...@expedia.com> wrote:

> Any info on the below will be really appreciated.
>
> I read about Alluxio and Ignite. Has anybody used either of them? Do they
> work well with multiple apps doing lookups simultaneously? Are there better
> options? Thank you.
>
> *From: *roshan joe <impdocs2...@gmail.com>
> *Date: *Monday, October 30, 2017 at 7:53 PM
> *To: *"user@spark.apache.org" <user@spark.apache.org>
> *Subject: *share datasets across multiple spark-streaming applications
> for lookup
>
> Hi,
>
> What is the recommended way to share datasets across multiple
> spark-streaming applications, so that the incoming data can be looked up
> against this shared dataset?
>
> The shared dataset is also incrementally refreshed and stored on S3. Below
> is the scenario.
>
> Streaming App-1 consumes data from Source-1 and writes to DS-1 in S3.
> Streaming App-2 consumes data from Source-2 and writes to DS-2 in S3.
>
> Streaming App-3 consumes data from Source-3, *needs to look up against
> DS-1 and DS-2* and write to DS-3 in S3.
> Streaming App-4 consumes data from Source-4, *needs to look up against
> DS-1 and DS-2* and write to DS-3 in S3.
> Streaming App-n consumes data from Source-n, *needs to look up against
> DS-1 and DS-2* and write to DS-n in S3.
>
> So DS-1 and DS-2 ideally should be shared for lookup across multiple
> streaming apps. Any input is appreciated. Thank you!
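A minimal, framework-agnostic sketch of the lookup side of this scenario (App-3 through App-n), assuming the writer apps (App-1, App-2) publish a cheap version marker with each refresh of DS-1 and DS-2 so that readers reload a dataset only when it has changed. `SharedLookup`, `enrich_batch`, and the in-memory `storage` dict are hypothetical stand-ins, not part of Spark or Alluxio; in a real deployment `load` would read the dataset from S3 or from an Alluxio path, and the per-record merge would usually be a Spark join between the streaming data and the lookup DataFrames rather than a Python loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class SharedLookup:
    """Caches one shared lookup dataset (e.g. DS-1) and reloads it only after it changes.

    Hypothetical helper: `get_version` returns a cheap marker the writer app is
    assumed to publish on each refresh, and `load` returns the full
    key -> attributes mapping (in practice read from S3 or an Alluxio path).
    """
    get_version: Callable[[], int]
    load: Callable[[], Dict[str, dict]]
    version: int = -1
    table: Dict[str, dict] = field(default_factory=dict)

    def refresh(self) -> bool:
        """Reload the table if a new version was published; return True if reloaded."""
        latest = self.get_version()
        if latest == self.version:
            return False
        self.table = self.load()
        self.version = latest
        return True

    def get(self, key: str) -> dict:
        # Missing keys add nothing, like the unmatched side of a left outer join.
        return self.table.get(key, {})


def enrich_batch(batch: Iterable[dict], ds1: SharedLookup, ds2: SharedLookup) -> List[dict]:
    """Enrich one micro-batch of incoming records by looking up each key in DS-1 and DS-2."""
    ds1.refresh()
    ds2.refresh()
    return [{**rec, **ds1.get(rec["key"]), **ds2.get(rec["key"])} for rec in batch]


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for shared storage: name -> (version, table).
    storage = {
        "DS-1": (1, {"a": {"region": "EU"}}),
        "DS-2": (1, {"a": {"tier": "gold"}}),
    }
    ds1 = SharedLookup(lambda: storage["DS-1"][0], lambda: storage["DS-1"][1])
    ds2 = SharedLookup(lambda: storage["DS-2"][0], lambda: storage["DS-2"][1])

    # Prints [{'key': 'a', 'amount': 5, 'region': 'EU', 'tier': 'gold'}, {'key': 'b', 'amount': 7}]
    print(enrich_batch([{"key": "a", "amount": 5}, {"key": "b", "amount": 7}], ds1, ds2))

    # App-1 refreshes DS-1; the next micro-batch picks up the new rows.
    storage["DS-1"] = (2, {"a": {"region": "EU"}, "b": {"region": "US"}})
    # Prints [{'key': 'b', 'amount': 3, 'region': 'US'}]
    print(enrich_batch([{"key": "b", "amount": 3}], ds1, ds2))
```

Caching per app avoids re-reading S3 on every micro-batch, but each app still holds its own copy of DS-1 and DS-2; that duplication is what a shared in-memory layer such as Alluxio is meant to reduce.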