Re: Best practice for handing tables between pipeline components

2016-06-29 Thread Chanh Le
Hi Everett, We have been using Alluxio for the last 2 months. We use Alluxio for sharing data between Spark jobs, isolating Spark to the processing layer and Alluxio to the storage layer. > On Jun 29, 2016, at 2:52 AM, Everett Anderson > wrote: > > Thanks! Alluxio
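A minimal sketch of that split, assuming PySpark and an Alluxio master at alluxio://alluxio-master:19998 (the host, port, and paths below are placeholders): one Spark job writes its output table to Alluxio, and a later job reads it back from the in-memory storage layer instead of S3/HDFS.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("producer-job").getOrCreate()

# Processing layer: Spark computes a table (toy example).
users = spark.range(0, 1000).withColumnRenamed("id", "user_id")

# Storage layer: hand the table off through Alluxio instead of S3/HDFS.
# "alluxio://alluxio-master:19998" and the path are placeholders.
users.write.mode("overwrite").parquet("alluxio://alluxio-master:19998/pipeline/users")

# A later job (typically a separate Spark application) reads the same path back,
# served from Alluxio's memory tier while the data is still cached there.
users_again = spark.read.parquet("alluxio://alluxio-master:19998/pipeline/users")
users_again.show(5)
```

The producer and consumer only have to agree on the Alluxio path and the file format; neither needs to know how the other job is implemented.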

Re: Best practice for handing tables between pipeline components

2016-06-28 Thread Everett Anderson
Thanks! Alluxio looks quite promising, but also quite new. What did people do before? On Mon, Jun 27, 2016 at 12:33 PM, Gene Pang wrote: > Yes, Alluxio (http://www.alluxio.org/) can be used to store data > in-memory between stages in a pipeline. > > Here is more

Re: Best practice for handing tables between pipeline components

2016-06-27 Thread Gene Pang
Yes, Alluxio (http://www.alluxio.org/) can be used to store data in-memory between stages in a pipeline. Here is more information about running Spark with Alluxio: http://www.alluxio.org/documentation/v1.1.0/en/Running-Spark-on-Alluxio.html Hope that helps, Gene On Mon, Jun 27, 2016 at 10:38
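For reference, a sketch along the lines of that documentation page, with the client-jar path and Alluxio master address as placeholders: once the Alluxio client is on Spark's classpath, alluxio:// URIs can be read and written like any other Hadoop-compatible filesystem.

```python
# Assumed setup (per the linked docs): put the Alluxio client jar on Spark's classpath,
# e.g. in conf/spark-defaults.conf (the jar path below is a placeholder):
#   spark.driver.extraClassPath   /path/to/alluxio-client-jar-with-dependencies.jar
#   spark.executor.extraClassPath /path/to/alluxio-client-jar-with-dependencies.jar
from pyspark import SparkContext

sc = SparkContext(appName="spark-on-alluxio")

# "alluxio-master:19998" and the file paths are placeholders.
lines = sc.textFile("alluxio://alluxio-master:19998/data/input.txt")
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.saveAsTextFile("alluxio://alluxio-master:19998/data/wordcount")
```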

Re: Best practice for handing tables between pipeline components

2016-06-27 Thread Sathish Kumaran Vairavelu
Alluxio's off-heap memory would help to share cached objects. On Mon, Jun 27, 2016 at 11:14 AM Everett Anderson wrote: > Hi, > > We have a pipeline of components strung together via Airflow running on > AWS. Some of them are implemented in Spark, but some aren't. Generally

Best practice for handing tables between pipeline components

2016-06-27 Thread Everett Anderson
Hi, We have a pipeline of components strung together via Airflow running on AWS. Some of them are implemented in Spark, but some aren't. Generally they can all talk to a JDBC/ODBC endpoint or read/write files from S3. Ideally, we wouldn't suffer the I/O cost of writing all the data to HDFS or
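For context, a minimal sketch of the handoff pattern described above, assuming Parquet on S3 as the shared format (bucket and paths are placeholders): the Spark component writes files that a non-Spark component in the same Airflow pipeline later reads, which is exactly the I/O round trip the question asks how to avoid.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-stage").getOrCreate()

# Spark component: read its input, transform, and hand the result off via S3.
# Bucket and paths are placeholders; s3a:// assumes the Hadoop S3A connector is configured.
events = spark.read.parquet("s3a://example-bucket/input/events/")
daily = events.groupBy("event_date").count()
daily.write.mode("overwrite").parquet("s3a://example-bucket/handoff/daily_counts/")

# Non-Spark component (e.g. a plain-Python Airflow task) picks up the same files.
# Requires pandas plus pyarrow and s3fs.
import pandas as pd
daily_pd = pd.read_parquet("s3://example-bucket/handoff/daily_counts/")
```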