Hi Everett,
We have been using Alluxio for the last 2 months. We use it to share data
between Spark jobs, keeping Spark purely as the processing layer and Alluxio
as the storage layer.
On Jun 29, 2016, at 2:52 AM, Everett Anderson wrote:
Thanks! Alluxio looks quite promising, but also quite new.
What did people do before?
On Mon, Jun 27, 2016 at 12:33 PM, Gene Pang wrote:
Yes, Alluxio (http://www.alluxio.org/) can be used to store data in-memory
between stages in a pipeline.
Here is more information about running Spark with Alluxio:
http://www.alluxio.org/documentation/v1.1.0/en/Running-Spark-on-Alluxio.html
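
As a rough illustration of handing data from one stage to the next through
Alluxio, here is a minimal PySpark sketch. It is not taken from this thread:
the master address, the paths, and the helper names (alluxio_uri,
producer_stage, consumer_stage) are assumptions for the example. 19998 is
Alluxio's default master port, and the Alluxio client jar is assumed to be on
the Spark classpath as described in the page above.

```python
# Sketch only: the host, paths, and function names are hypothetical.
ALLUXIO_MASTER = "localhost:19998"  # assumed master address (default port)


def alluxio_uri(path, master=ALLUXIO_MASTER):
    """Build an alluxio:// URI for a path in the Alluxio namespace."""
    return "alluxio://{}/{}".format(master, path.lstrip("/"))


def producer_stage(sc, out_path):
    """Earlier stage: compute some records and write them to Alluxio."""
    rdd = sc.parallelize(range(1000)).map(lambda x: "{},{}".format(x, x * x))
    # saveAsTextFile fails if the output directory already exists.
    rdd.saveAsTextFile(alluxio_uri(out_path))


def consumer_stage(sc, in_path):
    """Later stage, possibly a separate Spark job: read the records back."""
    return sc.textFile(alluxio_uri(in_path)).count()
```

Each function takes an existing SparkContext, so a producer job could call
producer_stage(sc, "/pipeline/stage1") and a later job could call
consumer_stage(sc, "/pipeline/stage1") on the same path.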
Hope that helps,
Gene
On Mon, Jun 27, 2016 at 10:38
Alluxio's off-heap memory would help to share cached objects.
On Mon, Jun 27, 2016 at 11:14 AM, Everett Anderson wrote:
Hi,
We have a pipeline of components strung together via Airflow running on
AWS. Some of them are implemented in Spark, but some aren't. Generally they
can all talk to a JDBC/ODBC end point or read/write files from S3.
Ideally, we wouldn't suffer the I/O cost of writing all the data to HDFS or