Re: Re: Spark RDD cache persistence

2015-12-09 Thread Calvin Jia
Hi Deepak,

For persistence across Spark jobs, you can store and access the RDDs in Tachyon. Tachyon works with a ramdisk, which gives you in-memory performance similar to what you would have within a single Spark job. For more information, take a look at the docs on Tachyon-Spark integration.
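A minimal sketch of both ways of using Tachyon from a Spark 1.x application (the era of this thread). The master address `tachyon://localhost:19998`, the output path, and the job name are illustrative assumptions, and the snippet needs a running Spark and Tachyon deployment:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object TachyonPersistExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("tachyon-persist")
      // Point Spark's off-heap block store at the Tachyon master
      // (property name per Spark 1.5/1.6; address is an assumption).
      .set("spark.externalBlockStore.url", "tachyon://localhost:19998")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 1000)

    // Option 1: OFF_HEAP persistence keeps cached blocks in Tachyon
    // for the lifetime of this application.
    rdd.persist(StorageLevel.OFF_HEAP)

    // Option 2: write to a tachyon:// path so a *later* Spark job
    // can read the data back at ramdisk speed.
    rdd.saveAsTextFile("tachyon://localhost:19998/rdds/example-rdd")

    sc.stop()
  }
}
```

Option 1 survives only as long as the application; option 2 is what gives you persistence *across* Spark jobs, which is the question being answered here.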

Re: Re: Spark RDD cache persistence

2015-11-05 Thread r7raul1...@163.com
You can try http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Archival_Storage_SSD__Memory . Hive uses this feature for its tmp tables to speed up jobs: https://issues.apache.org/jira/browse/HIVE-7313

r7raul1...@163.com
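The storage-policy approach above is driven from the HDFS CLI. A sketch using the commands from the Archival Storage docs linked above; `/tmp/hive` is an illustrative path, and the policy names (`ALL_SSD`, `LAZY_PERSIST`) come from the same docs:

```shell
# List the storage policies the NameNode supports
hdfs storagepolicies -listPolicies

# Pin a scratch directory to SSD-backed storage (use LAZY_PERSIST
# for memory-backed storage instead); the path is an example.
hdfs storagepolicies -setStoragePolicy -path /tmp/hive -policy ALL_SSD

# Verify which policy is now in effect
hdfs storagepolicies -getStoragePolicy -path /tmp/hive
```

New data written under the path is then placed on the faster tier; existing blocks move when the mover runs.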

Re: Re: Spark RDD cache persistence

2015-11-05 Thread Deenar Toraskar
You can keep a long-running Spark context in several ways; this ensures your data stays cached in memory across requests. Clients can then access the RDDs through a REST API that you expose. See the Spark Job Server, which does something similar: it has a feature called named RDDs.
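A sketch of the named-RDDs pattern in spark-jobserver (circa 0.6.x, contemporary with this thread): one job registers an RDD under a name on the shared context, and a later REST-triggered job looks it up instead of rebuilding it. The object names, the `"users"` key, and the sample data are all illustrative assumptions:

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

// Producer: caches an RDD under a name so other jobs on the
// same long-running context can reuse it.
object CacheUsersJob extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    val users = sc.parallelize(Seq(("alice", 1), ("bob", 2)))
    this.namedRdds.update("users", users) // registers and caches the RDD
    users.count()
  }
}

// Consumer: a later job (invoked via the Job Server's REST API)
// fetches the cached RDD by name.
object CountUsersJob extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    val users = this.namedRdds.get[(String, Int)]("users")
      .getOrElse(sc.parallelize(Seq.empty[(String, Int)]))
    users.count()
  }
}
```

Both jobs must be submitted against the same persistent context for the lookup to succeed; that shared context is what makes this a cache rather than mere recomputation.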