Hi Deepak,
For persistence across Spark jobs, you can store the RDDs in Tachyon and
read them back from there. Tachyon keeps data on ramdisk, which gives you
in-memory performance similar to what you get within a single Spark job.
For more information, you can take a look at the docs on Tachyon-Spark
integration.
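As a rough sketch, one Spark job can write an RDD to a Tachyon path and a
later job can read it back. The master address tachyon://localhost:19998
and the path /shared/my-rdd below are placeholders for your deployment:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TachyonPersistExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TachyonPersistExample"))

    val rdd = sc.parallelize(1 to 1000)

    // Write the RDD to Tachyon; the data outlives this Spark context.
    rdd.saveAsTextFile("tachyon://localhost:19998/shared/my-rdd")

    // A different Spark job (different context) can load it back
    // from the same path at in-memory speed:
    val restored = sc.textFile("tachyon://localhost:19998/shared/my-rdd")
    println(restored.count())

    sc.stop()
  }
}
```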
You can try HDFS Archival Storage (SSD & Memory):
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Archival_Storage_SSD__Memory
Hive uses this feature for its temporary tables to speed up jobs:
https://issues.apache.org/jira/browse/HIVE-7313
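For illustration, assigning a storage policy to a directory looks roughly
like this (the path /tmp/hive is a placeholder; the subcommand syntax shown
is from Hadoop 2.7+, while on 2.6.0 the equivalent is
"hdfs dfsadmin -setStoragePolicy <path> <policy>"):

```shell
# LAZY_PERSIST writes blocks to memory first; ALL_SSD pins them to SSD.
hdfs storagepolicies -setStoragePolicy -path /tmp/hive -policy LAZY_PERSIST
hdfs storagepolicies -getStoragePolicy -path /tmp/hive
```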
r7raul1...@163.com
From: Christian
Date: 2015-11-06 13:50
You can keep a long-running Spark context alive in several ways. This
ensures your data stays cached in memory, and clients can access the RDDs
through a REST API that you expose. See the Spark Job Server, which does
something similar: it has a feature called named RDDs.
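A minimal sketch of how named RDDs work in the Spark Job Server, using its
NamedRddSupport trait (the job class name, the RDD name "users", and the
sample data are all illustrative):

```scala
import org.apache.spark.SparkContext
import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}
import com.typesafe.config.Config

object CacheUsersJob extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    // Publish an RDD under a name; it stays cached in the long-running
    // context, so later jobs submitted to the same context can reuse it.
    val users = sc.parallelize(Seq("alice", "bob", "carol"))
    this.namedRdds.update("users", users)

    // A later job on the same context retrieves it by name:
    val cached = this.namedRdds.get[String]("users")
    cached.map(_.count()).getOrElse(0L)
  }
}
```

Because the context stays up between REST calls, the named RDD survives
across jobs without any re-reading from storage.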