I agree with this -- basically, to build on Reynold's point, you should be able 
to get almost the same performance by implementing either the Hadoop FileSystem 
API or the Spark Data Source API over Ignite in the right way. This would let 
people save data persistently in Ignite in addition to using it for caching, 
and it would provide a global namespace, optionally a schema, etc. You can 
still provide data locality, short-circuit reads, etc. with these APIs.

Matei

> On Jul 20, 2015, at 9:40 PM, Reynold Xin <r...@databricks.com> wrote:
> 
> I sent it prematurely.
> 
> They are already pluggable, or at least in the process of becoming more pluggable.
> In 1.4, instead of calling the external system's API directly, we added an 
> API for that.  There is a patch to add support for HDFS in-memory cache. 
> 
> Somewhat orthogonal to this, longer term, I am not sure whether it makes 
> sense to have the current off heap API, because there is no namespacing and 
> the benefit to end users is actually not very substantial (at least I can 
> think of much simpler ways to achieve exactly the same gains), and yet it 
> introduces quite a bit of complexity to the codebase.
> 
> 
> 
> 
> On Mon, Jul 20, 2015 at 9:34 PM, Reynold Xin <r...@databricks.com> wrote:
> They are already pluggable.
> 
> 
> On Mon, Jul 20, 2015 at 9:32 PM, Prashant Sharma <scrapco...@gmail.com> wrote:
> +1 Looks like a nice idea (I do not see any harm). Would you like to work on 
> the patch to support it?
> 
> Prashant Sharma
> 
> 
> 
> On Tue, Jul 21, 2015 at 2:46 AM, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
> Hello Spark community,
> 
> I was looking through the code to better understand how an RDD is 
> persisted to the Tachyon off-heap filesystem. It looks like the Tachyon 
> filesystem is hard-coded and there is no way to switch to another in-memory 
> filesystem. I think it would be great if the BlockManager and BlockStore 
> implementations could plug in another filesystem.
> 
> For example, Apache Ignite also has an in-memory filesystem implementation 
> that can store data in on-heap and off-heap formats. It would be great if it 
> could integrate with Spark.
> 
> I have filed a ticket in Jira: 
> https://issues.apache.org/jira/browse/SPARK-9203
> 
> If it makes sense, I will be happy to contribute to it.
> 
> Thoughts?
> 
> -Alexey (Apache Ignite PMC)
> 
> 
> 
