[
https://issues.apache.org/jira/browse/CRUNCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389140#comment-14389140
]
Ioannis Kerkinos commented on CRUNCH-505:
-----------------------------------------
Hi Micah,
It does provide an implementation of the Hadoop FileSystem API; you can find it
here [1].
It is also quite straightforward to use: changing the URI scheme to "tachyon"
is enough, as the example below shows.
Would it be OK if I started working on it? If so, do you have any tips on where
to start? I've been working with Tachyon for my master's thesis, and I think
this would be a useful performance improvement for Crunch.
==EXAMPLE==
Spark/MapReduce without Tachyon
• Spark
– val file = sc.textFile("hdfs://ip:port/path")
• Hadoop MapReduce
– hadoop jar hadoop-examples-1.0.4.jar wordcount hdfs://localhost:19998/input hdfs://localhost:19998/output
Spark/MapReduce with Tachyon
• Spark
– val file = sc.textFile("tachyon://ip:port/path")
• Hadoop MapReduce
– hadoop jar hadoop-examples-1.0.4.jar wordcount tachyon://localhost:19998/input tachyon://localhost:19998/output
[1] https://github.com/amplab/tachyon/blob/master/core/src/main/java/tachyon/hadoop/AbstractTFS.java
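The switch in the example is purely a change of URI scheme; host, port and path stay the same. A minimal Java sketch of that rewrite, using only java.net.URI (the helper name withScheme is hypothetical and is not part of Tachyon, Hadoop or Crunch). It does not load Tachyon itself: for Hadoop to resolve tachyon:// paths, Tachyon's client jar presumably still has to be on the classpath and the scheme mapped to its FileSystem class (Tachyon's Hadoop setup notes use fs.tachyon.impl = tachyon.hadoop.TFS; check the docs for the version in use).

```java
import java.net.URI;
import java.net.URISyntaxException;

final class SchemeSwap {

    // Hypothetical helper: returns a copy of `uri` with only its scheme replaced.
    // Host, port and path are kept, mirroring the hdfs:// -> tachyon:// switch
    // in the example; the authority must still point at a Tachyon master.
    static URI withScheme(URI uri, String scheme) {
        try {
            return new URI(scheme, uri.getUserInfo(), uri.getHost(), uri.getPort(),
                    uri.getPath(), uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot rewrite " + uri, e);
        }
    }

    public static void main(String[] args) {
        URI hdfsInput = URI.create("hdfs://localhost:19998/input");
        // Prints tachyon://localhost:19998/input
        System.out.println(withScheme(hdfsInput, "tachyon"));
    }
}
```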
> Store intermediate data in memory only using Tachyon
> ----------------------------------------------------
>
> Key: CRUNCH-505
> URL: https://issues.apache.org/jira/browse/CRUNCH-505
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.12.0
> Reporter: Ioannis Kerkinos
> Assignee: Josh Wills
>
> Tachyon is a memory-centric distributed storage system that enables reliable
> data sharing at memory speed. If it were used as the storage for intermediate
> data (between MapReduce jobs), it should improve performance, since that data
> would no longer have to be written to and read back from HDFS. To do so,
> Tachyon's MUST_CACHE write type can be used, which keeps data in memory only,
> without persisting it to HDFS. Intermediate data would then be read and
> written at memory speed, and only the final result would be written to HDFS.
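The placement rule proposed above (intermediate outputs in memory-only Tachyon storage, the final output on HDFS) can be sketched as a small planner. This is a hypothetical illustration, not Crunch code: OutputPlanner, Storage, the job-N naming and both root URIs are assumptions, the pipeline is simplified to a linear chain of jobs, and no Tachyon call is made (TACHYON_MUST_CACHE only marks where the MUST_CACHE write type would be applied).

```java
import java.net.URI;

// Hypothetical sketch of the proposal: every job output except the last one is
// intermediate and goes to a Tachyon temp directory (where it would be written
// with the MUST_CACHE write type); only the final output goes to HDFS.
// OutputPlanner, Storage and the paths below are illustrative, not Crunch APIs.
final class OutputPlanner {

    enum Storage { TACHYON_MUST_CACHE, HDFS }

    private final String tachyonTmpRoot; // normalized to end with "/"
    private final URI finalOutput;       // user-specified HDFS target

    OutputPlanner(URI tachyonTmpRoot, URI finalOutput) {
        String root = tachyonTmpRoot.toString();
        this.tachyonTmpRoot = root.endsWith("/") ? root : root + "/";
        this.finalOutput = finalOutput;
    }

    // Storage for the output of job `index` (0-based) in a chain of `jobCount` jobs.
    Storage storageFor(int index, int jobCount) {
        if (index < 0 || index >= jobCount) {
            throw new IllegalArgumentException("index " + index + " outside 0.." + (jobCount - 1));
        }
        return index == jobCount - 1 ? Storage.HDFS : Storage.TACHYON_MUST_CACHE;
    }

    // Path that job `index` should write its output to.
    URI pathFor(int index, int jobCount) {
        return storageFor(index, jobCount) == Storage.HDFS
                ? finalOutput
                : URI.create(tachyonTmpRoot + "job-" + index);
    }

    public static void main(String[] args) {
        OutputPlanner planner = new OutputPlanner(
                URI.create("tachyon://localhost:19998/tmp/crunch"),
                URI.create("hdfs://namenode:8020/output"));
        for (int i = 0; i < 3; i++) {
            System.out.println(planner.storageFor(i, 3) + " " + planner.pathFor(i, 3));
        }
    }
}
```

For a three-job chain this prints:

TACHYON_MUST_CACHE tachyon://localhost:19998/tmp/crunch/job-0
TACHYON_MUST_CACHE tachyon://localhost:19998/tmp/crunch/job-1
HDFS hdfs://namenode:8020/output

Because MUST_CACHE data is never persisted to HDFS, losing it (for example to eviction or a worker failure) would presumably mean re-running the upstream job, which is the trade-off against the memory-speed reads and writes.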
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)