[jira] [Commented] (SPARK-1529) Support setting spark.local.dirs to a hadoop FileSystem

Patrick Wendell (JIRA) Sun, 12 Apr 2015 19:56:24 -0700

    [ https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491863#comment-14491863 ]


Patrick Wendell commented on SPARK-1529:
----------------------------------------

Hey Kannan,

We originally considered something like what you are proposing: route all of 
our filesystem interactions through the Hadoop FileSystem class and then use 
Hadoop's LocalFileSystem for local disks. However, there were two issues:

1. We use POSIX APIs that have no counterpart in Hadoop's FileSystem interface. 
For instance, we use memory mapping in various places, FileChannel in the 
BlockObjectWriter, etc.
2. Using LocalFileSystem carries substantial performance overhead compared with 
our current code, so we'd have to write our own implementation of a local 
filesystem.
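To make point 1 concrete, here is a minimal Java sketch (not Spark code) of the 
kind of java.nio call involved: memory-mapping a local file with 
`FileChannel.map`. Hadoop's generic `FileSystem.open` returns a stream 
(`FSDataInputStream`) rather than a `FileChannel`, so code like this cannot 
simply be pointed at an arbitrary Hadoop path. The names `MmapBlockRead` and 
`readMapped` are made up for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: reads a whole local file via a memory mapping.
public class MmapBlockRead {

    // Maps the file read-only and copies its bytes out as a UTF-8 string.
    // FileChannel.open only works on local (default-filesystem) paths.
    static String readMapped(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small temp file standing in for a block, then map it back in.
        Path tmp = Files.createTempFile("block", ".data");
        Files.write(tmp, "shuffle block".getBytes(StandardCharsets.UTF_8));
        System.out.println(readMapped(tmp)); // prints "shuffle block"
        Files.delete(tmp);
    }
}
```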

For these reasons, we concluded that our current shuffle machinery was 
fundamentally unusable in non-POSIX environments. Instead, we decided to let 
people customize shuffle behavior at a higher level, and we implemented 
pluggable shuffle components. So you can create a shuffle manager that is 
specifically optimized for a particular environment (e.g. MapR).

Did you consider implementing a MapR shuffle using that mechanism instead? 
You'd have to operate at a higher level, where you reason about shuffle 
records, etc. But you'd have a lot of flexibility to optimize within that.
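A rough, hypothetical Java sketch of the pluggable idea: a record-level shuffle 
interface whose implementation is picked at runtime from a class name. To our 
understanding, Spark 1.x resolves a fully qualified class name given in 
`spark.shuffle.manager` in a broadly similar reflective way. `RecordShuffle`, 
`InMemoryShuffle`, and `load` are invented for illustration and are far simpler 
than Spark's actual `ShuffleManager` trait; an environment-specific (e.g. MapR) 
implementation would take the place of `InMemoryShuffle`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PluggableShuffleSketch {

    // Hypothetical record-level shuffle contract (NOT Spark's real API).
    public interface RecordShuffle {
        void write(int partition, String record);
        List<String> read(int partition);
    }

    // Toy default implementation that buffers records per partition in memory.
    public static class InMemoryShuffle implements RecordShuffle {
        private final Map<Integer, List<String>> parts = new HashMap<>();

        public void write(int partition, String record) {
            parts.computeIfAbsent(partition, p -> new ArrayList<>()).add(record);
        }

        public List<String> read(int partition) {
            return parts.getOrDefault(partition, new ArrayList<>());
        }
    }

    // Instantiates whichever implementation a config value names, via reflection.
    static RecordShuffle load(String className) throws ReflectiveOperationException {
        return (RecordShuffle) Class.forName(className)
            .getDeclaredConstructor()
            .newInstance();
    }

    public static void main(String[] args) throws Exception {
        // Nested classes use the binary name Outer$Inner for Class.forName.
        RecordShuffle shuffle =
            load(PluggableShuffleSketch.class.getName() + "$InMemoryShuffle");
        shuffle.write(0, "a");
        shuffle.write(1, "b");
        shuffle.write(0, "c");
        System.out.println(shuffle.read(0)); // prints [a, c]
    }
}
```

The design point is that callers only see the record-level interface, so an 
implementation is free to store and move shuffle data however suits its 
environment.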

> Support setting spark.local.dirs to a hadoop FileSystem 
> --------------------------------------------------------
>
>                 Key: SPARK-1529
>                 URL: https://issues.apache.org/jira/browse/SPARK-1529
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Assignee: Kannan Rajah
>         Attachments: Spark Shuffle using HDFS.pdf
>
>
> In some environments, like with MapR, local volumes are accessed through the 
> Hadoop filesystem interface. We should allow setting spark.local.dir to a 
> Hadoop filesystem location. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
