[ https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491863#comment-14491863 ]
Patrick Wendell commented on SPARK-1529:
----------------------------------------

Hey Kannan,

We originally considered doing something like you are proposing, where we would change all of our filesystem interactions to go through a Hadoop FileSystem class and then use Hadoop's LocalFileSystem. However, there were two issues:

1. We use POSIX APIs that are not present in Hadoop. For instance, we use memory mapping in various places, FileChannel in the BlockObjectWriter, etc.
2. Using LocalFileSystem has substantial performance overhead compared with our current code, so we'd have to write our own implementation of a local filesystem.

For these reasons, we decided that our current shuffle machinery was fundamentally not usable in non-POSIX environments. Instead, we decided to let people customize shuffle behavior at a higher level, and we implemented the pluggable shuffle components. With that mechanism you can create a shuffle manager that is specifically optimized for a particular environment (e.g. MapR). Did you consider implementing a MapR shuffle using that mechanism instead? You'd have to operate at a higher level, where you reason about shuffle records, etc., but you'd have a lot of flexibility to optimize within that.

> Support setting spark.local.dirs to a hadoop FileSystem
> --------------------------------------------------------
>
>                 Key: SPARK-1529
>                 URL: https://issues.apache.org/jira/browse/SPARK-1529
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Assignee: Kannan Rajah
>         Attachments: Spark Shuffle using HDFS.pdf
>
>
> In some environments, like with MapR, local volumes are accessed through the
> Hadoop filesystem interface. We should allow setting spark.local.dir to a
> Hadoop filesystem location.
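For context on the first issue the comment raises: the kind of POSIX-backed API that Spark's shuffle code relies on, and that the Hadoop FileSystem abstraction does not expose, can be illustrated with a minimal, self-contained Java sketch of memory-mapping a local file via FileChannel (this is an illustrative stand-in, not Spark's actual BlockObjectWriter code):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    public static void main(String[] args) throws IOException {
        // Scratch file standing in for a local shuffle output file.
        Path tmp = Files.createTempFile("shuffle-block", ".data");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Memory-map the file: an mmap(2)-backed operation available on a
            // local POSIX filesystem but not through Hadoop's FileSystem API.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 16);
            buf.putLong(0, 42L);                // write through the mapping
            System.out.println(buf.getLong(0)); // read it back: prints 42
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because a MappedByteBuffer is backed directly by the page cache of a local file, there is no equivalent on a filesystem reached only through Hadoop's stream-oriented open/create calls.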
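As a sketch of the pluggable-shuffle route the comment suggests: a custom shuffle implementation is selected through the `spark.shuffle.manager` configuration key. The key is real; the MapR class name below is purely hypothetical, for illustration:

```properties
# spark-defaults.conf -- plug in a custom ShuffleManager implementation.
# spark.shuffle.manager is the real Spark config key; the class name here
# is a hypothetical example of an environment-specific shuffle manager.
spark.shuffle.manager  com.mapr.spark.shuffle.MapRShuffleManager
```

The named class would implement Spark's shuffle-manager interface, reasoning about shuffle records rather than raw local files, which is exactly the higher-level flexibility the comment describes.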