[ https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491869#comment-14491869 ]
Kannan Rajah commented on SPARK-1529:
-------------------------------------

[~pwendell] The default code path still uses FileChannel and memory-mapping techniques. I just provided an abstraction called FileSystem.scala (not Hadoop's FileSystem.java). LocalFileSystem.scala delegates calls to the existing Spark code path that uses FileChannel. I am using Hadoop's RawLocalFileSystem class only to get an InputStream and an OutputStream, and this internally also uses FileChannel. Please see RawLocalFileSystem.LocalFSFileInputStream; it is just a wrapper around java.io.FileInputStream.

Going back to why I considered this approach: it allows us to reuse all the logic currently used by the SortShuffle code path. Otherwise, we would have to reimplement pretty much everything Spark already does in order to do the shuffle on HDFS.

We are in the process of running some performance tests to understand the impact of the change. One of the main things we will verify is whether the change introduces any performance degradation in the default code path, and we will fix it if it does. Is this acceptable?

> Support setting spark.local.dirs to a hadoop FileSystem
> --------------------------------------------------------
>
>                 Key: SPARK-1529
>                 URL: https://issues.apache.org/jira/browse/SPARK-1529
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Assignee: Kannan Rajah
>         Attachments: Spark Shuffle using HDFS.pdf
>
>
> In some environments, like with MapR, local volumes are accessed through the
> Hadoop filesystem interface. We should allow setting spark.local.dir to a
> Hadoop filesystem location.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)