[ https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491869#comment-14491869 ]

Kannan Rajah commented on SPARK-1529:
-------------------------------------

[~pwendell] The default code path still uses the FileChannel and memory-mapping 
techniques. I have just provided an abstraction called FileSystem.scala (not 
Hadoop's FileSystem.java). LocalFileSystem.scala delegates the call to the 
existing Spark code path that uses FileChannel. I am using Hadoop's 
RawLocalFileSystem class only to get an InputStream and OutputStream, and this 
also uses FileChannel internally. Please see 
RawLocalFileSystem.LocalFSFileInputStream; it is just a wrapper around 
java.io.FileInputStream.
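
For concreteness, here is a rough sketch of the kind of abstraction I mean. The 
names ShuffleFileSystem and LocalShuffleFileSystem are made up for illustration 
and are not the actual names in the patch; the point is only that the local 
implementation hands back streams obtained from Hadoop's RawLocalFileSystem:

    import java.io.{InputStream, OutputStream}
    import java.net.URI

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{Path, RawLocalFileSystem}

    // Illustrative abstraction over where shuffle files live (names are hypothetical).
    trait ShuffleFileSystem {
      def open(path: String): InputStream
      def create(path: String): OutputStream
    }

    // Local implementation: streams come from Hadoop's RawLocalFileSystem, whose
    // LocalFSFileInputStream is a thin wrapper around java.io.FileInputStream.
    class LocalShuffleFileSystem extends ShuffleFileSystem {
      private val raw = new RawLocalFileSystem()
      raw.initialize(URI.create("file:///"), new Configuration())

      override def open(path: String): InputStream = raw.open(new Path(path))
      override def create(path: String): OutputStream = raw.create(new Path(path), true)
    }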

Going back to why I considered this approach: it allows us to reuse all the 
logic currently used by the SortShuffle code path. Otherwise we would have to 
reimplement pretty much everything Spark already does in order to run the 
shuffle on HDFS. We are in the process of running performance tests to 
understand the impact of the change. One of the main things we will verify is 
whether any performance degradation is introduced in the default code path, 
and we will fix it if there is. Is this acceptable?
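
To illustrate how the same abstraction would let the existing SortShuffle logic 
run against HDFS, a distributed implementation could look roughly like the 
sketch below. The class name and constructor are hypothetical; only the Hadoop 
FileSystem.get/open/create calls are real API. The shuffle code path would 
simply read and write through the trait, unaware of which implementation backs 
it:

    import java.io.{InputStream, OutputStream}
    import java.net.URI

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem => HadoopFS, Path}

    // Same illustrative trait as in the sketch above.
    trait ShuffleFileSystem {
      def open(path: String): InputStream
      def create(path: String): OutputStream
    }

    // Hypothetical HDFS-backed implementation; the shuffle code would not need
    // to know which implementation it is writing through.
    class HdfsShuffleFileSystem(rootUri: String) extends ShuffleFileSystem {
      private val fs = HadoopFS.get(URI.create(rootUri), new Configuration())

      override def open(path: String): InputStream = fs.open(new Path(path))
      override def create(path: String): OutputStream = fs.create(new Path(path), true)
    }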

> Support setting spark.local.dirs to a hadoop FileSystem 
> --------------------------------------------------------
>
>                 Key: SPARK-1529
>                 URL: https://issues.apache.org/jira/browse/SPARK-1529
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Assignee: Kannan Rajah
>         Attachments: Spark Shuffle using HDFS.pdf
>
>
> In some environments, like with MapR, local volumes are accessed through the 
> Hadoop filesystem interface. We should allow setting spark.local.dir to a 
> Hadoop filesystem location. 
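
If this improvement lands, configuration might look roughly like the following; 
the URI is purely illustrative, and today spark.local.dir only accepts local 
directories:

    import org.apache.spark.SparkConf

    // Hypothetical usage: pointing spark.local.dir at a Hadoop filesystem URI
    // (the URI below is illustrative, not a tested value).
    val conf = new SparkConf()
      .setAppName("shuffle-on-dfs")
      .set("spark.local.dir", "hdfs://namenode:8020/tmp/spark-local")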


