[ 
https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492215#comment-14492215
 ] 

Sean Owen commented on SPARK-1529:
----------------------------------

(Sorry if this double-posts.)

Is there a good way to see the whole diff at the moment? I know there's a 
branch with individual commits. Maybe I am missing something basic.

This puts a new abstraction on top of Hadoop's FileSystem, which is itself an 
abstraction over the underlying file system. That's getting heavy. If it only 
abstracts access to an InputStream / OutputStream, why is it needed? That's 
already directly available from, say, Hadoop's FileSystem.

What would the performance gain be if this is the only bit being swapped out? 
This is my original question: you shuffle to HDFS, then read it back to send 
it again via the existing shuffle? It made more sense when the idea was to 
swap out the whole shuffle and replace its transport.

> Support setting spark.local.dirs to a hadoop FileSystem 
> --------------------------------------------------------
>
>                 Key: SPARK-1529
>                 URL: https://issues.apache.org/jira/browse/SPARK-1529
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Assignee: Kannan Rajah
>         Attachments: Spark Shuffle using HDFS.pdf
>
>
> In some environments, like with MapR, local volumes are accessed through the 
> Hadoop filesystem interface. We should allow setting spark.local.dir to a 
> Hadoop filesystem location. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

