[
https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kannan Rajah updated SPARK-1529:
Description: In some environments, like with MapR, local volumes are
accessed through the Hadoop filesystem interface. Shuffle is implemented by
writing intermediate data to local disk and serving it to remote nodes using
Netty as a transport mechanism. We want to provide an HDFS-based shuffle such
that data can be written to HDFS (instead of local disk) and served using the
HDFS API on the remote nodes. This could involve exposing a file system
abstraction to Spark shuffle and having two modes of running it: in the default
mode, it writes to local disk, and in the DFS mode, it writes to HDFS. (was: In
some environments, like with MapR, local volumes are accessed through the
Hadoop filesystem interface. We should allow setting spark.local.dir to a
Hadoop filesystem location.)
Support DFS based shuffle in addition to Netty shuffle
--
Key: SPARK-1529
URL: https://issues.apache.org/jira/browse/SPARK-1529
Project: Spark
Issue Type: Improvement
Components: Spark Core
Reporter: Patrick Wendell
Assignee: Kannan Rajah
Attachments: Spark Shuffle using HDFS.pdf
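The two-mode file system abstraction proposed in the description could be sketched as below. This is a minimal illustration, not Spark's actual shuffle API: all names (`ShuffleFs`, `LocalShuffleFs`, `forMode`) are hypothetical, and the DFS backend, which would sit on `org.apache.hadoop.fs.FileSystem`, is only indicated in a comment so the sketch stays self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical abstraction: shuffle code writes and reads blocks through
// this interface, and a config flag selects the backend.
interface ShuffleFs {
    void write(String blockId, byte[] data) throws IOException;
    byte[] read(String blockId) throws IOException;
}

// Default mode: intermediate data goes to local disk, as Spark does today,
// and is served to remote nodes over Netty.
class LocalShuffleFs implements ShuffleFs {
    private final Path root;

    LocalShuffleFs(Path root) throws IOException {
        this.root = Files.createDirectories(root);
    }

    public void write(String blockId, byte[] data) throws IOException {
        Files.write(root.resolve(blockId), data);
    }

    public byte[] read(String blockId) throws IOException {
        return Files.readAllBytes(root.resolve(blockId));
    }
}

public class ShuffleFsDemo {
    // In DFS mode this would instead return a backend built on
    // org.apache.hadoop.fs.FileSystem, so remote nodes read shuffle
    // blocks via the HDFS API rather than fetching them over Netty.
    static ShuffleFs forMode(String mode, Path localRoot) throws IOException {
        if ("local".equals(mode)) {
            return new LocalShuffleFs(localRoot);
        }
        throw new UnsupportedOperationException(
            "DFS mode omitted here; see description");
    }

    public static void main(String[] args) throws IOException {
        ShuffleFs fs = forMode("local", Files.createTempDirectory("shuffle"));
        fs.write("shuffle_0_1_2",
                 "partition-bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(
            new String(fs.read("shuffle_0_1_2"), StandardCharsets.UTF_8));
    }
}
```

Because shuffle writers and readers would only see `ShuffleFs`, switching between local disk and HDFS becomes a configuration choice rather than a code change.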
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)