[jira] [Updated] (SPARK-1529) Support DFS based shuffle in addition to Netty shuffle

2015-05-20 Thread Kannan Rajah (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Rajah updated SPARK-1529:

Summary: Support DFS based shuffle in addition to Netty shuffle  (was: 
Support setting spark.local.dirs to a hadoop FileSystem )

 Support DFS based shuffle in addition to Netty shuffle
 --

 Key: SPARK-1529
 URL: https://issues.apache.org/jira/browse/SPARK-1529
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Kannan Rajah
 Attachments: Spark Shuffle using HDFS.pdf


 In some environments, such as MapR, local volumes are accessed through the 
 Hadoop filesystem interface. We should allow setting spark.local.dir to a 
 Hadoop filesystem location.
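 As a rough illustration of what the proposal would enable, such a setting 
 might look like the following in spark-defaults.conf. This is a hypothetical 
 sketch: the maprfs:/// URI and DFS-URI support in spark.local.dir are 
 assumptions of this ticket, not current Spark behavior.

 ```properties
 # Hypothetical: point Spark's local/shuffle storage at a Hadoop-filesystem
 # URI instead of a local path (not supported as of this ticket).
 spark.local.dir  maprfs:///var/mapr/local/shuffle
 ```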



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1529) Support DFS based shuffle in addition to Netty shuffle

2015-05-20 Thread Kannan Rajah (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Rajah updated SPARK-1529:

Description: In some environments, such as MapR, local volumes are 
accessed through the Hadoop filesystem interface. Shuffle is currently 
implemented by writing intermediate data to local disk and serving it to remote 
nodes using Netty as the transport mechanism. We want to provide an HDFS based 
shuffle such that data can be written to HDFS (instead of local disk) and 
served via the HDFS API on the remote nodes. This could involve exposing a file 
system abstraction to Spark shuffle with two modes of running it: in the 
default mode it writes to local disk, and in the DFS mode it writes to HDFS.  
(was: In some environments, like with MapR, local volumes are accessed through 
the Hadoop filesystem interface. We should allow setting spark.local.dir to a 
Hadoop filesystem location. )

 Support DFS based shuffle in addition to Netty shuffle
 --

 Key: SPARK-1529
 URL: https://issues.apache.org/jira/browse/SPARK-1529
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Kannan Rajah
 Attachments: Spark Shuffle using HDFS.pdf


 In some environments, such as MapR, local volumes are accessed through the 
 Hadoop filesystem interface. Shuffle is currently implemented by writing 
 intermediate data to local disk and serving it to remote nodes using Netty as 
 the transport mechanism. We want to provide an HDFS based shuffle such that 
 data can be written to HDFS (instead of local disk) and served via the HDFS 
 API on the remote nodes. This could involve exposing a file system abstraction 
 to Spark shuffle with two modes of running it: in the default mode it writes 
 to local disk, and in the DFS mode it writes to HDFS.
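 The file system abstraction described above could be sketched as follows. 
 This is an illustrative sketch only, not Spark's actual shuffle API: the 
 interface and class names (ShuffleFileSystem, LocalShuffleFileSystem) are 
 hypothetical. The local-disk mode is shown with plain Java I/O; a DFS mode 
 would supply a second implementation backed by Hadoop's 
 org.apache.hadoop.fs.FileSystem.

 ```java
 import java.io.*;
 import java.nio.file.*;

 // Hypothetical abstraction that shuffle code could write through.
 // Two implementations would give the two proposed modes:
 //   default mode -> local disk, DFS mode -> HDFS/MapR-FS.
 interface ShuffleFileSystem {
     OutputStream create(String path) throws IOException;
     InputStream open(String path) throws IOException;
 }

 // Default mode: intermediate shuffle data on local disk.
 class LocalShuffleFileSystem implements ShuffleFileSystem {
     private final Path root;

     LocalShuffleFileSystem(Path root) { this.root = root; }

     public OutputStream create(String path) throws IOException {
         Path p = root.resolve(path);
         Files.createDirectories(p.getParent()); // mimic per-map output dirs
         return Files.newOutputStream(p);
     }

     public InputStream open(String path) throws IOException {
         return Files.newInputStream(root.resolve(path));
     }
 }

 public class ShuffleFsDemo {
     public static void main(String[] args) throws IOException {
         ShuffleFileSystem fs = new LocalShuffleFileSystem(
                 Files.createTempDirectory("shuffle"));
         // Write one shuffle block, then read it back as a reducer would.
         try (OutputStream out = fs.create("map_0/part_0.data")) {
             out.write("block".getBytes());
         }
         try (InputStream in = fs.open("map_0/part_0.data")) {
             System.out.println(new String(in.readAllBytes())); // prints "block"
         }
     }
 }
 ```

 In the DFS mode, a DfsShuffleFileSystem (hypothetical name) would delegate 
 create/open to Hadoop FileSystem.create and FileSystem.open, so remote nodes 
 read blocks through the HDFS API instead of fetching them over Netty.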


