[ https://issues.apache.org/jira/browse/TEZ-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119735#comment-17119735 ]
TezQA commented on TEZ-2442:
----------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} TEZ-2442 does not apply to master. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute+to+Tez for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | TEZ-2442 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12803121/tez-2442-trunk.5.patch |
| Console output | https://builds.apache.org/job/PreCommit-TEZ-Build/456/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |

This message was automatically generated.

> Support DFS based shuffle in addition to HTTP shuffle
> -----------------------------------------------------
>
>                 Key: TEZ-2442
>                 URL: https://issues.apache.org/jira/browse/TEZ-2442
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.5.3
>            Reporter: Kannan Rajah
>            Assignee: shanyu zhao
>            Priority: Major
>         Attachments: FS_based_shuffle_v2.pdf, Tez Shuffle using DFS.pdf,
> hdfs_broadcast_hack.txt, tez-2442-trunk.2.patch, tez-2442-trunk.3.patch,
> tez-2442-trunk.4.patch, tez-2442-trunk.5.patch, tez-2442-trunk.patch,
> tez_hdfs_shuffle.patch
>
>
> In Tez, shuffle is the mechanism by which intermediate data is shared between
> stages. Shuffle data is written to local disk and fetched over HTTP by any
> remote node that needs it. A DFS such as the MapR file system can instead
> write this shuffle data directly to the DFS, using a notion of local volumes,
> and let remote nodes retrieve it through the HDFS API. The current shuffle
> implementation assumes that local data can only be managed by
> LocalFileSystem, so it uses RawLocalFileSystem and LocalDirAllocator. If we
> remove this assumption and introduce an abstraction for managing local disks,
> we can reuse most of the shuffle logic (store, sort) and plug in HDFS
> API-based retrieval in place of HTTP.
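The proposal separates two concerns: where a task stores its shuffle output, and how a consumer retrieves it. The following is a minimal Python sketch of that split, not the design in the attached patches. `ShuffleStorage`, `ShuffleFetcher`, `LocalDirStorage`, `MountedFsFetcher`, `HttpFetcher` and `write_partition` are hypothetical names, not Tez or Hadoop APIs. `LocalDirStorage` only loosely imitates LocalDirAllocator by rotating across directories. `MountedFsFetcher` stands in for an HDFS-API client by reading a path that every node is assumed to be able to see, and the URL layout used by `HttpFetcher` is invented for illustration.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from urllib.request import urlopen
import shutil
import tempfile


class ShuffleStorage(ABC):
    """Hypothetical abstraction over where shuffle output is written."""

    @abstractmethod
    def allocate(self, name: str) -> Path:
        """Return a writable location for one shuffle output file."""


class ShuffleFetcher(ABC):
    """Hypothetical abstraction over how shuffle output is retrieved."""

    @abstractmethod
    def fetch(self, location: str) -> bytes:
        """Return the bytes of a previously written shuffle output."""


class LocalDirStorage(ShuffleStorage):
    """Rotates across several directories (a loose imitation of LocalDirAllocator)."""

    def __init__(self, dirs):
        self.dirs = [Path(d) for d in dirs]
        self._next = 0

    def allocate(self, name):
        d = self.dirs[self._next % len(self.dirs)]
        self._next += 1
        d.mkdir(parents=True, exist_ok=True)
        return d / name


class MountedFsFetcher(ShuffleFetcher):
    """Reads output directly by path; a stand-in for an HDFS-API client (assumption)."""

    def fetch(self, location):
        return Path(location).read_bytes()


class HttpFetcher(ShuffleFetcher):
    """Fetches output from a shuffle HTTP server; the URL layout is invented."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def fetch(self, location):
        with urlopen(f"{self.base_url}/{location}") as resp:
            return resp.read()


def write_partition(storage: ShuffleStorage, name: str, records) -> str:
    """Shared store/sort logic: sort (key, value) pairs and write them.

    Only the storage backend varies, which is the reuse the proposal aims for.
    """
    data = "".join(f"{k}\t{v}\n" for k, v in sorted(records)).encode()
    path = storage.allocate(name)
    path.write_bytes(data)
    return str(path)


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    try:
        storage = LocalDirStorage([root / "d0", root / "d1"])
        loc = write_partition(storage, "part_0.out", [("b", "2"), ("a", "1")])
        # A consumer swaps fetchers without touching the store/sort code.
        print(MountedFsFetcher().fetch(loc))  # b'a\t1\nb\t2\n'
    finally:
        shutil.rmtree(root)
```

Switching from HTTP to DFS-based retrieval then means constructing a different `ShuffleFetcher`; the producer side and `write_partition` stay unchanged.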