[ https://issues.apache.org/jira/browse/HDFS-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900421#comment-16900421 ]
Jorge Machado commented on HDFS-916:
------------------------------------

Hi guys,

I know this is pretty old, but is there any update on this? We are transferring about 30 TB via hdfs dfs -copyFromLocal to a Hadoop cluster, and the CPUs are currently the bottleneck.

> Rewrite DFSOutputStream to use a single thread with NIO
> -------------------------------------------------------
>
>                 Key: HDFS-916
>                 URL: https://issues.apache.org/jira/browse/HDFS-916
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>   Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Priority: Major
>
> The DFS write pipeline code has some really hairy multi-threaded
> synchronization. This has produced a lot of bugs (HDFS-101, HDFS-793,
> HDFS-915, tens of others), since it's very hard to understand the message
> passing, lock sharing, and interruption properties. The reason for the
> multiple threads is to be able to send and receive simultaneously. If it
> used nonblocking IO instead of multiple threads, I think the whole thing
> would be a lot less error prone.
>
> I think we could do this in two halves: one half is the DFSOutputStream;
> the other half is BlockReceiver. I opened this JIRA first as I think it's
> simpler (only one TCP connection to deal with, rather than an upstream and
> a downstream).
>
> Opinions? Am I crazy? I would like to see some agreement on the idea before
> I spend time writing code.
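
To make the proposal in the quoted description concrete, here is a minimal sketch of how a single thread could both send packets and receive acks on one TCP connection using a java.nio Selector. It is not the DFSOutputStream code and does not follow the HDFS wire protocol: the class name SingleThreadPipelineSketch, the method sendAll, and the 8-byte ack format are all assumptions made up for this illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Queue;

/** Hypothetical sketch: one thread sends packets and reads acks on one non-blocking socket. */
final class SingleThreadPipelineSketch {

    /**
     * Sends every queued packet and waits for one 8-byte ack per packet.
     * Returns the number of acks received. The fixed 8-byte ack is an
     * assumption of this sketch, not the real HDFS ack format.
     */
    static int sendAll(InetSocketAddress addr, Queue<ByteBuffer> packets) throws IOException {
        int expectedAcks = packets.size();
        int acked = 0;
        ByteBuffer ackBuf = ByteBuffer.allocate(8);
        try (Selector selector = Selector.open();
             SocketChannel ch = SocketChannel.open(addr)) { // blocking connect, for brevity
            ch.configureBlocking(false);
            SelectionKey key = ch.register(selector,
                    SelectionKey.OP_READ | SelectionKey.OP_WRITE);
            while (acked < expectedAcks) {
                selector.select();               // wait until the socket is readable or writable
                selector.selectedKeys().clear(); // only one key is registered; query it directly
                if (key.isWritable()) {
                    ByteBuffer head = packets.peek();
                    if (head != null) {
                        ch.write(head);          // may be a partial write; the rest goes out later
                        if (!head.hasRemaining()) {
                            packets.remove();
                        }
                    }
                    if (packets.isEmpty()) {
                        // Nothing left to send: stop asking for write readiness.
                        key.interestOps(SelectionKey.OP_READ);
                    }
                }
                if (key.isReadable()) {
                    if (ch.read(ackBuf) < 0) {
                        throw new IOException("peer closed before all acks arrived");
                    }
                    if (!ackBuf.hasRemaining()) { // a complete 8-byte ack has arrived
                        ackBuf.clear();
                        acked++;
                    }
                }
            }
        }
        return acked;
    }
}
```

Because sending and ack handling happen in the same loop, the state they share (the packet queue and the ack counter) needs no locks, which is the simplification the description is hoping for. A real client would also need timeouts, pipeline recovery, and backpressure from the writing application, none of which are shown here.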