[ https://issues.apache.org/jira/browse/HDFS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434098#comment-13434098 ]
Daryn Sharp commented on HDFS-3788:
-----------------------------------

bq. Nicholas: How about first check the transfer-encoding, if it is chunked, then no content-length check?

Exactly. However, you need to update the patch to check both the "Transfer-Encoding" and "TE" headers, and the headers may contain multiple comma-separated values. I haven't tested, but I would expect Java's input stream for chunked responses to throw an EOF exception if the connection is broken, so you might want to add a test for that.

bq. Eli: Note that a get of a 3gb file works but not distcp, what path is different?

The code paths should be identical, since it's the creation of the input stream that does the content-length check. I can't see how distcp could possibly work unless distcp is not using the filesystem class...

> distcp can't copy large files using webhdfs due to missing Content-Length header
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-3788
>                 URL: https://issues.apache.org/jira/browse/HDFS-3788
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>    Affects Versions: 0.23.3, 2.0.0-alpha
>            Reporter: Eli Collins
>            Assignee: Tsz Wo (Nicholas), SZE
>            Priority: Critical
>         Attachments: distcp-webhdfs-errors.txt, h3788_20120813.patch
>
>
> The following command fails when data1 contains a 3gb file. It passes when using hftp or when the directory just contains smaller (<2gb) files, so it looks like a webhdfs issue with large files.
>
> {{hadoop distcp webhdfs://eli-thinkpad:50070/user/eli/data1 hdfs://localhost:8020/user/eli/data2}}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
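The header check Daryn describes, scanning both the "Transfer-Encoding" and "TE" header values for a "chunked" token before insisting on a Content-Length, might be sketched as follows. This is a minimal illustration, not the actual patch; the class and method names are hypothetical, and each header value is assumed to be a possibly comma-separated token list.

```java
import java.util.Arrays;
import java.util.List;

public class ChunkedCheck {
    /**
     * Hypothetical helper: returns true if any of the given header values
     * (each possibly a comma-separated token list, as allowed for
     * Transfer-Encoding and TE) contains the "chunked" token. When this
     * returns true, the response carries no Content-Length, so the
     * content-length check should be skipped.
     */
    static boolean isChunked(List<String> headerValues) {
        if (headerValues == null) {
            return false;
        }
        for (String value : headerValues) {
            // Split on commas and trim, since one header line may carry
            // several encodings, e.g. "gzip, chunked".
            for (String token : value.split(",")) {
                if ("chunked".equalsIgnoreCase(token.trim())) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isChunked(Arrays.asList("gzip, chunked")));
        System.out.println(isChunked(Arrays.asList("identity")));
    }
}
```

In a real patch this check would be applied to both header names (and HTTP header names are case-insensitive) before deciding whether a missing or truncated Content-Length is an error.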