Laurent Goujon created HDFS-5798:
------------------------------------
Summary: DFSClient uses non-valid data when computing file checksum
Key: HDFS-5798
URL: https://issues.apache.org/jira/browse/HDFS-5798
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client
Affects Versions: 2.0.5-alpha, 1.1.2
Reporter: Laurent Goujon
In DFSClient.java, when computing the file checksum, the MD5 checksums of all
blocks are fetched and appended to a DataOutputBuffer instance (md5out). The
final checksum is later computed this way:
{code:title=DFSClient.java}
final MD5Hash fileMD5 = MD5Hash.digest(md5out.getData());
{code}
The problem is that getData() returns the whole backing buffer, which is valid
only up to md5out.getLength(). As a result, fileMD5 is the MD5 of the MD5s of
all blocks PLUS whatever bytes sit in the unused tail of the buffer (here the
buffer is not reused, so they should all be 0). The result therefore depends
on the Java implementation of ByteArrayOutputStream.
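
To illustrate the effect outside of Hadoop, here is a minimal sketch that uses
only the JDK. ExposedBuffer is a hypothetical stand-in for md5out (it is not
the Hadoop class), and the names ChecksumBufferSketch, digestWholeBuffer and
digestValidBytes are made up for this example. It assumes the fix is to digest
only the first getLength() bytes, which is one possible approach and not
necessarily the patch that will be applied.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ChecksumBufferSketch {

    /** Hypothetical stand-in for md5out: exposes the backing array and the valid length. */
    static class ExposedBuffer extends ByteArrayOutputStream {
        ExposedBuffer(int capacity) { super(capacity); }
        byte[] getData() { return buf; }    // whole backing array, unused tail included
        int getLength() { return count; }   // number of bytes actually written
    }

    /** MD5 over the first {@code length} bytes of {@code data}. */
    static byte[] md5(byte[] data, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, 0, length);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Mirrors the reported behaviour: digests the unused tail as well. */
    static byte[] digestWholeBuffer(ExposedBuffer out) {
        return md5(out.getData(), out.getData().length);
    }

    /** Assumed fix: digests only the bytes that were written. */
    static byte[] digestValidBytes(ExposedBuffer out) {
        return md5(out.getData(), out.getLength());
    }

    public static void main(String[] args) {
        // A 64-byte buffer holding one fake 16-byte block MD5.
        ExposedBuffer md5out = new ExposedBuffer(64);
        byte[] blockMd5 = new byte[16];
        for (int i = 0; i < blockMd5.length; i++) {
            blockMd5[i] = (byte) (i + 1);
        }
        md5out.write(blockMd5, 0, blockMd5.length);

        System.out.println("capacity=" + md5out.getData().length
                + ", valid=" + md5out.getLength());
        System.out.println("digests equal: "
                + Arrays.equals(digestWholeBuffer(md5out), digestValidBytes(md5out)));
    }
}
```

The two digests differ whenever the backing array is larger than the number of
bytes written, which is why the checksum ends up depending on the buffer size.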