Andrew Rewoonenco created HDFS-7151:
---------------------------------------

             Summary: DFSInputStream method seek works incorrectly on huge HDFS 
block size
                 Key: HDFS-7151
                 URL: https://issues.apache.org/jira/browse/HDFS-7151
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode, fuse-dfs, hdfs-client
    Affects Versions: 2.5.1, 2.4.1, 2.5.0, 2.4.0, 2.3.0
         Environment: dfs.block.size > 2Gb
            Reporter: Andrew Rewoonenco
            Priority: Critical


Hadoop works incorrectly with block sizes larger than 2 GB.

The seek method of the DFSInputStream class uses an int (32-bit signed) internal value 
for seeking inside the current block. This causes a seek error when the block size is 
greater than 2 GB.

Found when using very large Parquet files (10 GB) in Impala on a Cloudera cluster 
with a block size of 10 GB.

Here is some log output:
W0924 08:27:15.920017 40026 DFSInputStream.java:1397] BlockReader failed to 
seek to 4390830898. Instead, it seeked to 95863602.
W0924 08:27:15.921295 40024 DFSInputStream.java:1397] BlockReader failed to 
seek to 5597521814. Instead, it seeked to 1302554518.

The BlockReader seeks using only the low 32 bits of the offset: in both cases the 
target was missed by exactly 2^32 bytes (4390830898 - 95863602 = 4294967296, and 
likewise 5597521814 - 1302554518 = 4294967296).

The code fragment producing that bug:

      int diff = (int)(targetPos - pos);
      if (diff <= blockReader.available()) {
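The failure mode can be reproduced in isolation. A minimal sketch using the offsets from the log above (the variable names mirror the fragment; keeping the difference as a long, as sketched here, is one possible fix, not necessarily the actual HDFS patch):

```java
public class SeekTruncation {
    public static void main(String[] args) {
        long pos = 95863602L;          // current position, from the log output
        long targetPos = 4390830898L;  // requested seek target, from the log output

        // Buggy: the cast truncates the 64-bit difference to its low 32 bits.
        // Here the difference is exactly 2^32, so diff becomes 0 and the
        // stream believes it is already at the target position.
        int diff = (int) (targetPos - pos);
        System.out.println(diff);      // prints 0

        // Fixed: keep the difference as a long, so nothing is truncated.
        long diffLong = targetPos - pos;
        System.out.println(diffLong);  // prints 4294967296
    }
}
```

A truncated diff of 0 is consistent with the log line above: the comparison against blockReader.available() passes trivially, and the reader stays at 95863602 instead of seeking to 4390830898.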

Similar errors may exist in other parts of HDFS.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
