Andrew Rewoonenco created HDFS-7151:
---------------------------------------
             Summary: DFSInputStream method seek works incorrectly on huge HDFS block size
                 Key: HDFS-7151
                 URL: https://issues.apache.org/jira/browse/HDFS-7151
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode, fuse-dfs, hdfs-client
    Affects Versions: 2.5.1, 2.4.1, 2.5.0, 2.4.0, 2.3.0
         Environment: dfs.block.size > 2 GB
            Reporter: Andrew Rewoonenco
            Priority: Critical

Hadoop works incorrectly with block sizes larger than 2 GB.

The seek method of the DFSInputStream class uses an int (32-bit signed) internal value for seeking inside the current block. This causes seek errors when the block size is greater than 2 GB.

Found when using very large Parquet files (10 GB) in Impala on a Cloudera cluster with a 10 GB block size. Here is some log output:

W0924 08:27:15.920017 40026 DFSInputStream.java:1397] BlockReader failed to seek to 4390830898. Instead, it seeked to 95863602.
W0924 08:27:15.921295 40024 DFSInputStream.java:1397] BlockReader failed to seek to 5597521814. Instead, it seeked to 1302554518.

BlockReader seeks only by 32-bit offsets: each failed seek lands exactly 2^32 bytes short (4390830898 - 95863602 = 4294967296, and likewise 5597521814 - 1302554518).

The code fragment producing the bug:

int diff = (int)(targetPos - pos); // the (int) cast truncates once targetPos - pos exceeds 2^31 - 1
if (diff <= blockReader.available()) {

Similar errors may exist in other parts of HDFS.
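For illustration, here is a minimal standalone sketch (not the actual DFSInputStream code; the offsets are taken from the first log line above, and pos is assumed to be 0 for the demo) showing how the (int) cast reproduces the bogus position, and how keeping the arithmetic in long avoids the truncation:

public class SeekTruncationDemo {
    public static void main(String[] args) {
        long targetPos = 4390830898L; // requested seek position, from the log above
        long pos = 0L;                // assumed current position for this demo

        // Buggy pattern: the 64-bit difference is narrowed to 32 bits,
        // silently dropping the high bits.
        int diff = (int) (targetPos - pos);
        System.out.println(diff);     // prints 95863602, matching the log

        // Safer pattern: keep the difference in 64-bit arithmetic and only
        // take the in-block fast path when the value actually fits in range.
        long longDiff = targetPos - pos;
        System.out.println(longDiff); // prints 4390830898
    }
}

4390830898 mod 2^32 = 95863602, which is exactly the position the BlockReader reported seeking to.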