[ https://issues.apache.org/jira/browse/HDFS-196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Beyer reopened HDFS-196:
------------------------------

HDFS file length reported by ls may be less than the number of bytes found when reading. I created the mismatched file by running kill -9 during a copy, so that the client never shut down its connection to the namenode properly. The misreported length persisted after restarting HDFS.

{quote}
$ hdfs dfs -copyFromLocal junk17 /tmp/.
2015-06-09 13:09:25,742 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
^Z
[1]+ Stopped hdfs dfs -copyFromLocal junk17 /tmp/.
$ kill -9 %1
[1]+ Stopped hdfs dfs -copyFromLocal junk17 /tmp/.
$ fg
-bash: fg: job has terminated
[1]+ Killed: 9 hdfs dfs -copyFromLocal junk17 /tmp/.
$ hdfs dfs -ls /tmp
2015-06-09 13:09:45,730 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxrwx---   - jane supergroup          0 2015-05-28 14:26 /tmp/hadoop-yarn
drwx-wx-wx   - jane supergroup          0 2015-05-28 14:26 /tmp/hive
-rw-r--r--   1 jane supergroup 1073741824 2015-06-09 13:09 /tmp/junk17._COPYING_
$ hdfs dfs -ls /tmp
2015-06-09 13:09:55,345 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxrwx---   - jane supergroup          0 2015-05-28 14:26 /tmp/hadoop-yarn
drwx-wx-wx   - jane supergroup          0 2015-05-28 14:26 /tmp/hive
-rw-r--r--   1 jane supergroup 1073741824 2015-06-09 13:09 /tmp/junk17._COPYING_
$ hdfs dfs -cat /tmp/junk17._COPYING_ | wc -c
 1207959752
$ hdfs dfs -ls /tmp
2015-06-09 13:11:21,389 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxrwx---   - jane supergroup          0 2015-05-28 14:26 /tmp/hadoop-yarn
drwx-wx-wx   - jane supergroup          0 2015-05-28 14:26 /tmp/hive
-rw-r--r--   1 jane supergroup 1073741824 2015-06-09 13:09 /tmp/junk17._COPYING_
$ hdfs dfs -cp /tmp/junk17._COPYING_ /tmp/junk18
2015-06-09 13:13:38,963 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
$ hdfs dfs -ls /tmp
2015-06-09 13:13:45,575 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
drwxrwx---   - jane supergroup          0 2015-05-28 14:26 /tmp/hadoop-yarn
drwx-wx-wx   - jane supergroup          0 2015-05-28 14:26 /tmp/hive
-rw-r--r--   1 jane supergroup 1073741824 2015-06-09 13:09 /tmp/junk17._COPYING_
-rw-r--r--   1 jane supergroup 1207959552 2015-06-09 13:13 /tmp/junk18
{quote}

{quote}
$ hdfs version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
{quote}

> File length not reported correctly after application crash
> ----------------------------------------------------------
>
>                 Key: HDFS-196
>                 URL: https://issues.apache.org/jira/browse/HDFS-196
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Doug Judd
>
> Our application (Hypertable) creates a transaction log in HDFS. This log is
> written with the following pattern:
>
> out_stream.write(header, 0, 7);
> out_stream.sync();
> out_stream.write(data, 0, amount);
> out_stream.sync();
> [...]
>
> However, if the application crashes and then comes back up again, the
> following statement
>
> length = mFilesystem.getFileStatus(new Path(fileName)).getLen();
>
> returns the wrong length. Apparently this is because the method fetches
> length information from the NameNode, which is stale. Ideally, a call to
> getFileStatus() would return the accurate file length by fetching the size of
> the last block from the primary datanode.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
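Both the reopening transcript (`hdfs dfs -cat ... | wc -c`) and the original report point at the same workaround: when the NameNode's metadata may be stale, the only length you can trust is the one you get by reading the stream to EOF. Against a live cluster that would mean opening the file with `FileSystem.open()` instead of calling `getFileStatus().getLen()`; the sketch below shows the counting technique itself in plain `java.io` so it is self-contained, with `lengthByReading` being a hypothetical helper name, not a Hadoop API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamLength {

    // Determine a file's true length by reading it to EOF, rather than
    // trusting metadata that may be stale. This is what the transcript's
    // "hdfs dfs -cat ... | wc -c" does; with HDFS the InputStream would
    // come from FileSystem.open(path).
    static long lengthByReading(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an HDFS file whose reported length is suspect.
        byte[] data = new byte[123_456];
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(lengthByReading(in)); // prints 123456
        }
    }
}
```

The cost is a full scan of the file, which is why the reporter asks for `getFileStatus()` to consult the last block's primary datanode instead.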