[ https://issues.apache.org/jira/browse/HDFS-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
CW Chung updated HDFS-1907:
---------------------------

    Description:
BlockMissingException is thrown under this test scenario:
- two different processes do concurrent file r/w: one reads and the other writes the same file
- the writer keeps writing to the file
- the reader does positional reads from the beginning of the file to the visible end of the file, repeatedly

The call is basically:
  byteRead = in.read(currentPosition, buffer, 0, byteToReadThisRound);
where currentPosition = 0, buffer is a byte array, and byteToReadThisRound = 1024*10000.

Usually it does not fail right away. I have to read, close the file, and re-open the same file a few times to trigger the problem. I'll post a test program to reproduce this problem after I've cleaned up my current test program a bit.

Error msg:
=========================
11/05/08 19:05:48 WARN hdfs.DFSClient: Failed to connect to /xxx.xx.xx.xxx:1004 for file /tmp/N/909NF for block BP-1632719171-xxx.xx.xx.xxx-1303748685682:blk_-8940328094159486414_3653:java.io.IOException: Got error for OP_READ_BLOCK, self=/xxx.xx.xx.xxx:36405, remote=/xxx.xx.xx.xxx:1004, for file /tmp/N/909NF, for pool BP-1632719171-xxx.xx.xx.xxx-1303748685682 block -8940328094159486414_3653
	at org.apache.hadoop.hdfs.BlockReader.newBlockReader(BlockReader.java:398)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:631)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:704)
	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:51)
	at TAppend.readUntilVisibleEnd(TAppend.java:441)
	at TAppend.readUntilEnd(TAppend.java:474)
	at TAppend.testReadOnly(TAppend.java:956)
	at TAppend.main(TAppend.java:1215)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
....
##### Caught Exception in testReadOnly while reading.
java.io.IOException: Exception caught in readUntilVisibleEnd: Reader currentOffset = 0 ; totalByteRead =0 ; latest byteRead = 0 ; visibleLen= 67824640 ; byteLeftToRead = 67824640 ; bufferLen = 10240000 ; chunkNumber= 0 ; input pos = 0 ; byteToReadThisRound = 10240000 ; Filename = /tmp/N/909NF, ReadParam - CurrentPostion=0, offset=0, size=10240000
	at TAppend.readUntilVisibleEnd(TAppend.java:457)
	at TAppend.readUntilEnd(TAppend.java:474)
	at TAppend.testReadOnly(TAppend.java:956)
	at TAppend.main(TAppend.java:1215)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1632719171-xxx.xx.xx.xxx-1303748685682:blk_-8940328094159486414_3653 file=/tmp/N/909NF
	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:570)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:618)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:704)
	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:51)
	at TAppend.readUntilVisibleEnd(TAppend.java:441)
	... 8 more

  was:
BlockMissingException is thrown under this test scenario:
- two different processes do concurrent file r/w: one reads and the other writes the same file
- the writer keeps writing to the file
- the reader does positional reads from the beginning of the file to the visible end of the file, repeatedly

The call is basically:
  byteRead = in.read(currentPosition, buffer, 0, byteToReadThisRound);
where currentPosition = 0, buffer is a byte array, and byteToReadThisRound = 1024*10000.

Usually it does not fail right away. I have to read, close the file, and re-open the same file a few times to trigger the problem. I'll post a test program to reproduce this problem after I've cleaned up my current test program a bit.

Error msg:
=========================
11/05/08 19:05:48 WARN hdfs.DFSClient: Failed to connect to /98.137.98.116:1004 for file /tmp/N/909NF for block BP-1632719171-98.137.98.114-1303748685682:blk_-8940328094159486414_3653:java.io.IOException: Got error for OP_READ_BLOCK, self=/98.137.98.125:36405, remote=/98.137.98.116:1004, for file /tmp/N/909NF, for pool BP-1632719171-98.137.98.114-1303748685682 block -8940328094159486414_3653
	at org.apache.hadoop.hdfs.BlockReader.newBlockReader(BlockReader.java:398)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:631)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:704)
	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:51)
	at TAppend.readUntilVisibleEnd(TAppend.java:441)
	at TAppend.readUntilEnd(TAppend.java:474)
	at TAppend.testReadOnly(TAppend.java:956)
	at TAppend.main(TAppend.java:1215)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
....
##### Caught Exception in testReadOnly while reading.
java.io.IOException: Exception caught in readUntilVisibleEnd: Reader currentOffset = 0 ; totalByteRead =0 ; latest byteRead = 0 ; visibleLen= 67824640 ; byteLeftToRead = 67824640 ; bufferLen = 10240000 ; chunkNumber= 0 ; input pos = 0 ; byteToReadThisRound = 10240000 ; Filename = /tmp/N/909NF, ReadParam - CurrentPostion=0, offset=0, size=10240000
	at TAppend.readUntilVisibleEnd(TAppend.java:457)
	at TAppend.readUntilEnd(TAppend.java:474)
	at TAppend.testReadOnly(TAppend.java:956)
	at TAppend.main(TAppend.java:1215)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1632719171-98.137.98.114-1303748685682:blk_-8940328094159486414_3653 file=/tmp/N/909NF
	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:570)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:618)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:704)
	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:51)
	at TAppend.readUntilVisibleEnd(TAppend.java:441)
	... 8 more

    Environment: Run on a real cluster. Using the latest 0.23 build.
  (was: Run on a real cluster
$ hadoop version
Hadoop 0.22.0.1105090202
Subversion git://hadoopre5.corp.sk1.yahoo.com/home/y/var/builds/thread2/workspace/Cloud-HadoopCommon-0.22-Secondary -r 3c23e43f9e262e7843e4287436429fad3224b0f7
Compiled by hadoopqa on Mon May 9 02:13:09 PDT 2011
From source with checksum 90b5fc469fd7a1fa0ba22db893423fed
)

> BlockMissingException upon concurrent read and write: reader was doing file position read while writer is doing write without hflush
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1907
>                 URL: https://issues.apache.org/jira/browse/HDFS-1907
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.23.0
>         Environment: Run on a real cluster. Using the latest 0.23 build.
>            Reporter: CW Chung
>
> BlockMissingException is thrown under this test scenario:
> - two different processes do concurrent file r/w: one reads and the other writes the same file
> - the writer keeps writing to the file
> - the reader does positional reads from the beginning of the file to the visible end of the file, repeatedly
> The call is basically:
>   byteRead = in.read(currentPosition, buffer, 0, byteToReadThisRound);
> where currentPosition = 0, buffer is a byte array, and byteToReadThisRound = 1024*10000.
> Usually it does not fail right away. I have to read, close the file, and re-open the same file a few times to trigger the problem. I'll post a test program to reproduce this problem after I've cleaned up my current test program a bit.
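
For readers who want to picture the reader side of the scenario, here is a minimal sketch of a chunked positional-read loop of the kind the report describes. It is not the reporter's TAppend program, which was never included in this message. The `PositionedSource` interface and the `inMemory` helper are hypothetical stand-ins introduced only so the sketch runs without a cluster; they mirror the argument shape of the `in.read(currentPosition, buffer, 0, byteToReadThisRound)` call quoted above. The variable names are borrowed from the log output.

```java
import java.io.IOException;

public class PositionalReadSketch {

    /** Hypothetical stand-in with the same argument shape as the positional read in the report. */
    public interface PositionedSource {
        int read(long position, byte[] buffer, int offset, int length) throws IOException;
    }

    /**
     * Reads from startPosition up to visibleLen in buffer-sized chunks
     * and returns the total number of bytes read.
     */
    public static long readUntilVisibleEnd(PositionedSource in, long startPosition,
                                           long visibleLen, byte[] buffer) throws IOException {
        long currentPosition = startPosition;
        long totalByteRead = 0;
        while (currentPosition < visibleLen) {
            long byteLeftToRead = visibleLen - currentPosition;
            int byteToReadThisRound = (int) Math.min(buffer.length, byteLeftToRead);
            int byteRead = in.read(currentPosition, buffer, 0, byteToReadThisRound);
            if (byteRead <= 0) {
                // Assumption of this sketch: no progress before the visible end is an error.
                throw new IOException("No progress at position " + currentPosition
                        + ", byteLeftToRead = " + byteLeftToRead);
            }
            currentPosition += byteRead;
            totalByteRead += byteRead;
        }
        return totalByteRead;
    }

    /** In-memory source used only to make the sketch runnable (hypothetical helper). */
    public static PositionedSource inMemory(byte[] data) {
        return (position, buffer, offset, length) -> {
            if (position >= data.length) {
                return -1; // past the end of the data
            }
            int n = (int) Math.min(length, data.length - position);
            System.arraycopy(data, (int) position, buffer, offset, n);
            return n;
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[25_000];   // pretend file contents
        byte[] buffer = new byte[10_000]; // chunk size for this demo
        long total = readUntilVisibleEnd(inMemory(data), 0, data.length, buffer);
        System.out.println("total bytes read = " + total); // prints "total bytes read = 25000"
    }
}
```

With the values from the log (visibleLen = 67824640, bufferLen = 10240000), a loop like this would need seven rounds: six full 10240000-byte reads and a final read of 6384640 bytes. In the reported failure the exception was raised on the very first round (chunkNumber = 0, totalByteRead = 0).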