[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736788#comment-15736788 ]

Hudson commented on HDFS-11056:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10979 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10979/])
HDFS-11229. HDFS-11056 failed to close meta file. Contributed by (weichiu: rev 2a28e8cf0469a373a99011f0fa540474e60528c8)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java

> Concurrent append and read operations lead to checksum error
> ------------------------------------------------------------
>
>                 Key: HDFS-11056
>                 URL: https://issues.apache.org/jira/browse/HDFS-11056
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, httpfs
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>             Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2
>
>         Attachments: HDFS-11056.001.patch, HDFS-11056.002.patch,
> HDFS-11056.branch-2.7.patch, HDFS-11056.branch-2.patch,
> HDFS-11056.reproduce.patch
>
>
> If there are two clients, one of which repeatedly does open-append-close on a file
> while the other repeatedly does open-read-close on the same file, the reader
> eventually gets a checksum error in the data read.
> On my local Mac, it takes a few minutes to produce the error. This happens to
> httpfs clients, but there's no reason not to believe this happens to any append
> clients.
> I have a unit test that demonstrates the checksum error. Will attach later.
> Relevant log:
> {quote}
> 2016-10-25 15:34:45,153 INFO  audit - allowed=true ugi=weichiu (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/tmp/bar.txt dst=null perm=null proto=rpc
> 2016-10-25 15:34:45,155 INFO  DataNode - Receiving BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 src: /127.0.0.1:51130 dest: /127.0.0.1:50131
> 2016-10-25 15:34:45,155 INFO  FsDatasetImpl - Appending to FinalizedReplica, blk_1073741825_1182, FINALIZED
>   getNumBytes()     = 182
>   getBytesOnDisk()  = 182
>   getVisibleLength()= 182
>   getVolume()       = /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1
>   getBlockURI()     = file:/Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-837130339-172.16.1.88-1477434851452/current/finalized/subdir0/subdir0/blk_1073741825
> 2016-10-25 15:34:45,167 INFO  DataNode - opReadBlock BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 received exception java.io.IOException: No data exists for block BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
> 2016-10-25 15:34:45,167 WARN  DataNode - DatanodeRegistration(127.0.0.1:50131, datanodeUuid=41c96335-5e4b-4950-ac22-3d21b353abb8, infoPort=50133, infoSecurePort=0, ipcPort=50134, storageInfo=lv=-57;cid=testClusterID;nsid=1472068852;c=1477434851452):Got exception while serving BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 to /127.0.0.1:51121
> java.io.IOException: No data exists for block BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO  FSNamesystem - updatePipeline(blk_1073741825_1182, newGS=1183, newLength=182, newNodes=[127.0.0.1:50131], client=DFSClient_NONMAPREDUCE_-1743096965_197)
> 2016-10-25 15:34:45,168 ERROR DataNode - 127.0.0.1:50131:DataXceiver error processing READ_BLOCK operation src: /127.0.0.1:51121 dst: /127.0.0.1:50131
> java.io.IOException: No data exists for block BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO  FSNamesystem - updatePipeline(blk_1073741825_1182 => blk_1073741825_1183) success
> 2016-10-25 15:34:45,170 WARN  DFSClient - Found Checksum error for
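When reading the log above, it helps to split the replica names into their parts. The log shows names of the form blk_<block id>_<generation stamp>: the reader asks for blk_1073741825_1182 while updatePipeline reports newGS=1183 for the same block. The following is only a small reading aid sketched for this thread; the class BlockName and its methods are hypothetical and are not part of Hadoop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Hadoop class) for reading block names in the log. */
public class BlockName {
    // Assumed shape, taken from the log lines: blk_<blockId>_<generationStamp>
    private static final Pattern NAME = Pattern.compile("blk_(-?\\d+)_(\\d+)");

    public final long blockId;
    public final long generationStamp;

    private BlockName(long blockId, long generationStamp) {
        this.blockId = blockId;
        this.generationStamp = generationStamp;
    }

    /** Extracts the first block name found in a log fragment. */
    public static BlockName parse(String text) {
        Matcher m = NAME.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("no block name in: " + text);
        }
        return new BlockName(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)));
    }

    /** True if both names refer to the same block and this one has the older stamp. */
    public boolean sameBlockOlderThan(BlockName other) {
        return blockId == other.blockId && generationStamp < other.generationStamp;
    }

    public static void main(String[] args) {
        BlockName requested =
            parse("BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182");
        BlockName afterAppend = parse("blk_1073741825_1183");
        System.out.println(requested.sameBlockOlderThan(afterAppend)); // prints "true"
    }
}
```

Applied to the log, this makes the timing visible: the read request still carries generation stamp 1182 at the moment the append pipeline moves the same block to 1183.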
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15657402#comment-15657402 ]

Kihwal Lee commented on HDFS-11056:
-----------------------------------

+1 for the 2.7 patch. It looks to be a correct port. Thanks [~jojochuang].
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656363#comment-15656363 ]

Wei-Chiu Chuang commented on HDFS-11056:
----------------------------------------

The warnings and test errors look unrelated.
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654855#comment-15654855 ]

Hadoop QA commented on HDFS-11056:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 5m 44s | branch-2.7 passed |
| +1 | compile | 0m 56s | branch-2.7 passed with JDK v1.8.0_111 |
| +1 | compile | 1m 1s | branch-2.7 passed with JDK v1.7.0_111 |
| +1 | checkstyle | 0m 24s | branch-2.7 passed |
| +1 | mvnsite | 0m 56s | branch-2.7 passed |
| +1 | mvneclipse | 0m 15s | branch-2.7 passed |
| +1 | findbugs | 2m 50s | branch-2.7 passed |
| +1 | javadoc | 1m 0s | branch-2.7 passed with JDK v1.8.0_111 |
| +1 | javadoc | 1m 50s | branch-2.7 passed with JDK v1.7.0_111 |
| +1 | mvninstall | 0m 52s | the patch passed |
| +1 | compile | 0m 56s | the patch passed with JDK v1.8.0_111 |
| +1 | javac | 0m 56s | the patch passed |
| +1 | compile | 1m 1s | the patch passed with JDK v1.7.0_111 |
| +1 | javac | 1m 1s | the patch passed |
| -0 | checkstyle | 0m 25s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 130 unchanged - 2 fixed = 131 total (was 132) |
| +1 | mvnsite | 1m 0s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 2630 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -1 | whitespace | 0m 53s | The patch 139 line(s) with tabs. |
| +1 | findbugs | 3m 12s | the patch passed |
| +1 | javadoc | 0m 58s | the patch passed with JDK v1.8.0_111 |
| +1 | javadoc | 1m 44s | the patch passed with JDK v1.7.0_111 |
| -1 | unit | 45m 35s | hadoop-hdfs in the patch failed with JDK v1.7.0_111. |
| -1 | asflicense | 0m 20s | The patch generated 3 ASF License warnings. |
| | | 122m 49s | |

|| Reason || Tests ||
| JDK v1.8.0_111 Failed junit tests | hadoop.hdfs.server.namenode.ha.TestDNFencing |
| | hadoop.hdfs.server.balancer.TestBalancer |
| | hadoop.hdfs.web.TestHttpsFileSystem |
| | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
| JDK v1.7.0_111 Failed junit tests | hadoop.hdfs.TestDatanodeRegistration |
| | hadoop.hdfs.TestLeaseRecovery2 |
| | hadoop.hdfs.web.TestHttpsFileSystem |
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654224#comment-15654224 ]

Wei-Chiu Chuang commented on HDFS-11056:
----------------------------------------

I committed the patch to trunk, branch-2 and branch-2.8, and I am still working on a branch-2.7 patch.
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652614#comment-15652614 ]

Hadoop QA commented on HDFS-11056:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 11m 27s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 8m 3s | branch-2.7 passed |
| +1 | compile | 0m 57s | branch-2.7 passed with JDK v1.8.0_111 |
| +1 | compile | 0m 58s | branch-2.7 passed with JDK v1.7.0_111 |
| +1 | checkstyle | 0m 25s | branch-2.7 passed |
| +1 | mvnsite | 0m 59s | branch-2.7 passed |
| +1 | mvneclipse | 0m 16s | branch-2.7 passed |
| +1 | findbugs | 2m 55s | branch-2.7 passed |
| +1 | javadoc | 0m 58s | branch-2.7 passed with JDK v1.8.0_111 |
| +1 | javadoc | 1m 40s | branch-2.7 passed with JDK v1.7.0_111 |
| +1 | mvninstall | 0m 50s | the patch passed |
| +1 | compile | 0m 55s | the patch passed with JDK v1.8.0_111 |
| +1 | javac | 0m 55s | the patch passed |
| +1 | compile | 0m 58s | the patch passed with JDK v1.7.0_111 |
| +1 | javac | 0m 58s | the patch passed |
| -0 | checkstyle | 0m 21s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 130 unchanged - 2 fixed = 132 total (was 132) |
| +1 | mvnsite | 0m 55s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 2270 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -1 | whitespace | 0m 58s | The patch 139 line(s) with tabs. |
| +1 | findbugs | 3m 3s | the patch passed |
| +1 | javadoc | 0m 55s | the patch passed with JDK v1.8.0_111 |
| +1 | javadoc | 1m 41s | the patch passed with JDK v1.7.0_111 |
| -1 | unit | 43m 34s | hadoop-hdfs in the patch failed with JDK v1.7.0_111. |
| -1 | asflicense | 0m 20s | The patch generated 3 ASF License warnings. |
| | | 133m 30s | |

|| Reason || Tests ||
| JDK v1.8.0_111 Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica |
| | hadoop.hdfs.server.blockmanagement.TestBlockManager |
| | hadoop.hdfs.web.TestHttpsFileSystem |
| | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
| | hadoop.hdfs.server.datanode.TestBlockReplacement |
| JDK v1.7.0_111 Failed junit tests |
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651541#comment-15651541 ]

Hudson commented on HDFS-11056:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10802 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10802/])
HDFS-11056. Concurrent append and read operations lead to checksum (weichiu: rev c619e9b43fd00ba0e59a98ae09685ff719bb722b)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImplTestUtils.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend.java
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649778#comment-15649778 ]

Wei-Chiu Chuang commented on HDFS-11056:

The branch-2 failed tests are not related.
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649325#comment-15649325 ]

Hadoop QA commented on HDFS-11056:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 26s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 7m 20s | branch-2 passed |
| +1 | compile | 0m 48s | branch-2 passed with JDK v1.8.0_101 |
| +1 | compile | 0m 45s | branch-2 passed with JDK v1.7.0_111 |
| +1 | checkstyle | 0m 28s | branch-2 passed |
| +1 | mvnsite | 0m 53s | branch-2 passed |
| +1 | mvneclipse | 0m 16s | branch-2 passed |
| +1 | findbugs | 1m 57s | branch-2 passed |
| +1 | javadoc | 1m 3s | branch-2 passed with JDK v1.8.0_101 |
| +1 | javadoc | 1m 49s | branch-2 passed with JDK v1.7.0_111 |
| +1 | mvninstall | 0m 46s | the patch passed |
| +1 | compile | 0m 46s | the patch passed with JDK v1.8.0_101 |
| +1 | javac | 0m 46s | the patch passed |
| +1 | compile | 0m 43s | the patch passed with JDK v1.7.0_111 |
| +1 | javac | 0m 43s | the patch passed |
| +1 | checkstyle | 0m 26s | the patch passed |
| +1 | mvnsite | 0m 53s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 2m 11s | the patch passed |
| +1 | javadoc | 0m 57s | the patch passed with JDK v1.8.0_101 |
| +1 | javadoc | 1m 33s | the patch passed with JDK v1.7.0_111 |
| -1 | unit | 71m 3s | hadoop-hdfs in the patch failed with JDK v1.7.0_111. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 168m 18s | |

|| Reason || Tests ||
| JDK v1.8.0_101 Failed junit tests | hadoop.hdfs.server.datanode.TestFsDatasetCache |
| JDK v1.7.0_111 Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:b59b8b7 |
| JIRA Issue | HDFS-11056 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838052/HDFS-11056.branch-2.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 2b8b5a8d049c 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / b77239b |
| Default
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644747#comment-15644747 ]

Wei-Chiu Chuang commented on HDFS-11056:

If no one objects, I will commit the latest patch by end of Tuesday. I will also file a follow-up JIRA to study whether it is necessary to optimize checksum calculation by adding the last chunk checksum to the finalized/temporary replica class.
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637758#comment-15637758 ]

Wei-Chiu Chuang commented on HDFS-11056:

Hi [~kihwal], thanks for the review!
This fix re-computes the last chunk checksum when converting a finalized/temporary replica to an rbw replica. Do you think it would be more efficient to store the last chunk checksum in the finalized/temporary replica object, so that frequent open->append->close operations do not each trigger a re-computation?
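The re-computation mentioned in the comment above can be illustrated with a minimal sketch. This is not the HDFS-11056 patch: the class and method names are hypothetical, a byte array stands in for the block file, and java.util.zip.CRC32 stands in for whichever checksum type the DataNode is configured with. It assumes fixed-size checksum chunks (512 bytes here) and only shows which bytes form the last chunk and how a checksum over them could be re-computed.

```java
import java.util.zip.CRC32;

// Hypothetical helper, for illustration only; not part of Hadoop.
final class LastChunkChecksum {

  /** Number of bytes in the trailing chunk of a block; 0 for an empty block. */
  static int lastChunkLength(long blockLength, int bytesPerChecksum) {
    if (blockLength == 0) {
      return 0;
    }
    int remainder = (int) (blockLength % bytesPerChecksum);
    // A remainder of 0 means the last chunk is a full chunk.
    return remainder == 0 ? bytesPerChecksum : remainder;
  }

  /** Re-computes the checksum over the trailing chunk of the given block data. */
  static long lastChunkChecksum(byte[] blockData, int bytesPerChecksum) {
    int length = lastChunkLength(blockData.length, bytesPerChecksum);
    CRC32 crc = new CRC32();
    crc.update(blockData, blockData.length - length, length);
    return crc.getValue();
  }

  public static void main(String[] args) {
    // 182 bytes matches the replica length in the quoted log; the content is made up.
    byte[] block = new byte[182];
    System.out.println("last chunk length: " + lastChunkLength(block.length, 512));
    System.out.printf("last chunk checksum: %08x%n", lastChunkChecksum(block, 512));
  }
}
```

With a 182-byte block and 512-byte chunks the whole block is one partial chunk, so every append changes the only checksum a reader of that block depends on. Caching this value in the replica object, as the comment proposes, would trade a little memory for skipping the re-computation on each append-open.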
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634810#comment-15634810 ]

Lei (Eddy) Xu commented on HDFS-11056:

Hi, [~jojochuang]. [HDFS-10636] is not a related change.
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633249#comment-15633249 ]

Kihwal Lee commented on HDFS-11056:

I have been looking at the patch since yesterday. It looks like the partial chunk checksum is loaded from disk and saved in memory before it is modified. That seems like a correct approach. +1
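The approach described in the review comment above, loading the partial chunk checksum from disk and keeping it in memory before the writer modifies it, can be sketched as follows. This is a simplified model under stated assumptions, not the DataNode code: all names are hypothetical, fields stand in for the block and meta files, the block is smaller than one checksum chunk (as with the 182-byte replica in the log), and java.util.zip.CRC32 stands in for the configured checksum type.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical model of a replica being appended to; not part of Hadoop.
final class AppendSnapshotSketch {
  private byte[] blockData;      // stand-in for the on-disk block file
  private long onDiskChecksum;   // stand-in for the last checksum in the meta file
  private int visibleLength;     // number of bytes a concurrent reader is served
  private long savedChecksum;    // in-memory checksum that matches visibleLength

  AppendSnapshotSketch(byte[] finalizedData) {
    blockData = finalizedData.clone();
    onDiskChecksum = crcOf(blockData, blockData.length);
    visibleLength = blockData.length;
    savedChecksum = onDiskChecksum;
  }

  static long crcOf(byte[] data, int length) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, length);
    return crc.getValue();
  }

  /** Append-open: copy the checksum into memory before the writer touches it. */
  synchronized void openForAppend() {
    visibleLength = blockData.length;
    savedChecksum = onDiskChecksum;
  }

  /** The writer extends the data and overwrites the on-disk checksum. */
  synchronized void append(byte[] more) {
    int oldLength = blockData.length;
    blockData = Arrays.copyOf(blockData, oldLength + more.length);
    System.arraycopy(more, 0, blockData, oldLength, more.length);
    onDiskChecksum = crcOf(blockData, blockData.length);
  }

  /** A reader checks the visible bytes against the saved or the on-disk checksum. */
  synchronized boolean readerVerifies(boolean useSavedChecksum) {
    long expected = useSavedChecksum ? savedChecksum : onDiskChecksum;
    return crcOf(blockData, visibleLength) == expected;
  }

  public static void main(String[] args) {
    AppendSnapshotSketch replica = new AppendSnapshotSketch(new byte[] {'a'});
    replica.openForAppend();
    replica.append(new byte[] {'b', 'c'});
    System.out.println("saved checksum matches:   " + replica.readerVerifies(true));
    System.out.println("on-disk checksum matches: " + replica.readerVerifies(false));
  }
}
```

In this model a reader that is served the old visible length but compares against the freshly overwritten on-disk checksum sees a mismatch, which corresponds to the checksum error in the report; comparing against the in-memory copy taken at append-open keeps data and checksum consistent.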
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633054#comment-15633054 ]

Wei-Chiu Chuang commented on HDFS-11056:

The test failure is unrelated.
[~eddyxu] [~virajith] would you like to comment? I saw that HDFS-10636 refactored a lot of the relevant code, but I do think the same bug existed before HDFS-10636.
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630215#comment-15630215 ]

Hadoop QA commented on HDFS-11056:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 7m 22s | trunk passed |
| +1 | compile | 0m 45s | trunk passed |
| +1 | checkstyle | 0m 25s | trunk passed |
| +1 | mvnsite | 0m 52s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 43s | trunk passed |
| +1 | javadoc | 0m 39s | trunk passed |
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 0m 42s | the patch passed |
| +1 | javac | 0m 42s | the patch passed |
| +1 | checkstyle | 0m 23s | the patch passed |
| +1 | mvnsite | 0m 49s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 46s | the patch passed |
| +1 | javadoc | 0m 37s | the patch passed |
| -1 | unit | 54m 30s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 73m 25s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.qjournal.client.TestQuorumJournalManager |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-11056 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12836626/HDFS-11056.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 9c23a583ac15 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0dc2a6a |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17389/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17389/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17389/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628713#comment-15628713 ]

Wei-Chiu Chuang commented on HDFS-11056:

TestWriteToReplica#testAppend failed because it tries to read from the meta file of a non-existent block replica.
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627199#comment-15627199 ]

Hadoop QA commented on HDFS-11056:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 8m 6s | trunk passed |
| +1 | compile | 0m 49s | trunk passed |
| +1 | checkstyle | 0m 27s | trunk passed |
| +1 | mvnsite | 1m 0s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 52s | trunk passed |
| +1 | javadoc | 0m 43s | trunk passed |
| +1 | mvninstall | 0m 55s | the patch passed |
| +1 | compile | 0m 50s | the patch passed |
| +1 | javac | 0m 50s | the patch passed |
| +1 | checkstyle | 0m 27s | the patch passed |
| +1 | mvnsite | 0m 54s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 51s | the patch passed |
| +1 | javadoc | 0m 39s | the patch passed |
| -1 | unit | 57m 52s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 78m 48s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeUUID |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-11056 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12836460/HDFS-11056.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 938dfd53d8ab 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / aacf214 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17371/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17371/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17371/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15626517#comment-15626517 ]

Wei-Chiu Chuang commented on HDFS-11056:

I have a proof-of-concept fix, but writing a unit test that reliably reproduces the error seems tricky, given that there are many moving parts. The major hurdle is to create a replica in the RBW state whose visible length != on-disk length, and to have a client read that replica concurrently.
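One way to make such a window deterministic, sketched here in plain Java rather than against the DataNode, is to pause the writer between flushing bytes and publishing the new visible length, and to let the reader run only inside that pause. Everything below (`PausedAppendHarness`, its fields and methods) is hypothetical and only illustrates the interleaving; it is not the test attached to this issue and it uses no Hadoop classes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy harness, not Hadoop test code: pins a writer between "flush" and "publish". */
class PausedAppendHarness {
    final AtomicInteger onDiskLength = new AtomicInteger();
    final AtomicInteger visibleLength = new AtomicInteger();
    final CountDownLatch flushed = new CountDownLatch(1);    // writer signals reader
    final CountDownLatch readerDone = new CountDownLatch(1); // reader signals writer

    /** Appends n bytes: flush first, publish only after the reader has looked. */
    void append(int n) throws InterruptedException {
        onDiskLength.addAndGet(n);   // data reaches "disk"
        flushed.countDown();         // open the window for the reader
        readerDone.await();          // hold the window open
        visibleLength.addAndGet(n);  // only now acknowledge the bytes
    }

    /** Returns {onDisk, visible} as observed while the writer is paused. */
    int[] readDuringWindow() throws InterruptedException {
        flushed.await();
        int[] seen = { onDiskLength.get(), visibleLength.get() };
        readerDone.countDown();
        return seen;
    }

    /** Runs one append/read pair and returns what the reader observed. */
    static int[] runOnce(int initial, int appended) {
        PausedAppendHarness h = new PausedAppendHarness();
        h.onDiskLength.set(initial);
        h.visibleLength.set(initial);
        Thread writer = new Thread(() -> {
            try {
                h.append(appended);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        try {
            int[] seen = h.readDuringWindow();
            writer.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        int[] seen = runOnce(182, 10); // 182 is the replica length shown in the issue's log
        System.out.println("onDisk=" + seen[0] + " visible=" + seen[1]);
    }
}
```

With the latches in place the reader always observes onDisk=192 and visible=182, which is the state the comment describes as hard to reach by timing alone. A real test would still need an equivalent hook inside the DataNode write path.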
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625993#comment-15625993 ]

Wei-Chiu Chuang commented on HDFS-11056:

FsVolumeImpl#convertTemporaryToRbw does not generate the last checksum either, so it seems to suffer from the same bug. This is likely what caused the checksum error in HDFS-6804.
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625972#comment-15625972 ]

Yongjun Zhang commented on HDFS-11056:

Those are very nice findings, [~jojochuang]! Congrats! Can we create a unit test to reproduce the scenario? Thanks.
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625942#comment-15625942 ] Wei-Chiu Chuang commented on HDFS-11056: I believe I have found the root cause of bug: The bug is, when BlockSender sends a RBW block and it reads the last block and checksum, it is supposed to in-memory checksum, which is (supposedly) the correct checksum corresponding to the un-appended data. However, the in-memory checksum of the ReplicaInPipeline object is *null* and BlockSender therefore skips reading in-memory checksum and use on-disk checksum instead, which results in checksum error, because on-disk checksum corresponds to on-disk data (but may not be the visible data) The checksum is null, because when a replica is being converted to RBW from Finalized for append, it does not call setLastChecksumAndDataLen(). (See: FsVolumeImpl#append) This bug is subtle, and is only exposed when reading a replica whose on-disk data length is longer than visible length. > Concurrent append and read operations lead to checksum error > > > Key: HDFS-11056 > URL: https://issues.apache.org/jira/browse/HDFS-11056 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, httpfs >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-11056.reproduce.patch > > > If there are two clients, one of them open-append-close a file continuously, > while the other open-read-close the same file continuously, the reader > eventually gets a checksum error in the data read. > On my local Mac, it takes a few minutes to produce the error. This happens to > httpfs clients, but there's no reason not believe this happens to any append > clients. > I have a unit test that demonstrates the checksum error. Will attach later. 
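The root cause described in this comment can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyReplica`, `checksumSentToReader`, and `readerVerifies` are hypothetical names invented for illustration and are not Hadoop classes or methods, and the whole replica is treated as a single checksummed span. It shows why a null in-memory checksum is harmless while the on-disk length equals the visible length, and why it produces a mismatch once more bytes are on disk than are visible.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Toy model only: none of these types or methods are Hadoop APIs.
class LastChunkChecksumDemo {

    /** Hypothetical stand-in for a replica that is open for append. */
    static final class ToyReplica {
        final byte[] onDiskData;      // bytes already written to the block file
        final int visibleLength;      // bytes a reader is allowed to see
        final Long inMemoryChecksum;  // snapshot covering the visible bytes, or null

        ToyReplica(byte[] onDiskData, int visibleLength, Long inMemoryChecksum) {
            this.onDiskData = onDiskData;
            this.visibleLength = visibleLength;
            this.inMemoryChecksum = inMemoryChecksum;
        }
    }

    /** CRC32 over the first {@code length} bytes of {@code data}. */
    static long crc(byte[] data, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, length);
        return crc.getValue();
    }

    /** The checksum a sender would ship alongside the visible bytes. */
    static long checksumSentToReader(ToyReplica r) {
        if (r.inMemoryChecksum != null) {
            return r.inMemoryChecksum; // snapshot matches the visible data
        }
        // Fallback modeled on the comment: the on-disk checksum covers
        // every byte on disk, including bytes that are not visible yet.
        return crc(r.onDiskData, r.onDiskData.length);
    }

    /** True if the reader's own checksum of the visible bytes matches. */
    static boolean readerVerifies(ToyReplica r) {
        return crc(r.onDiskData, r.visibleLength) == checksumSentToReader(r);
    }

    public static void main(String[] args) {
        byte[] onDisk = "old data plus appended bytes".getBytes(StandardCharsets.UTF_8);
        int visible = "old data".length();

        // Snapshot never set: the sender falls back to the on-disk checksum.
        System.out.println("null snapshot, appended bytes on disk: "
            + readerVerifies(new ToyReplica(onDisk, visible, null)));

        // Snapshot set at conversion time: the reader gets a matching checksum.
        System.out.println("snapshot set, appended bytes on disk: "
            + readerVerifies(new ToyReplica(onDisk, visible, crc(onDisk, visible))));

        // On-disk length equals visible length: the null snapshot goes unnoticed.
        System.out.println("null snapshot, nothing appended yet: "
            + readerVerifies(new ToyReplica(onDisk, onDisk.length, null)));
    }
}
```

The third case is why the bug is easy to miss: the fallback only produces a wrong answer in the window where appended bytes have reached disk but are not yet visible.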
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616571#comment-15616571 ]

Wei-Chiu Chuang commented on HDFS-11056:

Still working on this. It looks like when data is being appended, the checksum is written to the on-disk meta file. When another client reads the data, it reads the checksum from the on-disk meta file (which is the most up to date and corresponds to the data already written to disk, but not yet visible to the client) instead of the in-memory checksum (which is the snapshot). It should have read the in-memory checksum. This inconsistency between data and checksum causes an incorrect checksum to be read.
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616049#comment-15616049 ]

Wei-Chiu Chuang commented on HDFS-11056:

The checksum error seems to occur when a client reads an RBW replica that is being appended to but has not yet been finalized. Maybe the replica was read before the checksum meta file was updated? That would be my guess.
[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error
[ https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606738#comment-15606738 ]

Wei-Chiu Chuang commented on HDFS-11056:

This bug seems to be the root cause of HDFS-11022 in the first place.

> Concurrent append and read operations lead to checksum error
>
>                 Key: HDFS-11056
>                 URL: https://issues.apache.org/jira/browse/HDFS-11056
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, httpfs
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-11056.reproduce.patch
>
> If there are two clients, one of them open-append-close a file continuously,
> while the other open-read-close the same file continuously, the reader
> eventually gets a checksum error in the data read.
> On my local Mac, it takes a few minutes to produce the error. This happens to
> httpfs clients, but there's no reason not to believe this happens to any append
> clients.
> I have a unit test that demonstrates the checksum error. Will attach later.
> Relevant log:
> {quote}
> 2016-10-25 15:34:45,153 INFO audit - allowed=true ugi=weichiu (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/tmp/bar.txt dst=null perm=null proto=rpc
> 2016-10-25 15:34:45,155 INFO DataNode - Receiving BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 src: /127.0.0.1:51130 dest: /127.0.0.1:50131
> 2016-10-25 15:34:45,155 INFO FsDatasetImpl - Appending to FinalizedReplica, blk_1073741825_1182, FINALIZED
>   getNumBytes()     = 182
>   getBytesOnDisk()  = 182
>   getVisibleLength()= 182
>   getVolume()       = /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1
>   getBlockURI()     = file:/Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-837130339-172.16.1.88-1477434851452/current/finalized/subdir0/subdir0/blk_1073741825
> 2016-10-25 15:34:45,167 INFO DataNode - opReadBlock BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 received exception java.io.IOException: No data exists for block BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
> 2016-10-25 15:34:45,167 WARN DataNode - DatanodeRegistration(127.0.0.1:50131, datanodeUuid=41c96335-5e4b-4950-ac22-3d21b353abb8, infoPort=50133, infoSecurePort=0, ipcPort=50134, storageInfo=lv=-57;cid=testClusterID;nsid=1472068852;c=1477434851452):Got exception while serving BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 to /127.0.0.1:51121
> java.io.IOException: No data exists for block BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO FSNamesystem - updatePipeline(blk_1073741825_1182, newGS=1183, newLength=182, newNodes=[127.0.0.1:50131], client=DFSClient_NONMAPREDUCE_-1743096965_197)
> 2016-10-25 15:34:45,168 ERROR DataNode - 127.0.0.1:50131:DataXceiver error processing READ_BLOCK operation src: /127.0.0.1:51121 dst: /127.0.0.1:50131
> java.io.IOException: No data exists for block BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO FSNamesystem - updatePipeline(blk_1073741825_1182 => blk_1073741825_1183) success
> 2016-10-25 15:34:45,170 WARN DFSClient - Found Checksum error for BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 from DatanodeInfoWithStorage[127.0.0.1:50131,DS-a1878418-4f7f-4fc9-b3f7-d7ed780b5373,DISK]