[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-12-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736788#comment-15736788
 ] 

Hudson commented on HDFS-11056:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10979 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10979/])
HDFS-11229. HDFS-11056 failed to close meta file. Contributed by (weichiu: rev 
2a28e8cf0469a373a99011f0fa540474e60528c8)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java


> Concurrent append and read operations lead to checksum error
> 
>
> Key: HDFS-11056
> URL: https://issues.apache.org/jira/browse/HDFS-11056
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, httpfs
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11056.001.patch, HDFS-11056.002.patch, 
> HDFS-11056.branch-2.7.patch, HDFS-11056.branch-2.patch, 
> HDFS-11056.reproduce.patch
>
>
> If there are two clients, one of which continuously opens, appends to, and 
> closes a file while the other continuously opens, reads, and closes the same 
> file, the reader eventually gets a checksum error in the data read.
> On my local Mac, it takes a few minutes to produce the error. This happens to 
> httpfs clients, but there's no reason not to believe this happens to any 
> append client.
> I have a unit test that demonstrates the checksum error. Will attach later.
> Relevant log:
> {quote}
> 2016-10-25 15:34:45,153 INFO  audit - allowed=true ugi=weichiu 
> (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/tmp/bar.txt 
> dst=null perm=null proto=rpc
> 2016-10-25 15:34:45,155 INFO  DataNode - Receiving 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 src: 
> /127.0.0.1:51130 dest: /127.0.0.1:50131
> 2016-10-25 15:34:45,155 INFO  FsDatasetImpl - Appending to FinalizedReplica, 
> blk_1073741825_1182, FINALIZED
>   getNumBytes() = 182
>   getBytesOnDisk()  = 182
>   getVisibleLength()= 182
>   getVolume()   = 
> /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1
>   getBlockURI() = 
> file:/Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-837130339-172.16.1.88-1477434851452/current/finalized/subdir0/subdir0/blk_1073741825
> 2016-10-25 15:34:45,167 INFO  DataNode - opReadBlock 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 received exception 
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
> 2016-10-25 15:34:45,167 WARN  DataNode - 
> DatanodeRegistration(127.0.0.1:50131, 
> datanodeUuid=41c96335-5e4b-4950-ac22-3d21b353abb8, infoPort=50133, 
> infoSecurePort=0, ipcPort=50134, 
> storageInfo=lv=-57;cid=testClusterID;nsid=1472068852;c=1477434851452):Got 
> exception while serving 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 to /127.0.0.1:51121
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO  FSNamesystem - 
> updatePipeline(blk_1073741825_1182, newGS=1183, newLength=182, 
> newNodes=[127.0.0.1:50131], client=DFSClient_NONMAPREDUCE_-1743096965_197)
> 2016-10-25 15:34:45,168 ERROR DataNode - 127.0.0.1:50131:DataXceiver error 
> processing READ_BLOCK operation  src: /127.0.0.1:51121 dst: /127.0.0.1:50131
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at 
> 
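
The scenario in the description (a reader racing with an appender on a block whose visible length is 182 bytes) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the HDFS code path: `ToyReplica`, `chunk_checksums`, and `read_and_verify` are hypothetical names, `zlib.crc32` stands in for whatever checksum the DataNode uses, and the 512-byte chunk size is chosen for illustration. It shows one general way a mismatch can arise: if the checksum of the last partial chunk is rewritten for the longer data while a reader still verifies against the old length, verification fails.

```python
import zlib

CHUNK = 512  # bytes covered by one checksum; chosen for illustration


def chunk_checksums(data):
    """Return one CRC per CHUNK-sized slice; the last slice may be partial."""
    return [zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]


class ToyReplica:
    """Hypothetical stand-in for a block file plus its checksum (meta) file."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.checksums = chunk_checksums(bytes(self.data))

    def append(self, more):
        # Appending to a partial last chunk changes that chunk's checksum.
        self.data.extend(more)
        self.checksums = chunk_checksums(bytes(self.data))


def read_and_verify(replica, visible_length):
    """Read visible_length bytes and compare against the stored checksums."""
    data = bytes(replica.data[:visible_length])
    expected = chunk_checksums(data)
    return replica.checksums[:len(expected)] == expected


replica = ToyReplica(b"a" * 182)           # 182 bytes, as in the log above
before = read_and_verify(replica, 182)     # reader and replica agree
replica.append(b"b" * 10)                  # appender rewrites the last checksum
stale = read_and_verify(replica, 182)      # reader still uses the old length
fresh = read_and_verify(replica, 192)      # reader uses the new length
print(before, stale, fresh)
```

In this model the read with the stale length is the one that fails verification; the real root cause and fix are in the attached patches, which this sketch does not reproduce.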

[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15657402#comment-15657402
 ] 

Kihwal Lee commented on HDFS-11056:
---

+1 for the 2.7 patch. It looks to be a correct port. Thanks [~jojochuang].

> Concurrent append and read operations lead to checksum error
> 
>
> Key: HDFS-11056
> URL: https://issues.apache.org/jira/browse/HDFS-11056
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, httpfs
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-11056.001.patch, HDFS-11056.002.patch, 
> HDFS-11056.branch-2.7.patch, HDFS-11056.branch-2.patch, 
> HDFS-11056.reproduce.patch
>
>
> If there are two clients, one of which continuously opens, appends to, and 
> closes a file while the other continuously opens, reads, and closes the same 
> file, the reader eventually gets a checksum error in the data read.
> On my local Mac, it takes a few minutes to produce the error. This happens to 
> httpfs clients, but there's no reason not to believe this happens to any 
> append client.
> I have a unit test that demonstrates the checksum error. Will attach later.
> Relevant log:
> {quote}
> 2016-10-25 15:34:45,153 INFO  audit - allowed=true ugi=weichiu 
> (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/tmp/bar.txt 
> dst=null perm=null proto=rpc
> 2016-10-25 15:34:45,155 INFO  DataNode - Receiving 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 src: 
> /127.0.0.1:51130 dest: /127.0.0.1:50131
> 2016-10-25 15:34:45,155 INFO  FsDatasetImpl - Appending to FinalizedReplica, 
> blk_1073741825_1182, FINALIZED
>   getNumBytes() = 182
>   getBytesOnDisk()  = 182
>   getVisibleLength()= 182
>   getVolume()   = 
> /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1
>   getBlockURI() = 
> file:/Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-837130339-172.16.1.88-1477434851452/current/finalized/subdir0/subdir0/blk_1073741825
> 2016-10-25 15:34:45,167 INFO  DataNode - opReadBlock 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 received exception 
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
> 2016-10-25 15:34:45,167 WARN  DataNode - 
> DatanodeRegistration(127.0.0.1:50131, 
> datanodeUuid=41c96335-5e4b-4950-ac22-3d21b353abb8, infoPort=50133, 
> infoSecurePort=0, ipcPort=50134, 
> storageInfo=lv=-57;cid=testClusterID;nsid=1472068852;c=1477434851452):Got 
> exception while serving 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 to /127.0.0.1:51121
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO  FSNamesystem - 
> updatePipeline(blk_1073741825_1182, newGS=1183, newLength=182, 
> newNodes=[127.0.0.1:50131], client=DFSClient_NONMAPREDUCE_-1743096965_197)
> 2016-10-25 15:34:45,168 ERROR DataNode - 127.0.0.1:50131:DataXceiver error 
> processing READ_BLOCK operation  src: /127.0.0.1:51121 dst: /127.0.0.1:50131
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO  FSNamesystem - 
> updatePipeline(blk_1073741825_1182 => blk_1073741825_1183) success
> 2016-10-25 15:34:45,170 WARN  DFSClient - Found Checksum error for 
> 
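
The block names in the log above carry three parts: a block pool ID, a block ID, and a trailing number that the `updatePipeline` line calls the generation stamp (`newGS=1183`). The following sketch, with a hypothetical helper `parse_block` and a regular expression written only from the names seen in this log, splits a name into those parts so the read request and the pipeline update can be compared.

```python
import re

# Pattern inferred from the log lines quoted above; the pool prefix is optional.
BLOCK_RE = re.compile(
    r"(?:(?P<pool>BP-[\w.\-]+):)?blk_(?P<block_id>\d+)_(?P<gen_stamp>\d+)"
)


def parse_block(name):
    """Split a block name into (pool, block_id, gen_stamp)."""
    match = BLOCK_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a block name: {name!r}")
    return (
        match.group("pool"),
        int(match.group("block_id")),
        int(match.group("gen_stamp")),
    )


# The reader asked for generation stamp 1182 ...
requested = parse_block(
    "BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182"
)
# ... while the append pipeline moved the same block to 1183.
updated = parse_block("blk_1073741825_1183")

same_block = requested[1] == updated[1]
stamp_advanced = updated[2] == requested[2] + 1
print(same_block, stamp_advanced)
```

Both checks hold for the names in this log: the block ID is unchanged and the generation stamp advanced by one while the read was in flight.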

[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-10 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656363#comment-15656363
 ] 

Wei-Chiu Chuang commented on HDFS-11056:


The warnings and test errors look unrelated.


[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654855#comment-15654855
 ] 

Hadoop QA commented on HDFS-11056:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 5m 44s | branch-2.7 passed |
| +1 | compile | 0m 56s | branch-2.7 passed with JDK v1.8.0_111 |
| +1 | compile | 1m 1s | branch-2.7 passed with JDK v1.7.0_111 |
| +1 | checkstyle | 0m 24s | branch-2.7 passed |
| +1 | mvnsite | 0m 56s | branch-2.7 passed |
| +1 | mvneclipse | 0m 15s | branch-2.7 passed |
| +1 | findbugs | 2m 50s | branch-2.7 passed |
| +1 | javadoc | 1m 0s | branch-2.7 passed with JDK v1.8.0_111 |
| +1 | javadoc | 1m 50s | branch-2.7 passed with JDK v1.7.0_111 |
| +1 | mvninstall | 0m 52s | the patch passed |
| +1 | compile | 0m 56s | the patch passed with JDK v1.8.0_111 |
| +1 | javac | 0m 56s | the patch passed |
| +1 | compile | 1m 1s | the patch passed with JDK v1.7.0_111 |
| +1 | javac | 1m 1s | the patch passed |
| -0 | checkstyle | 0m 25s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 130 unchanged - 2 fixed = 131 total (was 132) |
| +1 | mvnsite | 1m 0s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 2630 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -1 | whitespace | 0m 53s | The patch 139 line(s) with tabs. |
| +1 | findbugs | 3m 12s | the patch passed |
| +1 | javadoc | 0m 58s | the patch passed with JDK v1.8.0_111 |
| +1 | javadoc | 1m 44s | the patch passed with JDK v1.7.0_111 |
| -1 | unit | 45m 35s | hadoop-hdfs in the patch failed with JDK v1.7.0_111. |
| -1 | asflicense | 0m 20s | The patch generated 3 ASF License warnings. |
| | | 122m 49s | |

|| Reason || Tests ||
| JDK v1.8.0_111 Failed junit tests | hadoop.hdfs.server.namenode.ha.TestDNFencing |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.web.TestHttpsFileSystem |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
| JDK v1.7.0_111 Failed junit tests | hadoop.hdfs.TestDatanodeRegistration |
|   | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.web.TestHttpsFileSystem |
|   | 
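
The two whitespace votes in the report above flag lines that end in whitespace and lines that contain tabs. A rough sketch of such a check follows. It assumes the check looks only at lines added by a unified diff, which is an assumption here since the QA tool's exact rules are not shown in this thread; `whitespace_report` is a hypothetical name.

```python
def whitespace_report(patch_text):
    """Count added lines with trailing whitespace and added lines with tabs."""
    trailing = 0
    tabs = 0
    for line in patch_text.splitlines():
        # Added lines start with '+'; '+++' is a file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        body = line[1:]
        if body != body.rstrip():
            trailing += 1
        if "\t" in body:
            tabs += 1
    return trailing, tabs


sample_patch = (
    "+++ b/Foo.java\n"
    "+int x = 1; \n"      # added line ending in a space
    "+\tint y = 2;\n"     # added line indented with a tab
    "-old line \n"        # removed line, ignored
    " context line\n"     # context line, ignored
)
print(whitespace_report(sample_patch))
```

Running a check like this locally before uploading a patch, or applying it with `git apply --whitespace=fix` as the report suggests, avoids the two -1 votes.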

[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-10 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654224#comment-15654224
 ] 

Wei-Chiu Chuang commented on HDFS-11056:


I committed the patch to trunk, branch-2 and branch-2.8, and I am still working 
on a branch-2.7 patch.


[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652614#comment-15652614
 ] 

Hadoop QA commented on HDFS-11056:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 11m 27s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 8m 3s | branch-2.7 passed |
| +1 | compile | 0m 57s | branch-2.7 passed with JDK v1.8.0_111 |
| +1 | compile | 0m 58s | branch-2.7 passed with JDK v1.7.0_111 |
| +1 | checkstyle | 0m 25s | branch-2.7 passed |
| +1 | mvnsite | 0m 59s | branch-2.7 passed |
| +1 | mvneclipse | 0m 16s | branch-2.7 passed |
| +1 | findbugs | 2m 55s | branch-2.7 passed |
| +1 | javadoc | 0m 58s | branch-2.7 passed with JDK v1.8.0_111 |
| +1 | javadoc | 1m 40s | branch-2.7 passed with JDK v1.7.0_111 |
| +1 | mvninstall | 0m 50s | the patch passed |
| +1 | compile | 0m 55s | the patch passed with JDK v1.8.0_111 |
| +1 | javac | 0m 55s | the patch passed |
| +1 | compile | 0m 58s | the patch passed with JDK v1.7.0_111 |
| +1 | javac | 0m 58s | the patch passed |
| -0 | checkstyle | 0m 21s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 130 unchanged - 2 fixed = 132 total (was 132) |
| +1 | mvnsite | 0m 55s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 2270 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -1 | whitespace | 0m 58s | The patch 139 line(s) with tabs. |
| +1 | findbugs | 3m 3s | the patch passed |
| +1 | javadoc | 0m 55s | the patch passed with JDK v1.8.0_111 |
| +1 | javadoc | 1m 41s | the patch passed with JDK v1.7.0_111 |
| -1 | unit | 43m 34s | hadoop-hdfs in the patch failed with JDK v1.7.0_111. |
| -1 | asflicense | 0m 20s | The patch generated 3 ASF License warnings. |
| | | 133m 30s | |

|| Reason || Tests ||
| JDK v1.8.0_111 Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica |
|   | hadoop.hdfs.server.blockmanagement.TestBlockManager |
|   | hadoop.hdfs.web.TestHttpsFileSystem |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | hadoop.hdfs.server.datanode.TestBlockReplacement |
| JDK v1.7.0_111 Failed junit tests | 

[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651541#comment-15651541
 ] 

Hudson commented on HDFS-11056:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10802 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10802/])
HDFS-11056. Concurrent append and read operations lead to checksum (weichiu: 
rev c619e9b43fd00ba0e59a98ae09685ff719bb722b)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImplTestUtils.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend.java


> Concurrent append and read operations lead to checksum error
> 
>
> Key: HDFS-11056
> URL: https://issues.apache.org/jira/browse/HDFS-11056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, httpfs
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-11056.001.patch, HDFS-11056.002.patch, 
> HDFS-11056.branch-2.patch, HDFS-11056.reproduce.patch
>
>
> If there are two clients, one of them open-append-close a file continuously, 
> while the other open-read-close the same file continuously, the reader 
> eventually gets a checksum error in the data read.
> On my local Mac, it takes a few minutes to produce the error. This happens to 
> httpfs clients, but there's no reason not believe this happens to any append 
> clients.
> I have a unit test that demonstrates the checksum error. Will attach later.
> Relevant log:
> {quote}
> 2016-10-25 15:34:45,153 INFO  audit - allowed=trueugi=weichiu 
> (auth:SIMPLE)   ip=/127.0.0.1   cmd=opensrc=/tmp/bar.txt
> dst=nullperm=null   proto=rpc
> 2016-10-25 15:34:45,155 INFO  DataNode - Receiving 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 src: 
> /127.0.0.1:51130 dest: /127.0.0.1:50131
> 2016-10-25 15:34:45,155 INFO  FsDatasetImpl - Appending to FinalizedReplica, 
> blk_1073741825_1182, FINALIZED
>   getNumBytes() = 182
>   getBytesOnDisk()  = 182
>   getVisibleLength()= 182
>   getVolume()   = 
> /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1
>   getBlockURI() = 
> file:/Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-837130339-172.16.1.88-1477434851452/current/finalized/subdir0/subdir0/blk_1073741825
> 2016-10-25 15:34:45,167 INFO  DataNode - opReadBlock 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 received exception 
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
> 2016-10-25 15:34:45,167 WARN  DataNode - 
> DatanodeRegistration(127.0.0.1:50131, 
> datanodeUuid=41c96335-5e4b-4950-ac22-3d21b353abb8, infoPort=50133, 
> infoSecurePort=0, ipcPort=50134, 
> storageInfo=lv=-57;cid=testClusterID;nsid=1472068852;c=1477434851452):Got 
> exception while serving 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 to /127.0.0.1:51121
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO  FSNamesystem - 
> updatePipeline(blk_1073741825_1182, newGS=1183, newLength=182, 
> newNodes=[127.0.0.1:50131], client=DFSClient_NONMAPREDUCE_-1743096965_197)
> 2016-10-25 15:34:45,168 ERROR DataNode - 127.0.0.1:50131:DataXceiver error 
> processing READ_BLOCK operation  src: /127.0.0.1:51121 dst: /127.0.0.1:50131
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at 
> 

[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-08 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649778#comment-15649778
 ] 

Wei-Chiu Chuang commented on HDFS-11056:


The failed branch-2 tests are unrelated to this patch.

[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649325#comment-15649325
 ] 

Hadoop QA commented on HDFS-11056:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 26s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 7m 20s | branch-2 passed |
| +1 | compile | 0m 48s | branch-2 passed with JDK v1.8.0_101 |
| +1 | compile | 0m 45s | branch-2 passed with JDK v1.7.0_111 |
| +1 | checkstyle | 0m 28s | branch-2 passed |
| +1 | mvnsite | 0m 53s | branch-2 passed |
| +1 | mvneclipse | 0m 16s | branch-2 passed |
| +1 | findbugs | 1m 57s | branch-2 passed |
| +1 | javadoc | 1m 3s | branch-2 passed with JDK v1.8.0_101 |
| +1 | javadoc | 1m 49s | branch-2 passed with JDK v1.7.0_111 |
| +1 | mvninstall | 0m 46s | the patch passed |
| +1 | compile | 0m 46s | the patch passed with JDK v1.8.0_101 |
| +1 | javac | 0m 46s | the patch passed |
| +1 | compile | 0m 43s | the patch passed with JDK v1.7.0_111 |
| +1 | javac | 0m 43s | the patch passed |
| +1 | checkstyle | 0m 26s | the patch passed |
| +1 | mvnsite | 0m 53s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 2m 11s | the patch passed |
| +1 | javadoc | 0m 57s | the patch passed with JDK v1.8.0_101 |
| +1 | javadoc | 1m 33s | the patch passed with JDK v1.7.0_111 |
| -1 | unit | 71m 3s | hadoop-hdfs in the patch failed with JDK v1.7.0_111. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
|  |  | 168m 18s |  |

|| Reason || Tests ||
| JDK v1.8.0_101 Failed junit tests | hadoop.hdfs.server.datanode.TestFsDatasetCache |
| JDK v1.7.0_111 Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:b59b8b7 |
| JIRA Issue | HDFS-11056 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838052/HDFS-11056.branch-2.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 2b8b5a8d049c 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / b77239b |
| Default 

[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-07 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644747#comment-15644747
 ] 

Wei-Chiu Chuang commented on HDFS-11056:


If no one objects, I will commit the latest patch by end of Tuesday and file a 
follow-up JIRA to study whether it is necessary to optimize checksum 
calculation by adding the last chunk checksum to the finalized/temporary 
replica class.

[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-04 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637758#comment-15637758
 ] 

Wei-Chiu Chuang commented on HDFS-11056:


Hi [~kihwal], thanks for the review!

This fix re-computes the last chunk checksum when converting a 
finalized/temporary replica to an rbw replica. Do you think it would be more 
efficient to store the last chunk checksum in the finalized/temporary replica 
object, so that frequent open->append->close operations avoid the 
re-computation?
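
For readers following the thread, here is a minimal sketch of what re-computing a last chunk checksum involves. It assumes a fixed bytes-per-checksum chunk size and plain CRC32 from java.util.zip. The class and method names are hypothetical and this is not the code in the patch, which uses HDFS's own replica and checksum classes.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

/** Hypothetical helper, not HDFS code: checksums the trailing chunk of a block. */
class LastChunkChecksum {

  /** Number of bytes in the last (possibly partial) chunk of a block. */
  static int lastChunkLength(long blockLength, int bytesPerChecksum) {
    if (blockLength == 0) {
      return 0;
    }
    int remainder = (int) (blockLength % bytesPerChecksum);
    return remainder == 0 ? bytesPerChecksum : remainder;
  }

  /** CRC32 over the last chunk only; earlier full chunks are not touched. */
  static long lastChunkChecksum(byte[] blockData, int bytesPerChecksum) {
    int len = lastChunkLength(blockData.length, bytesPerChecksum);
    CRC32 crc = new CRC32();
    crc.update(blockData, blockData.length - len, len);
    return crc.getValue();
  }

  public static void main(String[] args) {
    // 182 bytes matches the replica length in the quoted log; 512 is an
    // assumed chunk size for illustration.
    byte[] block = new byte[182];
    Arrays.fill(block, (byte) 'a');
    System.out.println("last chunk length = " + lastChunkLength(block.length, 512));
    System.out.println("last chunk crc32  = "
        + Long.toHexString(lastChunkChecksum(block, 512)));
  }
}
```

Only the trailing chunk is read in this sketch, so the cost per append is bounded by the chunk size. Caching the value in the replica object, as proposed above, would trade that read for a small amount of memory per replica.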

[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-03 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634810#comment-15634810
 ] 

Lei (Eddy) Xu commented on HDFS-11056:
--

Hi, [~jojochuang]. [HDFS-10636] is not a related change. 

[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-03 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633249#comment-15633249
 ] 

Kihwal Lee commented on HDFS-11056:
---

I have been looking at the patch since yesterday. It looks like the partial 
chunk checksum is loaded from disk and saved in memory before it is modified. 
That seems like a correct approach. +1
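
The idea of saving the partial chunk checksum before an append modifies it can be illustrated with a toy model. This is a sketch under stated assumptions, not the patch: one in-memory byte array stands in for the block file, a single long stands in for the last checksum in the meta file, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

/** Toy model, not HDFS code: one partial chunk and its stored checksum. */
class StaleChecksumDemo {
  byte[] data;             // stands in for the block file
  long onDiskChecksum;     // stands in for the last checksum in the meta file
  Long savedLastChecksum;  // in-memory copy taken before an append, or null

  StaleChecksumDemo(byte[] initial) {
    data = initial.clone();
    onDiskChecksum = crc(data, data.length);
  }

  static long crc(byte[] bytes, int length) {
    CRC32 c = new CRC32();
    c.update(bytes, 0, length);
    return c.getValue();
  }

  /** Appends bytes, optionally remembering the pre-append checksum first. */
  void append(byte[] extra, boolean saveOldChecksum) {
    if (saveOldChecksum) {
      savedLastChecksum = onDiskChecksum;
    }
    int oldLength = data.length;
    data = Arrays.copyOf(data, oldLength + extra.length);
    System.arraycopy(extra, 0, data, oldLength, extra.length);
    // The stored checksum now covers the longer chunk.
    onDiskChecksum = crc(data, data.length);
  }

  /** A reader that was given the old visible length verifies that prefix. */
  boolean readerVerifies(int visibleLength) {
    long expected = savedLastChecksum != null ? savedLastChecksum : onDiskChecksum;
    return crc(data, visibleLength) == expected;
  }

  public static void main(String[] args) {
    byte[] initial = new byte[182];
    Arrays.fill(initial, (byte) 'x');

    StaleChecksumDemo withoutSave = new StaleChecksumDemo(initial);
    withoutSave.append(new byte[] {1, 2, 3}, false);
    System.out.println("without saved checksum: " + withoutSave.readerVerifies(182));

    StaleChecksumDemo withSave = new StaleChecksumDemo(initial);
    withSave.append(new byte[] {1, 2, 3}, true);
    System.out.println("with saved checksum:    " + withSave.readerVerifies(182));
  }
}
```

In this model, a reader that still sees the old length of 182 bytes is checked against a checksum computed over the old data only when the pre-append value was saved. The model is single-threaded on purpose, so it shows the stale-checksum effect without reproducing the actual race.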

[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-03 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633054#comment-15633054
 ] 

Wei-Chiu Chuang commented on HDFS-11056:


The test failure is unrelated.
[~eddyxu] [~virajith] would you like to comment? I saw that HDFS-10636 
refactored a lot of relevant code, but I do think the same bug existed before 
HDFS-10636.

> Concurrent append and read operations lead to checksum error
> 
>
> Key: HDFS-11056
> URL: https://issues.apache.org/jira/browse/HDFS-11056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, httpfs
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Attachments: HDFS-11056.001.patch, HDFS-11056.002.patch, 
> HDFS-11056.reproduce.patch
>
>
> If there are two clients, one of which continuously opens, appends to, and 
> closes a file while the other continuously opens, reads, and closes the same 
> file, the reader eventually gets a checksum error in the data read.
> On my local Mac, it takes a few minutes to produce the error. This happens to 
> httpfs clients, but there's no reason not to believe this happens to any 
> append client.
> I have a unit test that demonstrates the checksum error. Will attach later.
> Relevant log:
> {quote}
> 2016-10-25 15:34:45,153 INFO  audit - allowed=true   ugi=weichiu 
> (auth:SIMPLE)   ip=/127.0.0.1   cmd=open   src=/tmp/bar.txt
> dst=null   perm=null   proto=rpc
> 2016-10-25 15:34:45,155 INFO  DataNode - Receiving 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 src: 
> /127.0.0.1:51130 dest: /127.0.0.1:50131
> 2016-10-25 15:34:45,155 INFO  FsDatasetImpl - Appending to FinalizedReplica, 
> blk_1073741825_1182, FINALIZED
>   getNumBytes() = 182
>   getBytesOnDisk()  = 182
>   getVisibleLength()= 182
>   getVolume()   = 
> /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1
>   getBlockURI() = 
> file:/Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-837130339-172.16.1.88-1477434851452/current/finalized/subdir0/subdir0/blk_1073741825
> 2016-10-25 15:34:45,167 INFO  DataNode - opReadBlock 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 received exception 
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
> 2016-10-25 15:34:45,167 WARN  DataNode - 
> DatanodeRegistration(127.0.0.1:50131, 
> datanodeUuid=41c96335-5e4b-4950-ac22-3d21b353abb8, infoPort=50133, 
> infoSecurePort=0, ipcPort=50134, 
> storageInfo=lv=-57;cid=testClusterID;nsid=1472068852;c=1477434851452):Got 
> exception while serving 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 to /127.0.0.1:51121
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO  FSNamesystem - 
> updatePipeline(blk_1073741825_1182, newGS=1183, newLength=182, 
> newNodes=[127.0.0.1:50131], client=DFSClient_NONMAPREDUCE_-1743096965_197)
> 2016-10-25 15:34:45,168 ERROR DataNode - 127.0.0.1:50131:DataXceiver error 
> processing READ_BLOCK operation  src: /127.0.0.1:51121 dst: /127.0.0.1:50131
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO  FSNamesystem - 
> updatePipeline(blk_1073741825_1182 => blk_1073741825_1183) success
> 2016-10-25 15:34:45,170 WARN  DFSClient - Found 

[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630215#comment-15630215
 ] 

Hadoop QA commented on HDFS-11056:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 7m 22s | trunk passed |
| +1 | compile | 0m 45s | trunk passed |
| +1 | checkstyle | 0m 25s | trunk passed |
| +1 | mvnsite | 0m 52s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 43s | trunk passed |
| +1 | javadoc | 0m 39s | trunk passed |
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 0m 42s | the patch passed |
| +1 | javac | 0m 42s | the patch passed |
| +1 | checkstyle | 0m 23s | the patch passed |
| +1 | mvnsite | 0m 49s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 46s | the patch passed |
| +1 | javadoc | 0m 37s | the patch passed |
| -1 | unit | 54m 30s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 73m 25s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.qjournal.client.TestQuorumJournalManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-11056 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12836626/HDFS-11056.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 9c23a583ac15 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0dc2a6a |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17389/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17389/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17389/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-02 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628713#comment-15628713
 ] 

Wei-Chiu Chuang commented on HDFS-11056:


TestWriteToReplica#testAppend failed because it tries to read from the meta 
file of a non-existent block replica.


[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627199#comment-15627199
 ] 

Hadoop QA commented on HDFS-11056:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 8m 6s | trunk passed |
| +1 | compile | 0m 49s | trunk passed |
| +1 | checkstyle | 0m 27s | trunk passed |
| +1 | mvnsite | 1m 0s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 52s | trunk passed |
| +1 | javadoc | 0m 43s | trunk passed |
| +1 | mvninstall | 0m 55s | the patch passed |
| +1 | compile | 0m 50s | the patch passed |
| +1 | javac | 0m 50s | the patch passed |
| +1 | checkstyle | 0m 27s | the patch passed |
| +1 | mvnsite | 0m 54s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 51s | the patch passed |
| +1 | javadoc | 0m 39s | the patch passed |
| -1 | unit | 57m 52s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 78m 48s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeUUID |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-11056 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12836460/HDFS-11056.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 938dfd53d8ab 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / aacf214 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17371/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17371/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17371/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-01 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15626517#comment-15626517
 ] 

Wei-Chiu Chuang commented on HDFS-11056:


I have a proof-of-concept fix, but writing a unit test that reliably reproduces 
the error seems tricky given that there are many moving parts.

The major hurdle is to create a replica in RBW state whose visible 
length != on-disk length, and let a client read the replica concurrently.
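
For illustration only, here is a minimal toy model of why that length mismatch matters. It is not HDFS code: the class and method names are hypothetical, java.util.zip.CRC32 stands in for whatever checksum type the DataNode is configured with, and the on-disk length of 200 is an assumed value (182 is the visible length from the log). The sketch assumes a single partial chunk whose stored checksum already covers the appended bytes while the reader is only served the visible bytes.

```java
import java.util.zip.CRC32;

// Toy model only: hypothetical names, not the DataNode implementation.
public class PartialChunkChecksumDemo {

    /** CRC32 over the first {@code length} bytes of {@code data}. */
    public static long crc(byte[] data, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, length);
        return crc.getValue();
    }

    /** True if {@code storedChecksum} matches the bytes a reader is served. */
    public static boolean verify(byte[] data, int visibleLength, long storedChecksum) {
        return crc(data, visibleLength) == storedChecksum;
    }

    public static void main(String[] args) {
        int visibleLength = 182; // length readers are allowed to see (from the log)
        int onDiskLength = 200;  // assumed: the writer has already appended 18 bytes

        // Arbitrary deterministic block contents.
        byte[] block = new byte[onDiskLength];
        for (int i = 0; i < block.length; i++) {
            block[i] = (byte) (i * 31 + 7);
        }

        // Checksum of the un-appended data, as a reader needs it.
        long checksumForVisibleData = crc(block, visibleLength);
        // Checksum of the partial chunk as rewritten by the appending writer.
        long checksumOnDisk = crc(block, onDiskLength);

        System.out.println("visible-length checksum verifies: "
            + verify(block, visibleLength, checksumForVisibleData));
        System.out.println("on-disk checksum verifies:        "
            + verify(block, visibleLength, checksumOnDisk));
    }
}
```

In this model the first check is expected to pass and the second to fail, which is the shape of failure a reader would report as a checksum error. A real reproduction still needs an actual RBW replica and a concurrent reader, which is the tricky part.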


[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-01 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625993#comment-15625993
 ] 

Wei-Chiu Chuang commented on HDFS-11056:


FsVolumeImpl#convertTemporaryToRbw does not generate the last checksum either, 
so it seems to suffer from the same bug. This is likely what caused the 
checksum error in HDFS-6804.


[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-01 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625972#comment-15625972
 ] 

Yongjun Zhang commented on HDFS-11056:
--

Those are very nice findings, [~jojochuang]! Congrats!

Can we create a unit test to reproduce the scenario? Thanks.



> Concurrent append and read operations lead to checksum error
> 
>
> Key: HDFS-11056
> URL: https://issues.apache.org/jira/browse/HDFS-11056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, httpfs
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-11056.reproduce.patch
>
>
> If there are two clients, one of them open-append-close a file continuously, 
> while the other open-read-close the same file continuously, the reader 
> eventually gets a checksum error in the data read.
> On my local Mac, it takes a few minutes to produce the error. This happens to 
> httpfs clients, but there's no reason not believe this happens to any append 
> clients.
> I have a unit test that demonstrates the checksum error. Will attach later.
> Relevant log:
> {quote}
> 2016-10-25 15:34:45,153 INFO  audit - allowed=trueugi=weichiu 
> (auth:SIMPLE)   ip=/127.0.0.1   cmd=opensrc=/tmp/bar.txt
> dst=nullperm=null   proto=rpc
> 2016-10-25 15:34:45,155 INFO  DataNode - Receiving 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 src: 
> /127.0.0.1:51130 dest: /127.0.0.1:50131
> 2016-10-25 15:34:45,155 INFO  FsDatasetImpl - Appending to FinalizedReplica, 
> blk_1073741825_1182, FINALIZED
>   getNumBytes() = 182
>   getBytesOnDisk()  = 182
>   getVisibleLength()= 182
>   getVolume()   = 
> /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1
>   getBlockURI() = 
> file:/Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-837130339-172.16.1.88-1477434851452/current/finalized/subdir0/subdir0/blk_1073741825
> 2016-10-25 15:34:45,167 INFO  DataNode - opReadBlock 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 received exception 
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
> 2016-10-25 15:34:45,167 WARN  DataNode - 
> DatanodeRegistration(127.0.0.1:50131, 
> datanodeUuid=41c96335-5e4b-4950-ac22-3d21b353abb8, infoPort=50133, 
> infoSecurePort=0, ipcPort=50134, 
> storageInfo=lv=-57;cid=testClusterID;nsid=1472068852;c=1477434851452):Got 
> exception while serving 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 to /127.0.0.1:51121
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:400)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO  FSNamesystem - 
> updatePipeline(blk_1073741825_1182, newGS=1183, newLength=182, 
> newNodes=[127.0.0.1:50131], client=DFSClient_NONMAPREDUCE_-1743096965_197)
> 2016-10-25 15:34:45,168 ERROR DataNode - 127.0.0.1:50131:DataXceiver error 
> processing READ_BLOCK operation  src: /127.0.0.1:51121 dst: /127.0.0.1:50131
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO  FSNamesystem - 
> updatePipeline(blk_1073741825_1182 => blk_1073741825_1183) success
> 2016-10-25 15:34:45,170 WARN  DFSClient - Found Checksum error for 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 from 
> 

[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-11-01 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625942#comment-15625942
 ] 

Wei-Chiu Chuang commented on HDFS-11056:


I believe I have found the root cause of the bug.

When BlockSender sends an RBW block and reads the last block and its checksum, 
it is supposed to use the in-memory checksum, which is (supposedly) the correct 
checksum for the data as it was before the append.

However, the in-memory checksum of the ReplicaInPipeline object is *null*, so 
BlockSender skips the in-memory checksum and uses the on-disk checksum instead. 
That results in a checksum error, because the on-disk checksum corresponds to 
the on-disk data, which may not be the visible data.

The checksum is null because, when a replica is converted from Finalized to RBW 
for append, setLastChecksumAndDataLen() is never called. (See: 
FsVolumeImpl#append)

This bug is subtle, and is only exposed when reading a replica whose on-disk 
data length is longer than its visible length.
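
To make the failure mode above concrete, here is a minimal, self-contained 
sketch. It is not the HDFS code: the Replica class, crc(), 
checksumSentToReader() and readerAccepts() are hypothetical names invented for 
this illustration, and java.util.zip.CRC32 stands in for whatever checksum type 
the datanode is configured with. It only models the rule described above: 
prefer the in-memory snapshot, and fall back to the on-disk checksum when the 
snapshot is null.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class LastChecksumSketch {

    /** Hypothetical, simplified stand-in for a replica that is being written. */
    static final class Replica {
        final byte[] onDiskData;     // bytes already written to the block file
        final int visibleLength;     // bytes a reader is allowed to see
        final Long inMemoryChecksum; // snapshot for the visible bytes, or null if never set

        Replica(byte[] onDiskData, int visibleLength, Long inMemoryChecksum) {
            this.onDiskData = onDiskData;
            this.visibleLength = visibleLength;
            this.inMemoryChecksum = inMemoryChecksum;
        }
    }

    /** CRC32 over the first {@code length} bytes of {@code data}. */
    static long crc(byte[] data, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, length);
        return crc.getValue();
    }

    /** Prefer the in-memory snapshot; fall back to the on-disk checksum if it is null. */
    static long checksumSentToReader(Replica r) {
        if (r.inMemoryChecksum != null) {
            return r.inMemoryChecksum;
        }
        // The on-disk checksum covers everything written so far, visible or not.
        return crc(r.onDiskData, r.onDiskData.length);
    }

    /** The reader recomputes the checksum over the visible bytes only. */
    static boolean readerAccepts(Replica r) {
        return crc(r.onDiskData, r.visibleLength) == checksumSentToReader(r);
    }

    public static void main(String[] args) {
        byte[] finalized = "finalized data".getBytes(StandardCharsets.US_ASCII);
        byte[] onDisk =
            "finalized data plus appended bytes".getBytes(StandardCharsets.US_ASCII);
        int visible = finalized.length;

        // Snapshot never set when the replica was opened for append (the reported case).
        Replica withoutSnapshot = new Replica(onDisk, visible, null);
        // Snapshot taken over the visible bytes when the replica was opened for append.
        Replica withSnapshot = new Replica(onDisk, visible, crc(onDisk, visible));

        System.out.println("snapshot missing, reader accepts: " + readerAccepts(withoutSnapshot));
        System.out.println("snapshot present, reader accepts: " + readerAccepts(withSnapshot));
    }
}
```

In this toy model the reader only rejects the data when the on-disk length 
exceeds the visible length, which matches the condition described above.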


[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-10-28 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616571#comment-15616571
 ] 

Wei-Chiu Chuang commented on HDFS-11056:


Still working on this. It looks like when data is being appended, the checksum 
is written to the on-disk meta file.
When another client reads the data, it reads the checksum from the on-disk meta 
file (which is the most up to date and corresponds to the data already written 
to disk but not yet visible to clients) instead of the in-memory checksum 
(which is the snapshot). It should have read the in-memory checksum.

So the inconsistency between data and checksum causes an incorrect checksum to 
be read.
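
The mismatch between the snapshot and the meta file can be sketched with 
per-chunk checksums. This is an illustration under stated assumptions, not 
datanode code: BYTES_PER_CHECKSUM, metaFor() and the 600/700 byte lengths are 
made up for the example, and CRC32 is used only because it ships with the JDK. 
Full chunks get the same checksum either way; only the last, partial chunk 
differs once the appender has written past the visible length.

```java
import java.util.zip.CRC32;

public class PartialChunkSketch {

    // Assumed chunk size for this sketch; one checksum is kept per chunk.
    static final int BYTES_PER_CHECKSUM = 512;

    /** CRC32 over {@code len} bytes of {@code data} starting at {@code off}. */
    static long crc(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    /** Checksums for the first {@code length} bytes; the last chunk may be partial. */
    static long[] metaFor(byte[] data, int length) {
        int chunks = (length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, length - off);
            sums[i] = crc(data, off, len);
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] block = new byte[700];
        for (int i = 0; i < block.length; i++) {
            block[i] = (byte) i;
        }
        int visibleLength = 600; // what readers may see
        int onDiskLength = 700;  // what the appender has already written

        long[] snapshot = metaFor(block, visibleLength); // in-memory view
        long[] metaFile = metaFor(block, onDiskLength);  // on-disk view

        System.out.println("full chunk checksums equal: " + (snapshot[0] == metaFile[0]));
        System.out.println("last chunk checksums equal: " + (snapshot[1] == metaFile[1]));
    }
}
```

A reader limited to the visible length computes the snapshot value for the last 
chunk, so comparing it against the meta file value fails.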


[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-10-28 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616049#comment-15616049
 ] 

Wei-Chiu Chuang commented on HDFS-11056:


The checksum error seems to occur when a client reads an RBW replica while it 
is being appended to and has not yet been finalized.

Maybe the replica was read before the checksum meta file was updated? That 
would be my guess.


[jira] [Commented] (HDFS-11056) Concurrent append and read operations lead to checksum error

2016-10-25 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606738#comment-15606738
 ] 

Wei-Chiu Chuang commented on HDFS-11056:


This bug seems to be the root cause of HDFS-11022 in the first place.

> Concurrent append and read operations lead to checksum error
> 
>
> Key: HDFS-11056
> URL: https://issues.apache.org/jira/browse/HDFS-11056
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, httpfs
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Attachments: HDFS-11056.reproduce.patch
>
>
> If there are two clients, one of which continuously opens, appends to, and 
> closes a file while the other continuously opens, reads, and closes the same 
> file, the reader eventually gets a checksum error in the data read.
> On my local Mac, it takes a few minutes to produce the error. This happens to 
> httpfs clients, but there's no reason not to believe this happens to any 
> append clients.
> I have a unit test that demonstrates the checksum error. Will attach later.
> Relevant log:
> {quote}
> 2016-10-25 15:34:45,153 INFO  audit - allowed=true   ugi=weichiu 
> (auth:SIMPLE)   ip=/127.0.0.1   cmd=open   src=/tmp/bar.txt   
> dst=null   perm=null   proto=rpc
> 2016-10-25 15:34:45,155 INFO  DataNode - Receiving 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 src: 
> /127.0.0.1:51130 dest: /127.0.0.1:50131
> 2016-10-25 15:34:45,155 INFO  FsDatasetImpl - Appending to FinalizedReplica, 
> blk_1073741825_1182, FINALIZED
>   getNumBytes() = 182
>   getBytesOnDisk()  = 182
>   getVisibleLength()= 182
>   getVolume()   = 
> /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1
>   getBlockURI() = 
> file:/Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-837130339-172.16.1.88-1477434851452/current/finalized/subdir0/subdir0/blk_1073741825
> 2016-10-25 15:34:45,167 INFO  DataNode - opReadBlock 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 received exception 
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
> 2016-10-25 15:34:45,167 WARN  DataNode - 
> DatanodeRegistration(127.0.0.1:50131, 
> datanodeUuid=41c96335-5e4b-4950-ac22-3d21b353abb8, infoPort=50133, 
> infoSecurePort=0, ipcPort=50134, 
> storageInfo=lv=-57;cid=testClusterID;nsid=1472068852;c=1477434851452):Got 
> exception while serving 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 to /127.0.0.1:51121
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO  FSNamesystem - 
> updatePipeline(blk_1073741825_1182, newGS=1183, newLength=182, 
> newNodes=[127.0.0.1:50131], client=DFSClient_NONMAPREDUCE_-1743096965_197)
> 2016-10-25 15:34:45,168 ERROR DataNode - 127.0.0.1:50131:DataXceiver error 
> processing READ_BLOCK operation  src: /127.0.0.1:51121 dst: /127.0.0.1:50131
> java.io.IOException: No data exists for block 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-10-25 15:34:45,168 INFO  FSNamesystem - 
> updatePipeline(blk_1073741825_1182 => blk_1073741825_1183) success
> 2016-10-25 15:34:45,170 WARN  DFSClient - Found Checksum error for 
> BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 from 
> DatanodeInfoWithStorage[127.0.0.1:50131,DS-a1878418-4f7f-4fc9-b3f7-d7ed780b5373,DISK]
>