[jira] [Commented] (HDFS-10628) Log HDFS Balancer exit message to its own log
[ https://issues.apache.org/jira/browse/HDFS-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378898#comment-15378898 ] Akira Ajisaka commented on HDFS-10628: -- +1, thanks Jiayi. > Log HDFS Balancer exit message to its own log > - > > Key: HDFS-10628 > URL: https://issues.apache.org/jira/browse/HDFS-10628 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Jiayi Zhou >Assignee: Jiayi Zhou >Priority: Minor > Attachments: HDFS-10628.001.patch, HDFS-10628.002.patch > > > Currently, the exit message is logged only to stderr. It would be more convenient > if we also logged it to the Balancer log, for when people want to figure out why > the Balancer was aborted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
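[Editor's note] For context on the change approved above, a minimal sketch of the idea: log the exit message to the Balancer's own log in addition to stderr. The class and method names here are illustrative, not taken from the actual patch:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Sketch under assumptions; the committed patch may differ. The point is
// simply that the exit message reaches the Balancer log file, not only stderr.
class BalancerExitLogging {
  private static final Log LOG = LogFactory.getLog(BalancerExitLogging.class);

  static void logExit(int exitCode, String message) {
    System.err.println(message); // existing behavior: message goes to stderr
    LOG.info("Balancer exiting with code " + exitCode + ": " + message); // also log it
  }
}
{code}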
[jira] [Commented] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378879#comment-15378879 ] Vinayakumar B commented on HDFS-10587: -- bq. Here it says checksum error at 81920, which is at the very beginning itself. May be 229 disk have some problem, or during transfer to 77 some corruption due to network card would have happened. Is not exactly same as current case. I was wrong. [~xupeng]'s case is exactly the same as this Jira. Here is how: # 77 throws an exception while verifying the received packet during the transfer from 229 (which had the block transferred to it earlier from 228). # When verifying only the packet, the position mentioned in the checksum exception is relative to the packet buffer offset, not the block offset. So 81920 is the offset within the packet. # Bytes already written to disk on 77 during the transfer before the checksum exception: 9830400. # Total: 9830400 + 81920 == 9912320, which is the same as the number of bytes 229 had received from 228 when it was added to the pipeline. > Incorrect offset/length calculation in pipeline recovery causes block > corruption > > > Key: HDFS-10587 > URL: https://issues.apache.org/jira/browse/HDFS-10587 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10587.001.patch > > > We found that incorrect offset and length calculation in pipeline recovery may > cause block corruption and result in missing blocks under a very unfortunate > scenario. > (1) A client established a pipeline and started writing data to the pipeline. > (2) One of the data nodes in the pipeline restarted, closing the socket, and > some written data were unacknowledged. > (3) The client replaced the failed data node with a new one, initiating a block > transfer to copy existing data in the block to the new datanode. > (4) The block was transferred to the new node. Crucially, the entire block, > including the unacknowledged data, was transferred. > (5) The last chunk (512 bytes) was not a full chunk, but the destination > still reserved the whole chunk in its buffer, and wrote the entire buffer to > disk, therefore some of the written data is garbage. > (6) When the transfer was done, the destination data node converted the > replica from temporary to rbw, which set its visible length to the length of > bytes on disk. That is to say, it thought whatever was transferred was > acknowledged. However, the visible length of the replica is different (rounded > up to the next multiple of 512) from the source of the transfer. [1] > (7) The client then truncated the block in an attempt to remove unacknowledged > data. However, because the visible length is equivalent to the bytes on disk, > it did not truncate the unacknowledged data. > (8) When new data was appended to the destination, it skipped the bytes > already on disk. Therefore, whatever was written as garbage was not replaced. > (9) The volume scanner detected the corrupt replica, but due to HDFS-10512, it > wouldn't tell the NameNode to mark the replica as corrupt, so the client > continued to form a pipeline using the corrupt replica. > (10) Finally, the DN that had the only healthy replica was restarted. The NameNode > then updated the pipeline to contain only the corrupt replica. > (11) The client continued to write to the corrupt replica, because neither the client > nor the data node itself knew the replica was corrupt. When the restarted > datanodes came back, their replicas were stale, despite not being corrupt. 
> Therefore, none of the replicas is good and up to date. > The sequence of events was reconstructed based on the DataNode/NameNode logs and > my understanding of the code. > Incidentally, we have observed the same sequence of events on two independent > clusters. > [1] > The sender has the replica as follows: > 2016-04-15 22:03:05,066 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW > getNumBytes() = 41381376 > getBytesOnDisk() = 41381376 > getVisibleLength()= 41186444 > getVolume() = /hadoop-i/data/current > getBlockFile()= > /hadoop-i/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324 > bytesAcked=41186444 > bytesOnDisk=41381376 > while the receiver has the replica as follows: > 2016-04-15 22:03:05,068 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW > getNumBytes() = 41186816 > getBytesOnDisk() = 41186816 > getVisibleLength()= 41186816 > getVolume() = /hadoop-g/data/current > getBlockFile()= > /hadoop-g/data/current/BP-1043567091-10.1.1.1-1
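[Editor's note] To make the offset arithmetic in Vinayakumar's comment above concrete, a minimal self-contained sketch (class name is illustrative); it simply reproduces the calculation from the numbered list:
{code}
// Sketch: recovering the block-relative offset from the packet-relative
// offset reported in the checksum exception. Values are from the comment above.
public class OffsetCheck {
  public static void main(String[] args) {
    long bytesOnDiskBeforeException = 9830400L; // written on 77 before the exception
    long offsetWithinPacket = 81920L;           // offset in the checksum exception
    long blockOffset = bytesOnDiskBeforeException + offsetWithinPacket;
    System.out.println(blockOffset);            // 9912320: bytes 229 received from 228
  }
}
{code}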
[jira] [Commented] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378808#comment-15378808 ] Hadoop QA commented on HDFS-10626: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 58m 52s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 79m 46s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818086/HDFS-10626.002.patch | | JIRA Issue | HDFS-10626 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 78a59e0c1b08 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e549a9a | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16066/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16066/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Labels: supportability > Attachments: HDFS-10626.001.patch, HD
[jira] [Commented] (HDFS-10632) DataXceiver to report the length of the block it's receiving
[ https://issues.apache.org/jira/browse/HDFS-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378805#comment-15378805 ] Hadoop QA commented on HDFS-10632: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 29s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 94m 16s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestRollingUpgrade | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818080/HDFS-10632.001.patch | | JIRA Issue | HDFS-10632 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux fc321d41bab2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e549a9a | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16065/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16065/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16065/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > DataXceiver to report the length of the block it's receiving > > > Key: HDFS-10632 > URL: https://issues.apache.org/jira/browse/HDFS-10632 > Project: Hadoop HDFS > Iss
[jira] [Commented] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.
[ https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378758#comment-15378758 ] Yiqun Lin commented on HDFS-10600: -- Thanks a lot for the commit, [~eddyxu]! > PlanCommand#getThrsholdPercentage should not use throughput value. > -- > > Key: HDFS-10600 > URL: https://issues.apache.org/jira/browse/HDFS-10600 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: diskbalancer >Affects Versions: 2.9.0, 3.0.0-beta1 >Reporter: Lei (Eddy) Xu >Assignee: Yiqun Lin > Fix For: 3.0.0-alpha1 > > Attachments: HDFS-10600.001.patch, HDFS-10600.002.patch > > > In {{PlanCommand#getThresholdPercentage}} > {code} > private double getThresholdPercentage(CommandLine cmd) { > > if ((value <= 0.0) || (value > 100.0)) { > value = getConf().getDouble( > DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT, > DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT); > } > return value; > } > {code} > {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return > {{throughput}} as a percentage value. > Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
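[Editor's note] For illustration, a minimal sketch of the kind of fix being suggested above: fall back to a percentage default rather than the throughput key. {{parseThresholdOption}} and {{DEFAULT_THRESHOLD_PERCENTAGE}} are hypothetical names, not necessarily what the committed patch uses:
{code}
// Sketch only: the fallback must be a percentage, not
// DFS_DISK_BALANCER_MAX_DISK_THRUPUT (which is a throughput in MB).
private double getThresholdPercentage(CommandLine cmd) {
  double value = parseThresholdOption(cmd); // hypothetical helper reading the CLI option
  if ((value <= 0.0) || (value > 100.0)) {
    value = DEFAULT_THRESHOLD_PERCENTAGE;   // hypothetical percentage default, e.g. 10.0
  }
  return value;
}
{code}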
[jira] [Updated] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10626: - Attachment: HDFS-10626.002.patch > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Labels: supportability > Attachments: HDFS-10626.001.patch, HDFS-10626.002.patch > > > VolumeScanner prints an incorrect IOException in {{datanode.reportBadBlocks}}. > The related code: > {code} > public void handle(ExtendedBlock block, IOException e) { > FsVolumeSpi volume = scanner.volume; > ... > try { > scanner.datanode.reportBadBlocks(block, volume); > } catch (IOException ie) { > // This is bad, but not bad enough to shut down the scanner. > LOG.warn("Cannot report bad " + block.getBlockId(), e); > } > } > {code} > The IOException that is printed in the log should be {{ie}} rather than {{e}}, > which was passed into the method {{handle(ExtendedBlock block, IOException e)}}. > It will be important info that can help us know why datanode > reportBadBlocks failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378755#comment-15378755 ] Yiqun Lin commented on HDFS-10626: -- Thanks [~shahrs87] and [~yzhangal] for the review. Posted a new patch addressing the comment. The patch changes the code from {code} LOG.warn("Cannot report bad " + block.getBlockId(), e); {code} to {code} LOG.warn("Cannot report bad " + block, ie); {code} It will print out the detailed info of the bad block. > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Labels: supportability > Attachments: HDFS-10626.001.patch, HDFS-10626.002.patch > > > VolumeScanner prints an incorrect IOException in {{datanode.reportBadBlocks}}. > The related code: > {code} > public void handle(ExtendedBlock block, IOException e) { > FsVolumeSpi volume = scanner.volume; > ... > try { > scanner.datanode.reportBadBlocks(block, volume); > } catch (IOException ie) { > // This is bad, but not bad enough to shut down the scanner. > LOG.warn("Cannot report bad " + block.getBlockId(), e); > } > } > {code} > The IOException that is printed in the log should be {{ie}} rather than {{e}}, > which was passed into the method {{handle(ExtendedBlock block, IOException e)}}. > It will be important info that can help us know why datanode > reportBadBlocks failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
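[Editor's note] Putting the two one-liners above in context, a minimal sketch of the handler after the fix (abridged from the snippet quoted in the description; the elided lines are unchanged):
{code}
// After the fix: log the caught exception (ie), not the handler's
// parameter (e), and print the full block rather than just its id.
public void handle(ExtendedBlock block, IOException e) {
  FsVolumeSpi volume = scanner.volume;
  // ... (unchanged)
  try {
    scanner.datanode.reportBadBlocks(block, volume);
  } catch (IOException ie) {
    // This is bad, but not bad enough to shut down the scanner.
    LOG.warn("Cannot report bad " + block, ie);
  }
}
{code}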
[jira] [Updated] (HDFS-10632) DataXceiver to report the length of the block it's receiving
[ https://issues.apache.org/jira/browse/HDFS-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10632: - Status: Patch Available (was: Open) Attached a simple patch to fix this. > DataXceiver to report the length of the block it's receiving > > > Key: HDFS-10632 > URL: https://issues.apache.org/jira/browse/HDFS-10632 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Yiqun Lin >Priority: Trivial > Labels: supportability > Attachments: HDFS-10632.001.patch > > > In DataXceiver#writeBlock: > {code} > LOG.info("Receiving " + block + " src: " + remoteAddress + " dest: " > + localAddress); > {code} > It would be better for this message to also report the size of the block it's receiving. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
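[Editor's note] A minimal sketch of the kind of change proposed; the exact wording in the committed patch may differ:
{code}
// Sketch: also report the block's current length in the "Receiving" message.
LOG.info("Receiving " + block + " src: " + remoteAddress
    + " dest: " + localAddress
    + " of size " + block.getNumBytes()); // hypothetical wording
{code}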
[jira] [Updated] (HDFS-10632) DataXceiver to report the length of the block it's receiving
[ https://issues.apache.org/jira/browse/HDFS-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10632: - Attachment: HDFS-10632.001.patch > DataXceiver to report the length of the block it's receiving > > > Key: HDFS-10632 > URL: https://issues.apache.org/jira/browse/HDFS-10632 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Yiqun Lin >Priority: Trivial > Labels: supportability > Attachments: HDFS-10632.001.patch > > > In DataXceiver#writeBlock: > {code} > LOG.info("Receiving " + block + " src: " + remoteAddress + " dest: " > + localAddress); > {code} > It would be better for this message to also report the size of the block it's receiving. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-10632) DataXceiver to report the length of the block it's receiving
[ https://issues.apache.org/jira/browse/HDFS-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin reassigned HDFS-10632: Assignee: Yiqun Lin > DataXceiver to report the length of the block it's receiving > > > Key: HDFS-10632 > URL: https://issues.apache.org/jira/browse/HDFS-10632 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Yiqun Lin >Priority: Trivial > Labels: supportability > > In DataXceiver#writeBlock: > {code} > LOG.info("Receiving " + block + " src: " + remoteAddress + " dest: " > + localAddress); > {code} > It would be better for this message to also report the size of the block it's receiving. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10614) Appended blocks can be closed even before IBRs from DataNodes
[ https://issues.apache.org/jira/browse/HDFS-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378688#comment-15378688 ] Jing Zhao commented on HDFS-10614: -- Thanks for the fix and review, [~vinayrpet] and [~ajisakaa]. The patch also looks good to me. One question: before we remove the block from the storageInfo, can we also add an extra check to make sure the reported block's GS is greater than the stored block's? In this way the logic will be the same as {{setGenerationStampAndVerifyReplicas}} in {{updatePipeline}}. > Appended blocks can be closed even before IBRs from DataNodes > - > > Key: HDFS-10614 > URL: https://issues.apache.org/jira/browse/HDFS-10614 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Attachments: HDFS-10614.01.patch, HDFS-10614.02.patch > > > Scenario: >1. Open the file for append(). >2. Trigger append pipeline setup by adding some data. >3. Consider that the RECEIVING IBRs of the DNs reach the NN first. >4. An updatePipeline() RPC is sent to the namenode to update the pipeline. >5. Now, if complete() is called on the file even before closing the > pipeline, then the block will be COMPLETE even before the block is actually > FINALIZED on the DN side, and the file will be closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
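[Editor's note] A minimal sketch of the extra generation-stamp check suggested above; {{storageInfo.removeBlock}} stands in for whatever the actual removal call site is, so treat all names as illustrative:
{code}
// Sketch: only drop the stored replica when the reported block is really
// newer, mirroring the check in setGenerationStampAndVerifyReplicas.
if (reportedBlock.getGenerationStamp() > storedBlock.getGenerationStamp()) {
  storageInfo.removeBlock(storedBlock); // hypothetical call site
}
{code}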
[jira] [Commented] (HDFS-10628) Log HDFS Balancer exit message to its own log
[ https://issues.apache.org/jira/browse/HDFS-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378642#comment-15378642 ] Hadoop QA commented on HDFS-10628: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 12s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 97m 51s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDataTransferKeepalive | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818039/HDFS-10628.002.patch | | JIRA Issue | HDFS-10628 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 2358a4540623 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e549a9a | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16064/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16064/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16064/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Log HDFS Balancer exit message to its own log > - > > Key: HDFS-10628 > URL: https://issues.apache.org/jira/browse/HDFS-10628 >
[jira] [Commented] (HDFS-8940) Support for large-scale multi-tenant inotify service
[ https://issues.apache.org/jira/browse/HDFS-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378555#comment-15378555 ] Ming Ma commented on HDFS-8940: --- [~drankye], it might come from the following assumption: only admins can use the existing inotify functionality, while with this feature any user should be able to use it. > Support for large-scale multi-tenant inotify service > > > Key: HDFS-8940 > URL: https://issues.apache.org/jira/browse/HDFS-8940 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma > Attachments: Large-Scale-Multi-Tenant-Inotify-Service.pdf > > > HDFS-6634 provides the core inotify functionality. We would like to extend > that to provide a large-scale service that tens of thousands of clients can > subscribe to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378546#comment-15378546 ] Wei-Chiu Chuang commented on HDFS-10587: I see. So for the logs I saw, there are no "Appending to " messages. I think the replica was created without being in the FINALIZED state. > Incorrect offset/length calculation in pipeline recovery causes block > corruption > > > Key: HDFS-10587 > URL: https://issues.apache.org/jira/browse/HDFS-10587 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10587.001.patch > > > We found that incorrect offset and length calculation in pipeline recovery may > cause block corruption and result in missing blocks under a very unfortunate > scenario. > (1) A client established a pipeline and started writing data to the pipeline. > (2) One of the data nodes in the pipeline restarted, closing the socket, and > some written data were unacknowledged. > (3) The client replaced the failed data node with a new one, initiating a block > transfer to copy existing data in the block to the new datanode. > (4) The block was transferred to the new node. Crucially, the entire block, > including the unacknowledged data, was transferred. > (5) The last chunk (512 bytes) was not a full chunk, but the destination > still reserved the whole chunk in its buffer, and wrote the entire buffer to > disk, therefore some of the written data is garbage. > (6) When the transfer was done, the destination data node converted the > replica from temporary to rbw, which set its visible length to the length of > bytes on disk. That is to say, it thought whatever was transferred was > acknowledged. However, the visible length of the replica is different (rounded > up to the next multiple of 512) from the source of the transfer. [1] > (7) The client then truncated the block in an attempt to remove unacknowledged > data. However, because the visible length is equivalent to the bytes on disk, > it did not truncate the unacknowledged data. > (8) When new data was appended to the destination, it skipped the bytes > already on disk. Therefore, whatever was written as garbage was not replaced. > (9) The volume scanner detected the corrupt replica, but due to HDFS-10512, it > wouldn't tell the NameNode to mark the replica as corrupt, so the client > continued to form a pipeline using the corrupt replica. > (10) Finally, the DN that had the only healthy replica was restarted. The NameNode > then updated the pipeline to contain only the corrupt replica. > (11) The client continued to write to the corrupt replica, because neither the client > nor the data node itself knew the replica was corrupt. When the restarted > datanodes came back, their replicas were stale, despite not being corrupt. > Therefore, none of the replicas is good and up to date. > The sequence of events was reconstructed based on the DataNode/NameNode logs and > my understanding of the code. > Incidentally, we have observed the same sequence of events on two independent > clusters. 
> [1] > The sender has the replica as follows: > 2016-04-15 22:03:05,066 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW > getNumBytes() = 41381376 > getBytesOnDisk() = 41381376 > getVisibleLength()= 41186444 > getVolume() = /hadoop-i/data/current > getBlockFile()= > /hadoop-i/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324 > bytesAcked=41186444 > bytesOnDisk=41381376 > while the receiver has the replica as follows: > 2016-04-15 22:03:05,068 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW > getNumBytes() = 41186816 > getBytesOnDisk() = 41186816 > getVisibleLength()= 41186816 > getVolume() = /hadoop-g/data/current > getBlockFile()= > /hadoop-g/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324 > bytesAcked=41186816 > bytesOnDisk=41186816 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
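[Editor's note] The 512-byte chunk rounding described in step (6) can be checked directly against the two log excerpts above; a minimal self-contained sketch:
{code}
// Sketch: the receiver's visible length is the sender's acked length
// rounded up to the next multiple of the 512-byte chunk size.
public class ChunkRounding {
  public static void main(String[] args) {
    long senderBytesAcked = 41186444L; // sender's bytesAcked from the log above
    long chunkSize = 512L;
    long roundedUp = ((senderBytesAcked + chunkSize - 1) / chunkSize) * chunkSize;
    System.out.println(roundedUp);     // 41186816: the receiver's getVisibleLength()
  }
}
{code}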
[jira] [Updated] (HDFS-10632) DataXceiver to report the length of the block it's receiving
[ https://issues.apache.org/jira/browse/HDFS-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-10632: - Labels: supportability (was: ) > DataXceiver to report the length of the block it's receiving > > > Key: HDFS-10632 > URL: https://issues.apache.org/jira/browse/HDFS-10632 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs >Reporter: Yongjun Zhang >Priority: Trivial > Labels: supportability > > In DataXceiver#writeBlock: > {code} > LOG.info("Receiving " + block + " src: " + remoteAddress + " dest: " > + localAddress); > {code} > It would be better for this message to also report the size of the block it's receiving. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10632) DataXceiver to report the length of the block it's receiving
[ https://issues.apache.org/jira/browse/HDFS-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-10632: - Priority: Trivial (was: Major) Component/s: hdfs datanode > DataXceiver to report the length of the block it's receiving > > > Key: HDFS-10632 > URL: https://issues.apache.org/jira/browse/HDFS-10632 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs >Reporter: Yongjun Zhang >Priority: Trivial > > In DataXceiver#writeBlock: > {code} > LOG.info("Receiving " + block + " src: " + remoteAddress + " dest: " > + localAddress); > {code} > It would be better for this message to also report the size of the block it's receiving. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-8940) Support for large-scale multi-tenant inotify service
[ https://issues.apache.org/jira/browse/HDFS-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378541#comment-15378541 ] Kai Zheng commented on HDFS-8940: - Thanks [~mingma] for the confirmation. I have another question after reading your doc: in what sense does this relate to {{multi-tenant}}? If it sounds good, I suggest we change the title a bit to clarify. In my understanding, HDFS currently doesn't support multi-tenancy by itself, so this design might not inherently be multi-tenant either. The {{large-scale}} part sounds good to have already. > Support for large-scale multi-tenant inotify service > > > Key: HDFS-8940 > URL: https://issues.apache.org/jira/browse/HDFS-8940 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma > Attachments: Large-Scale-Multi-Tenant-Inotify-Service.pdf > > > HDFS-6634 provides the core inotify functionality. We would like to extend > that to provide a large-scale service that tens of thousands of clients can > subscribe to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10632) DataXceiver to report the length of the block it's receiving
Yongjun Zhang created HDFS-10632: Summary: DataXceiver to report the length of the block it's receiving Key: HDFS-10632 URL: https://issues.apache.org/jira/browse/HDFS-10632 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang In DataXceiver#writeBlock: {code} LOG.info("Receiving " + block + " src: " + remoteAddress + " dest: " + localAddress); {code} It would be better for this message to also report the size of the block it's receiving. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10628) Log HDFS Balancer exit message to its own log
[ https://issues.apache.org/jira/browse/HDFS-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Zhou updated HDFS-10628: -- Attachment: HDFS-10628.002.patch Checkstyle issue fixed. The failed tests are not related, as it's just a simple patch. > Log HDFS Balancer exit message to its own log > - > > Key: HDFS-10628 > URL: https://issues.apache.org/jira/browse/HDFS-10628 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Jiayi Zhou >Assignee: Jiayi Zhou >Priority: Minor > Attachments: HDFS-10628.001.patch, HDFS-10628.002.patch > > > Currently, the exit message is logged only to stderr. It would be more convenient > if we also logged it to the Balancer log, for when people want to figure out why > the Balancer was aborted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines
[ https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378500#comment-15378500 ] Mingliang Liu edited comment on HDFS-10326 at 7/14/16 10:18 PM: The v1 patch rebases from {{trunk}} and resolves conflicts with [HADOOP-13351] in unit test. was (Author: liuml07): The v1 patch rebases from {{trunk}} and resolve conflicts with [HADOOP-13351] in unit test. > Disable setting tcp socket send/receive buffers for write pipelines > --- > > Key: HDFS-10326 > URL: https://issues.apache.org/jira/browse/HDFS-10326 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Affects Versions: 2.6.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10326.000.patch, HDFS-10326.001.patch > > > The DataStreamer and the Datanode use a hardcoded > DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write > pipeline. Explicitly setting tcp buffer sizes disables tcp stack > auto-tuning. > The hardcoded value will saturate a 1Gb link with a 1ms RTT; 105Mbps at 10ms; a > paltry 11Mbps over a 100ms long haul. 10Gb networks are underutilized. > There should either be a configuration to completely disable setting the > buffers, or the setReceiveBuffer and setSendBuffer calls should be removed > entirely. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines
[ https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378500#comment-15378500 ] Mingliang Liu commented on HDFS-10326: -- The v1 patch rebases from {{trunk}} and resolves conflicts with [HADOOP-13351] in unit test. > Disable setting tcp socket send/receive buffers for write pipelines > --- > > Key: HDFS-10326 > URL: https://issues.apache.org/jira/browse/HDFS-10326 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Affects Versions: 2.6.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10326.000.patch, HDFS-10326.001.patch > > > The DataStreamer and the Datanode use a hardcoded > DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write > pipeline. Explicitly setting tcp buffer sizes disables tcp stack > auto-tuning. > The hardcoded value will saturate a 1Gb link with a 1ms RTT; 105Mbps at 10ms; a > paltry 11Mbps over a 100ms long haul. 10Gb networks are underutilized. > There should either be a configuration to completely disable setting the > buffers, or the setReceiveBuffer and setSendBuffer calls should be removed > entirely. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
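[Editor's note] To illustrate the configurable approach mentioned in the description, a minimal sketch; the config key name is hypothetical, not necessarily what the patch introduces:
{code}
// Sketch: only set an explicit socket buffer size when a positive value is
// configured; 0 leaves the OS TCP auto-tuning in effect.
int sendBufSize = conf.getInt("dfs.datanode.transfer.socket.send.buffer.size", 0);
if (sendBufSize > 0) {
  sock.setSendBufferSize(sendBufSize);
}
{code}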
[jira] [Updated] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines
[ https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10326: - Attachment: HDFS-10326.001.patch > Disable setting tcp socket send/receive buffers for write pipelines > --- > > Key: HDFS-10326 > URL: https://issues.apache.org/jira/browse/HDFS-10326 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Affects Versions: 2.6.0 >Reporter: Daryn Sharp >Assignee: Mingliang Liu > Attachments: HDFS-10326.000.patch, HDFS-10326.001.patch > > > The DataStreamer and the Datanode use a hardcoded > DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write > pipeline. Explicitly setting tcp buffer sizes disables tcp stack > auto-tuning. > The hardcoded value will saturate a 1Gb link with a 1ms RTT; 105Mbps at 10ms; a > paltry 11Mbps over a 100ms long haul. 10Gb networks are underutilized. > There should either be a configuration to completely disable setting the > buffers, or the setReceiveBuffer and setSendBuffer calls should be removed > entirely. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines
[ https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu reassigned HDFS-10326: Assignee: Mingliang Liu (was: Daryn Sharp) > Disable setting tcp socket send/receive buffers for write pipelines > --- > > Key: HDFS-10326 > URL: https://issues.apache.org/jira/browse/HDFS-10326 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Affects Versions: 2.6.0 >Reporter: Daryn Sharp >Assignee: Mingliang Liu > Attachments: HDFS-10326.000.patch, HDFS-10326.001.patch > > > The DataStreamer and the Datanode use a hardcoded > DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write > pipeline. Explicitly setting tcp buffer sizes disables tcp stack > auto-tuning. > The hardcoded value will saturate a 1Gb link with a 1ms RTT; 105Mbps at 10ms; a > paltry 11Mbps over a 100ms long haul. 10Gb networks are underutilized. > There should either be a configuration to completely disable setting the > buffers, or the setReceiveBuffer and setSendBuffer calls should be removed > entirely. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines
[ https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10326: - Assignee: Daryn Sharp (was: Mingliang Liu) > Disable setting tcp socket send/receive buffers for write pipelines > --- > > Key: HDFS-10326 > URL: https://issues.apache.org/jira/browse/HDFS-10326 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Affects Versions: 2.6.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10326.000.patch, HDFS-10326.001.patch > > > The DataStreamer and the Datanode use a hardcoded > DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write > pipeline. Explicitly setting tcp buffer sizes disables tcp stack > auto-tuning. > The hardcoded value will saturate a 1Gb link with a 1ms RTT; 105Mbps at 10ms; a > paltry 11Mbps over a 100ms long haul. 10Gb networks are underutilized. > There should either be a configuration to completely disable setting the > buffers, or the setReceiveBuffer and setSendBuffer calls should be removed > entirely. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10614) Appended blocks can be closed even before IBRs from DataNodes
[ https://issues.apache.org/jira/browse/HDFS-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378492#comment-15378492 ] Akira Ajisaka commented on HDFS-10614: -- Looks good to me. +1 pending another committer's review. cc: [~yzhangal], [~jingzhao] > Appended blocks can be closed even before IBRs from DataNodes > - > > Key: HDFS-10614 > URL: https://issues.apache.org/jira/browse/HDFS-10614 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Attachments: HDFS-10614.01.patch, HDFS-10614.02.patch > > > Scenario: >1. Open the file for append(). >2. Trigger append pipeline setup by adding some data. >3. Consider that the RECEIVING IBRs of the DNs reach the NN first. >4. An updatePipeline() RPC is sent to the namenode to update the pipeline. >5. Now, if complete() is called on the file even before closing the > pipeline, then the block will be COMPLETE even before the block is actually > FINALIZED on the DN side, and the file will be closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378483#comment-15378483 ] Yongjun Zhang commented on HDFS-10587: -- I think this explains why we had new data reaching the new DN after the initial block transfer: after adding the new DN to the pipeline and doing the block transfer to this new DN, the client resumed writing data. Then, in the process, corruption is detected again, thus repeating the pipeline recovery process, even though from the client's point of view it keeps getting the following exception: {code} INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 10.1.1.1:1110 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1472) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1293) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1016) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:560) {code} Wei-Chiu and I discussed, and we think here is a more complete picture: * 1. pipeline running on DN1 -> DN2 -> DN3 * 2. trouble at DN3, it's gone * 3. pipeline recovery, new DN DN4 added * 4. block transfer from DN1 to DN4; DN4's data is now a multiple of chunks. * 5. DataStreamer resumed writing data to DN1 -> DN4 -> DN3 (this is where new data gets in); the first chunk DN4 got is corrupt for some reason that we are searching for * 6. DN3 detects corruption and quits, while new data has been written to DN1 and DN4 * 7. go back to step 3; new pipeline recoveries start: DN1 -> DN4 -> DN5, DN1 -> DN4 -> DN6, ... In a corner case, step 3 could be replaced with "DN3 restarted", in which case another block transfer would happen and may cause corruption. Since DN1's visibleLength in step 4 is not a multiple of chunks, this fact might somehow be related to the corruption in step 5. > Incorrect offset/length calculation in pipeline recovery causes block > corruption > > > Key: HDFS-10587 > URL: https://issues.apache.org/jira/browse/HDFS-10587 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10587.001.patch > > > We found that incorrect offset and length calculation in pipeline recovery may > cause block corruption and result in missing blocks under a very unfortunate > scenario. > (1) A client established a pipeline and started writing data to the pipeline. > (2) One of the data nodes in the pipeline restarted, closing the socket, and > some written data were unacknowledged. > (3) The client replaced the failed data node with a new one, initiating a block > transfer to copy existing data in the block to the new datanode. > (4) The block was transferred to the new node. Crucially, the entire block, > including the unacknowledged data, was transferred. > (5) The last chunk (512 bytes) was not a full chunk, but the destination > still reserved the whole chunk in its buffer, and wrote the entire buffer to > disk, therefore some of the written data is garbage. > (6) When the transfer was done, the destination data node converted the > replica from temporary to rbw, which set its visible length to the length of > bytes on disk. That is to say, it thought whatever was transferred was > acknowledged. 
However, the visible length of the replica is different (rounded > up to the next multiple of 512) from the source of the transfer. [1] > (7) The client then truncated the block in an attempt to remove unacknowledged > data. However, because the visible length is equivalent to the bytes on disk, > it did not truncate the unacknowledged data. > (8) When new data was appended to the destination, it skipped the bytes > already on disk. Therefore, whatever was written as garbage was not replaced. > (9) The volume scanner detected the corrupt replica, but due to HDFS-10512, it > wouldn't tell the NameNode to mark the replica as corrupt, so the client > continued to form a pipeline using the corrupt replica. > (10) Finally, the DN that had the only healthy replica was restarted. The NameNode > then updated the pipeline to contain only the corrupt replica. > (11) The client continued to write to the corrupt replica, because neither the client > nor the data node itself knew the replica was corrupt. When the restarted > datanodes came back, their replicas were stale, despite not being corrupt. > Therefore, none of the replicas is good and up to date. > The sequence of events was reconstructed based on the DataNode/NameNode logs and > my understanding of the code. > Incidentally,
[jira] [Commented] (HDFS-10623) Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens.
[ https://issues.apache.org/jira/browse/HDFS-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378466#comment-15378466 ] Hadoop QA commented on HDFS-10623: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 39s{color} | {color:green} branch-2.7.3 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s{color} | {color:green} branch-2.7.3 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} branch-2.7.3 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} branch-2.7.3 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} branch-2.7.3 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} branch-2.7.3 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 9s{color} | {color:green} branch-2.7.3 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | {color:green} branch-2.7.3 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s{color} | {color:green} branch-2.7.3 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1382 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 37s{color} | {color:red} The patch 70 line(s) with tabs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 27s{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_101. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 21s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}138m 14s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_91 Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots | | | hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork | | JDK v1.7.0_101 Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots | | | hadoop.hdfs.web.TestWebHdfsTokens | | | hadoop.hdfs.TestDataTransferKeepalive | | | hadoop.hdfs.server.balancer.TestBalancer | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.TestRollingUpgrade | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:c42
[jira] [Commented] (HDFS-10628) Log HDFS Balancer exit message to its own log
[ https://issues.apache.org/jira/browse/HDFS-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378474#comment-15378474 ] Hadoop QA commented on HDFS-10628: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 23s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 38 unchanged - 0 fixed = 39 total (was 38) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 11s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 82m 39s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestFsDatasetCache | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.TestCrcCorruption | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818020/HDFS-10628.001.patch | | JIRA Issue | HDFS-10628 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux fe31dfa596fa 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6cf0175 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/16062/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16062/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16062/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16062/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT htt
[jira] [Commented] (HDFS-10601) Improve log message to include hostname when the NameNode is in safemode
[ https://issues.apache.org/jira/browse/HDFS-10601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378418#comment-15378418 ] Sean Busbey commented on HDFS-10601: +1 (non-binding) > Improve log message to include hostname when the NameNode is in safemode > > > Key: HDFS-10601 > URL: https://issues.apache.org/jira/browse/HDFS-10601 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Minor > Attachments: HDFS-10601.001.patch, HDFS-10601.002.patch > > > When remote NN operations are involved, it would be nice to have the Namenode > hostname in safemode notification log. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10628) Log HDFS Balancer exit message to its own log
[ https://issues.apache.org/jira/browse/HDFS-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378367#comment-15378367 ] Jiayi Zhou commented on HDFS-10628: --- Versions updated as per the comment. > Log HDFS Balancer exit message to its own log > - > > Key: HDFS-10628 > URL: https://issues.apache.org/jira/browse/HDFS-10628 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Jiayi Zhou >Assignee: Jiayi Zhou >Priority: Minor > Attachments: HDFS-10628.001.patch > > > Currently, the exit message is logged to stderr. It would be more convenient > if we also log this to Balancer log when people want to figure out why > Balancer is aborted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10628) Log HDFS Balancer exit message to its own log
[ https://issues.apache.org/jira/browse/HDFS-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Zhou updated HDFS-10628: -- Affects Version/s: 2.8.0 Target Version/s: 2.8.0 > Log HDFS Balancer exit message to its own log > - > > Key: HDFS-10628 > URL: https://issues.apache.org/jira/browse/HDFS-10628 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Jiayi Zhou >Assignee: Jiayi Zhou >Priority: Minor > Attachments: HDFS-10628.001.patch > > > Currently, the exit message is logged to stderr. It would be more convenient > if we also log this to Balancer log when people want to figure out why > Balancer is aborted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10628) Log HDFS Balancer exit message to its own log
[ https://issues.apache.org/jira/browse/HDFS-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-10628: --- Priority: Minor (was: Major) Component/s: balancer & mover > Log HDFS Balancer exit message to its own log > - > > Key: HDFS-10628 > URL: https://issues.apache.org/jira/browse/HDFS-10628 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover >Reporter: Jiayi Zhou >Assignee: Jiayi Zhou >Priority: Minor > Attachments: HDFS-10628.001.patch > > > Currently, the exit message is logged to stderr. It would be more convenient > if we also log this to Balancer log when people want to figure out why > Balancer is aborted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10628) Log HDFS Balancer exit message to its own log
[ https://issues.apache.org/jira/browse/HDFS-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378365#comment-15378365 ] Andrew Wang commented on HDFS-10628: +1 pending Jenkins, thanks for finding and fixing this Jiayi! Do you mind setting the Affects and Target version appropriately? As this issue is pretty minor, I think we can just target 2.8.0. > Log HDFS Balancer exit message to its own log > - > > Key: HDFS-10628 > URL: https://issues.apache.org/jira/browse/HDFS-10628 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover >Reporter: Jiayi Zhou >Assignee: Jiayi Zhou > Attachments: HDFS-10628.001.patch > > > Currently, the exit message is logged to stderr. It would be more convenient > if we also log this to Balancer log when people want to figure out why > Balancer is aborted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10627) Volume Scanner mark a block as "suspect" even if the block sender encounters 'Broken pipe' or 'Connection reset by peer' exception
[ https://issues.apache.org/jira/browse/HDFS-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378353#comment-15378353 ] Daryn Sharp commented on HDFS-10627: Rushabh and I checked a few random healthy nodes on multiple clusters. # They are backlogged with thousands of suspect blocks on all storages. It will take days to catch up – assuming no more false positives. # They have been up for months but haven't started a rescan in over a month (based on the available non-archived logs). So obviously the false positives are trickling in faster than the scan rate. # 1 node reported one bad block in the past month. The others have not reported any bad blocks. # On a large cluster, 0.08% of pipeline recovery corruption was detected. _The scanner is completely negligent in its duty to find and report bad blocks_. Rushabh added the priority scan feature to prevent rack failure from causing (hopefully) temporary data loss. Instead of waiting for up to a week for a suspected corrupt block to be reported, it would be reported almost immediately. Well, guess what happened today? Rack failed. Lost data. The DN knew the block was bad but was too backlogged to verify and report it. A completely avoidable situation, not worth detecting 0.08% of pipeline recovery corruptions. *This is completely broken and must be reverted to be fixed*. > Volume Scanner mark a block as "suspect" even if the block sender encounters > 'Broken pipe' or 'Connection reset by peer' exception > -- > > Key: HDFS-10627 > URL: https://issues.apache.org/jira/browse/HDFS-10627 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > > In the BlockSender code, > {code:title=BlockSender.java|borderStyle=solid} > if (!ioem.startsWith("Broken pipe") && !ioem.startsWith("Connection > reset")) { > LOG.error("BlockSender.sendChunks() exception: ", e); > } > datanode.getBlockScanner().markSuspectBlock( > volumeRef.getVolume().getStorageID(), > block); > {code} > Before HDFS-7686, the block was marked as suspect only if the exception > message doesn't start with Broken pipe or Connection reset. > But after HDFS-7686, the block is marked as suspect irrespective of the > exception message. > In one of our datanodes, it took approximately a whole day (22 hours) to go > through all the suspect blocks to scan one corrupt block. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
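For readers skimming the thread, a minimal sketch of the pre-HDFS-7686 gating the description refers to (illustration only, written in the context of the sendChunks() fragment quoted above; not the actual patch): a client-side disconnect says nothing about the replica on disk, so only other IOExceptions should mark the block suspect.

{code:title=SuspectGateSketch.java|borderStyle=solid}
// Sketch: inspect the IOException message before marking the block suspect.
String ioem = e.getMessage();
boolean clientDisconnect = ioem != null
    && (ioem.startsWith("Broken pipe") || ioem.startsWith("Connection reset"));
if (!clientDisconnect) {
  // Only a genuine local I/O problem implicates the block on disk.
  LOG.error("BlockSender.sendChunks() exception: ", e);
  datanode.getBlockScanner().markSuspectBlock(
      volumeRef.getVolume().getStorageID(), block);
}
{code}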
[jira] [Created] (HDFS-10631) Federation State Store ZooKeeper implementation
Inigo Goiri created HDFS-10631: -- Summary: Federation State Store ZooKeeper implementation Key: HDFS-10631 URL: https://issues.apache.org/jira/browse/HDFS-10631 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Inigo Goiri State Store implementation using ZooKeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378343#comment-15378343 ] Hadoop QA commented on HDFS-10477: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 75m 0s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 97m 11s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817992/HDFS-10477.005.patch | | JIRA Issue | HDFS-10477 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux ec06a4fc50cb 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6cf0175 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16059/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16059/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Stop decommission a rack of DataNodes caused NameNode fail over to standby > -- > > Key: HDFS-10477 > URL: https://issues.apache.org/jira/browse/HDFS-10477 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.2 >Reporter: yunjiong zhao >Assignee: yunjiong zhao > Attachments: HDFS-10477.002.p
[jira] [Created] (HDFS-10630) Federation State Store
Inigo Goiri created HDFS-10630: -- Summary: Federation State Store Key: HDFS-10630 URL: https://issues.apache.org/jira/browse/HDFS-10630 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs Reporter: Inigo Goiri Interface to store the federation shared state across Routers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10629) Federation Router
Inigo Goiri created HDFS-10629: -- Summary: Federation Router Key: HDFS-10629 URL: https://issues.apache.org/jira/browse/HDFS-10629 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs Reporter: Inigo Goiri Component that routes calls from the clients to the right Namespace. It implements {{ClientProtocol}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9271) Implement basic NN operations
[ https://issues.apache.org/jira/browse/HDFS-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378333#comment-15378333 ] Hadoop QA commented on HDFS-9271: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 48s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 47s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 10s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 8s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 18s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 55s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 22s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 40s{color} | {color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 63m 19s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_91 Failed CTEST tests | test_libhdfs_threaded_hdfspp_test_shim_static | | JDK v1.7.0_101 Failed CTEST tests | test_libhdfs_threaded_hdfspp_test_shim_static | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0cf5e66 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818005/HDFS-9271.HDFS-8707.003.patch | | JIRA Issue | HDFS-9271 | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 1952abccb1c6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-8707 / d18e396 | | Default Java | 1.7.0_101 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_91 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 | | CTEST | https://builds.apache.org/job/PreCommit-HDFS-Build/16060/artifact/patchprocess/patch-hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.8.0_91-ctest.txt | | CTEST | https://builds.apache.org/job/PreCommit-HDFS-Build/16060/artifact/patchprocess/patch-hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.7.0_101-ctest.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16060/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.7.0_101.txt | | JDK v1.7.0_101 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16060/testReport/ | | modules | C: hadoop-hdfs-project
[jira] [Updated] (HDFS-10628) Log HDFS Balancer exit message to its own log
[ https://issues.apache.org/jira/browse/HDFS-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Zhou updated HDFS-10628: -- Attachment: HDFS-10628.001.patch > Log HDFS Balancer exit message to its own log > - > > Key: HDFS-10628 > URL: https://issues.apache.org/jira/browse/HDFS-10628 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jiayi Zhou >Assignee: Jiayi Zhou > Attachments: HDFS-10628.001.patch > > > Currently, the exit message is logged to stderr. It would be more convenient > if we also log this to Balancer log when people want to figure out why > Balancer is aborted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10628) Log HDFS Balancer exit message to its own log
[ https://issues.apache.org/jira/browse/HDFS-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Zhou updated HDFS-10628: -- Status: Patch Available (was: Open) Simply log the exit message in Balancer log > Log HDFS Balancer exit message to its own log > - > > Key: HDFS-10628 > URL: https://issues.apache.org/jira/browse/HDFS-10628 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jiayi Zhou >Assignee: Jiayi Zhou > Attachments: HDFS-10628.001.patch > > > Currently, the exit message is logged to stderr. It would be more convenient > if we also log this to Balancer log when people want to figure out why > Balancer is aborted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
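A minimal sketch of the change being described (illustration only; the actual change is in the attached patch, and the helper name here is hypothetical): write the exit message to the Balancer's own log in addition to stderr.

{code:title=ExitMessageSketch.java|borderStyle=solid}
// Sketch: record the abort reason in the Balancer log as well as stderr,
// so it survives in the log file when the process output is lost.
static void printAndLogExitMessage(String exitMsg) {
  System.err.println(exitMsg); // existing behavior
  LOG.error(exitMsg);          // also land it in the Balancer's own log
}
{code}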
[jira] [Updated] (HDFS-10628) Log HDFS Balancer exit message to its own log
[ https://issues.apache.org/jira/browse/HDFS-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Zhou updated HDFS-10628: -- Description: Currently, the exit message is logged to stderr. It would be more convenient if we also log this to Balancer log when people want to figure out why Balancer is aborted. (was: Currently, the exit message is logged to stderr. It would be better if we also log this to Balancer log when people want to figure out why Balancer is aborted.) > Log HDFS Balancer exit message to its own log > - > > Key: HDFS-10628 > URL: https://issues.apache.org/jira/browse/HDFS-10628 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jiayi Zhou >Assignee: Jiayi Zhou > > Currently, the exit message is logged to stderr. It would be more convenient > if we also log this to Balancer log when people want to figure out why > Balancer is aborted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10628) Log HDFS Balancer exit message to its own log
Jiayi Zhou created HDFS-10628: - Summary: Log HDFS Balancer exit message to its own log Key: HDFS-10628 URL: https://issues.apache.org/jira/browse/HDFS-10628 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jiayi Zhou Assignee: Jiayi Zhou Currently, the exit message is logged to stderr. It would be better if we also log this to Balancer log when people want to figure out why Balancer is aborted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10623) Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens.
[ https://issues.apache.org/jira/browse/HDFS-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378293#comment-15378293 ] Hanisha Koneru commented on HDFS-10623: --- Thank you [~arpitagarwal] for reviewing and committing the patch. > Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens. > - > > Key: HDFS-10623 > URL: https://issues.apache.org/jira/browse/HDFS-10623 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Jitendra Nath Pandey >Assignee: Hanisha Koneru > Fix For: 2.7.3 > > Attachments: HDFS-10623-branch-2.7.3.000.patch, HDFS-10623.000.patch > > > TestWebHdfsTokens imports httpclient.HttpConnection, and causes unnecessary > reference to httpclient. This can be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10623) Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens.
[ https://issues.apache.org/jira/browse/HDFS-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-10623: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.3 Target Version/s: (was: 2.7.3) Status: Resolved (was: Patch Available) Committed to branch-2.7 and branch-2.7.3. Thank you for the contribution [~hanishakoneru]. > Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens. > - > > Key: HDFS-10623 > URL: https://issues.apache.org/jira/browse/HDFS-10623 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Jitendra Nath Pandey >Assignee: Hanisha Koneru > Fix For: 2.7.3 > > Attachments: HDFS-10623-branch-2.7.3.000.patch, HDFS-10623.000.patch > > > TestWebHdfsTokens imports httpclient.HttpConnection, and causes unnecessary > reference to httpclient. This can be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10627) Volume Scanner mark a block as "suspect" even if the block sender encounters 'Broken pipe' or 'Connection reset by peer' exception
[ https://issues.apache.org/jira/browse/HDFS-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378278#comment-15378278 ] Daryn Sharp commented on HDFS-10627: I agree it's unfortunate there's no feedback mechanism. I disagree that the serving node should ever assume that losing a client means the block _might_ be corrupt. There are many reasons the client can "unexpectedly" close the connection. Processes shut down unexpectedly, get killed, etc. The only time the DN should suspect its own block is on a local IOE. I checked some of our big clusters: a DN self-reporting a corrupt block during pipeline/block recovery is a fraction of a percent (which is being generous). So let's consider... Is it worth backlogging a DN with so many false positives from broken connections that: # It takes a day to scan a legitimately bad block detected by a local IOE # It leads to a rack failure causing temporary data loss # The scanner isn't doing its primary job of trawling the storage for bad blocks > Volume Scanner mark a block as "suspect" even if the block sender encounters > 'Broken pipe' or 'Connection reset by peer' exception > -- > > Key: HDFS-10627 > URL: https://issues.apache.org/jira/browse/HDFS-10627 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > > In the BlockSender code, > {code:title=BlockSender.java|borderStyle=solid} > if (!ioem.startsWith("Broken pipe") && !ioem.startsWith("Connection > reset")) { > LOG.error("BlockSender.sendChunks() exception: ", e); > } > datanode.getBlockScanner().markSuspectBlock( > volumeRef.getVolume().getStorageID(), > block); > {code} > Before HDFS-7686, the block was marked as suspect only if the exception > message doesn't start with Broken pipe or Connection reset. > But after HDFS-7686, the block is marked as suspect irrespective of the > exception message. > In one of our datanodes, it took approximately a whole day (22 hours) to go > through all the suspect blocks to scan one corrupt block. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10623) Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens.
[ https://issues.apache.org/jira/browse/HDFS-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378245#comment-15378245 ] Arpit Agarwal commented on HDFS-10623: -- +1 this doesn't need a full Jenkins run. The test builds and passes with the patch so I will commit it shortly. > Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens. > - > > Key: HDFS-10623 > URL: https://issues.apache.org/jira/browse/HDFS-10623 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Jitendra Nath Pandey >Assignee: Hanisha Koneru > Attachments: HDFS-10623-branch-2.7.3.000.patch, HDFS-10623.000.patch > > > TestWebHdfsTokens imports httpclient.HttpConnection, and causes unnecessary > reference to httpclient. This can be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9271) Implement basic NN operations
[ https://issues.apache.org/jira/browse/HDFS-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoli Shein updated HDFS-9271: Attachment: HDFS-9271.HDFS-8707.003.patch Please review the new patch. > Implement basic NN operations > - > > Key: HDFS-9271 > URL: https://issues.apache.org/jira/browse/HDFS-9271 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: Anatoli Shein > Attachments: HDFS-9271.HDFS-8707.000.patch, > HDFS-9271.HDFS-8707.001.patch, HDFS-9271.HDFS-8707.002.patch, > HDFS-9271.HDFS-8707.003.patch > > > Expose via C and C++ API: > * mkdirs > * rename > * delete > * stat > * chmod > * chown > * getListing > * setOwner -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9271) Implement basic NN operations
[ https://issues.apache.org/jira/browse/HDFS-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378193#comment-15378193 ] Anatoli Shein commented on HDFS-9271: - Thank you [~bobhansen] for your comments. Here is what I did: A few comments: * In GetBlockLocations(hdfspp.h, filesystem.cc), use offset_t or uint64_t rather than long. It's less ambiguous. (/) I used uint64_t and also updated short variables to uint16_t * In getAbsolutePath (hdfs.cc), how about returning optional(string) rather than an empty string on error. It makes the error state explicit and explicitly checked. (/) Done * Make a new bug to capture supporting ".." semantics (/) Bug filed: HDFS-10621. * It appears the majority of hdfs_ext_test.c has been commented out. Was this intentional, or debugging dirt that slipped in? (/) Debugging dirt removed * Can we add a test for relative paths for all the functions where we added them in? (/) Added this to hdfs_ext_test.c * Can we implement hdfsMove and/or hdfsTruncateFile with just metadata operations? (x) As per our conversation today, this cannot be done yet * Move to libhdfspp implementations in hdfs_shim for GetDefaultBlocksize\[AtPath] (/) Done * Implement hdfsUnbufferFile as a no-op? (/) Done * Do we support single-dot relative paths? e.g. can I call hdfsGetPathInfo(fs, ".")? Do we have tests over that? (x) We do not support this yet. It is captured in HDFS-10621. * Do we have tests that show that libhdfspp's getReadStatistics match libhdfs's getReadStatistics? (x) We do not track which bytes were remote or local. Temporarily assume everything is local. Minor little nits: * For the absolute path, I personally prefer abs_path = getAbsolutePath(...) rather than abs_path(getAbsolutePath). They both compile to the same thing (see https://en.wikipedia.org/wiki/Return_value_optimization); I think the whitespace with the assignment makes the what and the content separation cleaner (/) Done * Refactor CheckSystemAndHandle to use CheckHandle (https://en.wikipedia.org/wiki/Don't_repeat_yourself) (/) Done > Implement basic NN operations > - > > Key: HDFS-9271 > URL: https://issues.apache.org/jira/browse/HDFS-9271 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: Anatoli Shein > Attachments: HDFS-9271.HDFS-8707.000.patch, > HDFS-9271.HDFS-8707.001.patch, HDFS-9271.HDFS-8707.002.patch > > > Expose via C and C++ API: > * mkdirs > * rename > * delete > * stat > * chmod > * chown > * getListing > * setOwner -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
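As a side note on the optional-return review point above, a minimal sketch of the idea (illustration only; written here in Java for brevity even though the change itself is in the C++ libhdfspp code, and the helper names are hypothetical): an optional return makes the error state explicit at the call site, unlike an empty string a caller can silently use.

{code:title=OptionalReturnSketch.java|borderStyle=solid}
import java.util.Optional;

public class OptionalReturnSketch {
  // Hypothetical path helper: empty() signals an invalid input explicitly.
  static Optional<String> getAbsolutePath(String cwd, String path) {
    if (path == null || path.isEmpty()) {
      return Optional.empty(); // explicit, checkable error state
    }
    return Optional.of(path.startsWith("/") ? path : cwd + "/" + path);
  }

  public static void main(String[] args) {
    // The caller must decide what to do when the path is invalid.
    System.out.println(getAbsolutePath("/user/test", "data").orElse("<invalid path>"));
  }
}
{code}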
[jira] [Updated] (HDFS-10623) Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens.
[ https://issues.apache.org/jira/browse/HDFS-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-10623: -- Status: Patch Available (was: In Progress) > Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens. > - > > Key: HDFS-10623 > URL: https://issues.apache.org/jira/browse/HDFS-10623 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Jitendra Nath Pandey >Assignee: Hanisha Koneru > Attachments: HDFS-10623-branch-2.7.3.000.patch, HDFS-10623.000.patch > > > TestWebHdfsTokens imports httpclient.HttpConnection, and causes unnecessary > reference to httpclient. This can be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10623) Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens.
[ https://issues.apache.org/jira/browse/HDFS-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-10623: -- Status: In Progress (was: Patch Available) > Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens. > - > > Key: HDFS-10623 > URL: https://issues.apache.org/jira/browse/HDFS-10623 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Jitendra Nath Pandey >Assignee: Hanisha Koneru > Attachments: HDFS-10623-branch-2.7.3.000.patch, HDFS-10623.000.patch > > > TestWebHdfsTokens imports httpclient.HttpConnection, and causes unnecessary > reference to httpclient. This can be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10623) Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens.
[ https://issues.apache.org/jira/browse/HDFS-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-10623: -- Attachment: HDFS-10623-branch-2.7.3.000.patch > Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens. > - > > Key: HDFS-10623 > URL: https://issues.apache.org/jira/browse/HDFS-10623 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Jitendra Nath Pandey >Assignee: Hanisha Koneru > Attachments: HDFS-10623-branch-2.7.3.000.patch, HDFS-10623.000.patch > > > TestWebHdfsTokens imports httpclient.HttpConnection, and causes unnecessary > reference to httpclient. This can be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-8897) Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ...
[ https://issues.apache.org/jira/browse/HDFS-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-8897 started by John Zhuge. > Loadbalancer always exits with : java.io.IOException: Another Balancer is > running.. Exiting ... > > > Key: HDFS-8897 > URL: https://issues.apache.org/jira/browse/HDFS-8897 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.7.1 > Environment: Centos 6.6 >Reporter: LINTE >Assignee: John Zhuge > > When the balancer is launched, it should test if there is already a > /system/balancer.id file in HDFS. > When the file doesn't exist, the balancer doesn't want to run: > 15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, > hdfs://sandbox] > 15/08/14 16:35:12 INFO balancer.Balancer: parameters = > Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration > = 5, number of nodes to be excluded = 0, number of nodes to be included = 0] > Time Stamp Iteration# Bytes Already Moved Bytes Left To Move > Bytes Being Moved > 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from > NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec > 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys > 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, > 30mins, 0sec > 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys > 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from > NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec > 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys > 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, > 30mins, 0sec > java.io.IOException: Another Balancer is running.. Exiting ... > Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds > Looking at the audit log file when trying to run the balancer, the balancer > creates the /system/balancer.id and then deletes it on exiting ... > 2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true > ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo > src=/system/balancer.id dst=nullperm=null proto=rpc > 2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true > ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=create > src=/system/balancer.id dst=nullperm=hdfs:hadoop:rw-r- > proto=rpc > 2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true > ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo > src=/system/balancer.id dst=nullperm=null proto=rpc > 2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true > ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo > src=/system/balancer.id dst=nullperm=null proto=rpc > 2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true > ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo > src=/system/balancer.id dst=nullperm=null proto=rpc > 2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true > ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=delete > src=/system/balancer.id dst=nullperm=null proto=rpc > The error seems to be located in > org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java > The function checkAndMarkRunning returns null even if the /system/balancer.id > doesn't exist before entering this function; if it exists, then it is deleted > and the balancer exits with the same error. 
> > private OutputStream checkAndMarkRunning() throws IOException { > try { > if (fs.exists(idPath)) { > // try appending to it so that it will fail fast if another balancer > is > // running. > IOUtils.closeStream(fs.append(idPath)); > fs.delete(idPath, true); > } > final FSDataOutputStream fsout = fs.create(idPath, false); > // mark balancer idPath to be deleted during filesystem closure > fs.deleteOnExit(idPath); > if (write2IdFile) { > fsout.writeBytes(InetAddress.getLocalHost().getHostName()); > fsout.hflush(); > } > return fsout; > } catch(RemoteException e) { > > if(AlreadyBeingCreatedException.class.getName().equals(e.getClassName())){ > return null; > } else { > throw e; > } > } > } > > Regards -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-
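To make the reported failure path easier to follow, here is an annotated condensation of the quoted checkAndMarkRunning() (illustration only, not a proposed fix): the append probe is meant to fail fast only when another balancer still holds the file open, yet any RemoteException matching AlreadyBeingCreatedException is translated into "another balancer is running", and by that point the original /system/balancer.id may already have been deleted.

{code:title=CheckAndMarkRunningCondensed.java|borderStyle=solid}
// Condensed from the method quoted above, with comments on the suspect steps.
try {
  if (fs.exists(idPath)) {
    IOUtils.closeStream(fs.append(idPath)); // probe: fails fast if held open
    fs.delete(idPath, true);                // deletes the id file either way
  }
  return fs.create(idPath, false);          // claim the lock file
} catch (RemoteException e) {
  if (AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
    return null; // caller reports "Another Balancer is running.. Exiting ..."
  }
  throw e;
}
{code}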
[jira] [Assigned] (HDFS-8897) Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ...
[ https://issues.apache.org/jira/browse/HDFS-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge reassigned HDFS-8897: Assignee: John Zhuge > Loadbalancer always exits with : java.io.IOException: Another Balancer is > running.. Exiting ... > > > Key: HDFS-8897 > URL: https://issues.apache.org/jira/browse/HDFS-8897 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.7.1 > Environment: Centos 6.6 >Reporter: LINTE >Assignee: John Zhuge > > When the balancer is launched, it should test if there is already a > /system/balancer.id file in HDFS. > When the file doesn't exist, the balancer doesn't want to run: > 15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, > hdfs://sandbox] > 15/08/14 16:35:12 INFO balancer.Balancer: parameters = > Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration > = 5, number of nodes to be excluded = 0, number of nodes to be included = 0] > Time Stamp Iteration# Bytes Already Moved Bytes Left To Move > Bytes Being Moved > 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from > NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec > 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys > 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, > 30mins, 0sec > 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys > 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from > NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec > 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys > 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, > 30mins, 0sec > java.io.IOException: Another Balancer is running.. Exiting ... > Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds > Looking at the audit log file when trying to run the balancer, the balancer > creates the /system/balancer.id and then deletes it on exiting ... > 2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true > ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo > src=/system/balancer.id dst=nullperm=null proto=rpc > 2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true > ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=create > src=/system/balancer.id dst=nullperm=hdfs:hadoop:rw-r- > proto=rpc > 2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true > ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo > src=/system/balancer.id dst=nullperm=null proto=rpc > 2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true > ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo > src=/system/balancer.id dst=nullperm=null proto=rpc > 2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true > ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo > src=/system/balancer.id dst=nullperm=null proto=rpc > 2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true > ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=delete > src=/system/balancer.id dst=nullperm=null proto=rpc > The error seems to be located in > org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java > The function checkAndMarkRunning returns null even if the /system/balancer.id > doesn't exist before entering this function; if it exists, then it is deleted > and the balancer exits with the same error. 
> > private OutputStream checkAndMarkRunning() throws IOException { > try { > if (fs.exists(idPath)) { > // try appending to it so that it will fail fast if another balancer > is > // running. > IOUtils.closeStream(fs.append(idPath)); > fs.delete(idPath, true); > } > final FSDataOutputStream fsout = fs.create(idPath, false); > // mark balancer idPath to be deleted during filesystem closure > fs.deleteOnExit(idPath); > if (write2IdFile) { > fsout.writeBytes(InetAddress.getLocalHost().getHostName()); > fsout.hflush(); > } > return fsout; > } catch(RemoteException e) { > > if(AlreadyBeingCreatedException.class.getName().equals(e.getClassName())){ > return null; > } else { > throw e; > } > } > } > > Regards -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: h
[jira] [Updated] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yunjiong zhao updated HDFS-10477: - Attachment: HDFS-10477.005.patch Update patch to fix the unit test. When called by tests like TestDefaultBlockPlacementPolicy.testPlacementWithLocalRackNodesDecommissioned, it might not hold the write lock. Thanks [~rakeshr] for the suggestion on the InterruptedException. The handler thread is a daemon thread, but you are right, it's better to call Thread.currentThread().interrupt() to keep the interrupt status. > Stop decommission a rack of DataNodes caused NameNode fail over to standby > -- > > Key: HDFS-10477 > URL: https://issues.apache.org/jira/browse/HDFS-10477 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.2 >Reporter: yunjiong zhao >Assignee: yunjiong zhao > Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch, > HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.patch > > > In our cluster, when we stopped decommissioning a rack which has 46 DataNodes, > it locked the Namesystem for about 7 minutes, as the log below shows: > {code} > 2016-05-26 20:11:41,697 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.27:1004 > 2016-05-26 20:11:51,171 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 285258 over-replicated blocks on 10.142.27.27:1004 during recommissioning > 2016-05-26 20:11:51,171 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.118:1004 > 2016-05-26 20:11:59,972 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 279923 over-replicated blocks on 10.142.27.118:1004 during recommissioning > 2016-05-26 20:11:59,972 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.113:1004 > 2016-05-26 20:12:09,007 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 294307 over-replicated blocks on 10.142.27.113:1004 during recommissioning > 2016-05-26 20:12:09,008 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.117:1004 > 2016-05-26 20:12:18,055 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 314381 over-replicated blocks on 10.142.27.117:1004 during recommissioning > 2016-05-26 20:12:18,056 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.130:1004 > 2016-05-26 20:12:25,938 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 272779 over-replicated blocks on 10.142.27.130:1004 during recommissioning > 2016-05-26 20:12:25,939 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.121:1004 > 2016-05-26 20:12:34,134 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 287248 over-replicated blocks on 10.142.27.121:1004 during recommissioning > 2016-05-26 20:12:34,134 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.33:1004 > 2016-05-26 20:12:43,020 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 299868 over-replicated blocks on 10.142.27.33:1004 during recommissioning > 2016-05-26 20:12:43,020 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.137:1004 > 2016-05-26 20:12:52,220 INFO > 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 303914 over-replicated blocks on 10.142.27.137:1004 during recommissioning > 2016-05-26 20:12:52,220 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.51:1004 > 2016-05-26 20:13:00,362 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 281175 over-replicated blocks on 10.142.27.51:1004 during recommissioning > 2016-05-26 20:13:00,362 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.12:1004 > 2016-05-26 20:13:08,756 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 274880 over-replicated blocks on 10.142.27.12:1004 during recommissioning > 2016-05-26 20:13:08,757 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.15:1004 > 2016-05-26 20:13:17,185 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 286334 over-replicated blocks on 10.142.27.15:1004 during recommissioning > 2016-05-26 20:13:17,185 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop
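For reference, the InterruptedException idiom mentioned in the comment above, as a minimal sketch (illustration only, not the patch itself): restore the thread's interrupt status rather than swallowing the exception, so callers further up the stack can still observe it.

{code:title=InterruptIdiomSketch.java|borderStyle=solid}
// Sketch: re-assert the interrupt status inside the catch block.
try {
  Thread.sleep(1000); // stands in for any interruptible call
} catch (InterruptedException ie) {
  Thread.currentThread().interrupt(); // keep the interrupt status
  return;                             // then unwind promptly
}
{code}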
[jira] [Commented] (HDFS-10623) Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens.
[ https://issues.apache.org/jira/browse/HDFS-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378101#comment-15378101 ] Hadoop QA commented on HDFS-10623: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s{color} | {color:red} HDFS-10623 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817991/HDFS-10623.000.patch | | JIRA Issue | HDFS-10623 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16058/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens. > - > > Key: HDFS-10623 > URL: https://issues.apache.org/jira/browse/HDFS-10623 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Jitendra Nath Pandey >Assignee: Hanisha Koneru > Attachments: HDFS-10623.000.patch > > > TestWebHdfsTokens imports httpclient.HttpConnection, and causes unnecessary > reference to httpclient. This can be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10623) Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens.
[ https://issues.apache.org/jira/browse/HDFS-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-10623: -- Status: Patch Available (was: Open) > Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens. > - > > Key: HDFS-10623 > URL: https://issues.apache.org/jira/browse/HDFS-10623 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Jitendra Nath Pandey >Assignee: Hanisha Koneru > Attachments: HDFS-10623.000.patch > > > TestWebHdfsTokens imports httpclient.HttpConnection, and causes unnecessary > reference to httpclient. This can be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10623) Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens.
[ https://issues.apache.org/jira/browse/HDFS-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-10623: -- Attachment: HDFS-10623.000.patch > Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens. > - > > Key: HDFS-10623 > URL: https://issues.apache.org/jira/browse/HDFS-10623 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Jitendra Nath Pandey >Assignee: Hanisha Koneru > Attachments: HDFS-10623.000.patch > > > TestWebHdfsTokens imports httpclient.HttpConnection, and causes unnecessary > reference to httpclient. This can be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10623) Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens.
[ https://issues.apache.org/jira/browse/HDFS-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-10623: -- Target Version/s: 2.7.3 > Remove unused import of httpclient.HttpConnection from TestWebHdfsTokens. > - > > Key: HDFS-10623 > URL: https://issues.apache.org/jira/browse/HDFS-10623 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Jitendra Nath Pandey >Assignee: Hanisha Koneru > > TestWebHdfsTokens imports httpclient.HttpConnection, and causes unnecessary > reference to httpclient. This can be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-7686) Re-add rapid rescan of possibly corrupt block feature to the block scanner
[ https://issues.apache.org/jira/browse/HDFS-7686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-7686: --- Assignee: Colin P. McCabe (was: Rakesh R) > Re-add rapid rescan of possibly corrupt block feature to the block scanner > -- > > Key: HDFS-7686 > URL: https://issues.apache.org/jira/browse/HDFS-7686 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Assignee: Colin P. McCabe >Priority: Blocker > Fix For: 2.7.0 > > Attachments: HDFS-7686.002.patch, HDFS-7686.003.patch, > HDFS-7686.004.patch > > > When doing a transferTo (aka sendfile operation) from the DataNode to a > client, we may hit an I/O error from the disk. If we believe this is the > case, we should be able to tell the block scanner to rescan that block soon. > The feature was originally implemented in HDFS-7548 but was removed by > HDFS-7430. We should re-add it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-7686) Re-add rapid rescan of possibly corrupt block feature to the block scanner
[ https://issues.apache.org/jira/browse/HDFS-7686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R reassigned HDFS-7686: -- Assignee: Rakesh R (was: Colin P. McCabe) > Re-add rapid rescan of possibly corrupt block feature to the block scanner > -- > > Key: HDFS-7686 > URL: https://issues.apache.org/jira/browse/HDFS-7686 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Assignee: Rakesh R >Priority: Blocker > Fix For: 2.7.0 > > Attachments: HDFS-7686.002.patch, HDFS-7686.003.patch, > HDFS-7686.004.patch > > > When doing a transferTo (aka sendfile operation) from the DataNode to a > client, we may hit an I/O error from the disk. If we believe this is the > case, we should be able to tell the block scanner to rescan that block soon. > The feature was originally implemented in HDFS-7548 but was removed by > HDFS-7430. We should re-add it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.
[ https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377889#comment-15377889 ] Hudson commented on HDFS-10600: --- SUCCESS: Integrated in Hadoop-trunk-Commit #10100 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10100/]) HDFS-10600. PlanCommand#getThrsholdPercentage should not use throughput (lei: rev 382dff74751b745de28a212df4897f525111d228) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/command/PlanCommand.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DiskBalancer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java > PlanCommand#getThrsholdPercentage should not use throughput value. > -- > > Key: HDFS-10600 > URL: https://issues.apache.org/jira/browse/HDFS-10600 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: diskbalancer >Affects Versions: 2.9.0, 3.0.0-beta1 >Reporter: Lei (Eddy) Xu >Assignee: Yiqun Lin > Fix For: 3.0.0-alpha1 > > Attachments: HDFS-10600.001.patch, HDFS-10600.002.patch > > > In {{PlanCommand#getThresholdPercentage}} > {code} > private double getThresholdPercentage(CommandLine cmd) { > > if ((value <= 0.0) || (value > 100.0)) { > value = getConf().getDouble( > DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT, > DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT); > } > return value; > } > {code} > {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return > {{throughput}} as a percentage value. > Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
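As a rough illustration of the fix direction described above (a sketch assuming the fallback becomes a percentage-valued default; the names below are assumptions, not necessarily those in the committed patch):

{code}
// Sketch only: clamp an out-of-range threshold to a *percentage* default,
// never to the MB-valued DISK_THRUPUT key quoted in the description.
static double thresholdOrDefault(double value, double defaultPercent) {
  if ((value <= 0.0) || (value > 100.0)) {
    // e.g. read from a hypothetical dfs.disk.balancer.plan.threshold key
    return defaultPercent;
  }
  return value;
}
{code}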
[jira] [Commented] (HDFS-10467) Router-based HDFS federation
[ https://issues.apache.org/jira/browse/HDFS-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377362#comment-15377362 ] Jing Zhao commented on HDFS-10467: -- Assigned the jira to [~elgoiri]. bq. Probably, it's a good idea to create a new branch for this effort. +1. I've created the feature branch HDFS-10467. Please feel free to use it for the next steps of development. > Router-based HDFS federation > > > Key: HDFS-10467 > URL: https://issues.apache.org/jira/browse/HDFS-10467 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Affects Versions: 2.7.2 >Reporter: Inigo Goiri >Assignee: Inigo Goiri > Attachments: HDFS Router Federation.pdf, HDFS-10467.PoC.001.patch, > HDFS-10467.PoC.patch, HDFS-Router-Federation-Prototype.patch > > > Add a Router to provide a federated view of multiple HDFS clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.
[ https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-10600: - Resolution: Fixed Fix Version/s: (was: 2.9.0) 3.0.0-alpha1 Target Version/s: 3.0.0-beta1 (was: 2.9.0, 3.0.0-beta1) Status: Resolved (was: Patch Available) +1. Thanks for the fix, [~linyiqun]. Committed to trunk. > PlanCommand#getThrsholdPercentage should not use throughput value. > -- > > Key: HDFS-10600 > URL: https://issues.apache.org/jira/browse/HDFS-10600 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: diskbalancer >Affects Versions: 2.9.0, 3.0.0-beta1 >Reporter: Lei (Eddy) Xu >Assignee: Yiqun Lin > Fix For: 3.0.0-alpha1 > > Attachments: HDFS-10600.001.patch, HDFS-10600.002.patch > > > In {{PlanCommand#getThresholdPercentage}} > {code} > private double getThresholdPercentage(CommandLine cmd) { > > if ((value <= 0.0) || (value > 100.0)) { > value = getConf().getDouble( > DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT, > DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT); > } > return value; > } > {code} > {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return > {{throughput}} as a percentage value. > Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10467) Router-based HDFS federation
[ https://issues.apache.org/jira/browse/HDFS-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-10467: - Assignee: Inigo Goiri > Router-based HDFS federation > > > Key: HDFS-10467 > URL: https://issues.apache.org/jira/browse/HDFS-10467 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Affects Versions: 2.7.2 >Reporter: Inigo Goiri >Assignee: Inigo Goiri > Attachments: HDFS Router Federation.pdf, HDFS-10467.PoC.001.patch, > HDFS-10467.PoC.patch, HDFS-Router-Federation-Prototype.patch > > > Add a Router to provide a federated view of multiple HDFS clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-10626: - Labels: supportability (was: ) > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Labels: supportability > Attachments: HDFS-10626.001.patch > > > VolumeScanner logs an incorrect IOException in {{datanode.reportBadBlocks}}. > The related code: > {code} > public void handle(ExtendedBlock block, IOException e) { > FsVolumeSpi volume = scanner.volume; > ... > try { > scanner.datanode.reportBadBlocks(block, volume); > } catch (IOException ie) { > // This is bad, but not bad enough to shut down the scanner. > LOG.warn("Cannot report bad " + block.getBlockId(), e); > } > } > {code} > The IOException printed in the log should be {{ie}} rather than {{e}}, > which was passed into the method {{handle(ExtendedBlock block, IOException e)}}. > It will be important info that can help us to know why datanode > reportBadBlocks failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
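The change the description calls for is essentially one line in the catch block, along these lines:

{code}
try {
  scanner.datanode.reportBadBlocks(block, volume);
} catch (IOException ie) {
  // Log the exception just caught ('ie'), not the handler's argument 'e',
  // so the actual reportBadBlocks failure cause shows up in the log.
  LOG.warn("Cannot report bad " + block.getBlockId(), ie);
}
{code}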
[jira] [Commented] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377289#comment-15377289 ] Yongjun Zhang commented on HDFS-10626: -- Hi [~linyiqun], Thanks for reporting and working on the issue. I made a similar comment here https://issues.apache.org/jira/browse/HDFS-10625?focusedCommentId=15377247&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15377247 I think it's helpful to report more info about the replica when there is any issue. Thanks. > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Attachments: HDFS-10626.001.patch > > > VolumeScanner logs an incorrect IOException in {{datanode.reportBadBlocks}}. > The related code: > {code} > public void handle(ExtendedBlock block, IOException e) { > FsVolumeSpi volume = scanner.volume; > ... > try { > scanner.datanode.reportBadBlocks(block, volume); > } catch (IOException ie) { > // This is bad, but not bad enough to shut down the scanner. > LOG.warn("Cannot report bad " + block.getBlockId(), e); > } > } > {code} > The IOException printed in the log should be {{ie}} rather than {{e}}, > which was passed into the method {{handle(ExtendedBlock block, IOException e)}}. > It will be important info that can help us to know why datanode > reportBadBlocks failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10627) Volume Scanner mark a block as "suspect" even if the block sender encounters 'Broken pipe' or 'Connection reset by peer' exception
[ https://issues.apache.org/jira/browse/HDFS-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377268#comment-15377268 ] Wei-Chiu Chuang commented on HDFS-10627: [~shahrs87] this piece of code can be useful during a block transfer triggered by a prior pipeline recovery. In that case, if the receiver detects corruption in the replica, it immediately resets the connection. If the block transfer is not initiated by a pipeline recovery, the receiver also notifies the NameNode that the source's replica is corrupt (this is actually not accurate, because the corruption may be due to other issues, and that causes the bug described in HDFS-6804). In short, I think this code is still necessary, because of the lack of a feedback mechanism in block transfer during pipeline recovery. > Volume Scanner mark a block as "suspect" even if the block sender encounters > 'Broken pipe' or 'Connection reset by peer' exception > -- > > Key: HDFS-10627 > URL: https://issues.apache.org/jira/browse/HDFS-10627 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > > In the BlockSender code, > {code:title=BlockSender.java|borderStyle=solid} > if (!ioem.startsWith("Broken pipe") && !ioem.startsWith("Connection > reset")) { > LOG.error("BlockSender.sendChunks() exception: ", e); > } > datanode.getBlockScanner().markSuspectBlock( > volumeRef.getVolume().getStorageID(), > block); > {code} > Before HDFS-7686, the block was marked as suspect only if the exception > message doesn't start with Broken pipe or Connection reset. > But after HDFS-7686, the block is marked as corrupt irrespective of the > exception message. > In one of our datanode, it took approximately a whole day (22 hours) to go > through all the suspect blocks to scan one corrupt block. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
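For context, the pre-HDFS-7686 guard being discussed would look roughly like this; a sketch reconstructed from the description above, not the actual patch:

{code}
String ioem = e.getMessage();
boolean clientDisconnect = ioem != null
    && (ioem.startsWith("Broken pipe")
        || ioem.startsWith("Connection reset"));
if (!clientDisconnect) {
  LOG.error("BlockSender.sendChunks() exception: ", e);
  // Only treat the replica as suspect when the failure is not a plain
  // client disconnect; otherwise the suspect queue fills with false
  // positives, as described in this issue.
  datanode.getBlockScanner().markSuspectBlock(
      volumeRef.getVolume().getStorageID(), block);
}
{code}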
[jira] [Comment Edited] (HDFS-10625) VolumeScanner to report why a block is found bad
[ https://issues.apache.org/jira/browse/HDFS-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377247#comment-15377247 ] Yongjun Zhang edited comment on HDFS-10625 at 7/14/16 4:44 PM: --- Thanks [~shahrs87] for the patch and [~jojochuang] for the comment. It'd be nice to include the length of the replica, visible length, on-disk length, etc. in the report too. I suggest using the same format as used here (see HDFS-10587): {code} 2016-04-15 22:03:05,066 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW getNumBytes() = 41381376 getBytesOnDisk() = 41381376 getVisibleLength()= 41186444 getVolume() = /hadoop-i/data/current getBlockFile() = /hadoop-i/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324 bytesAcked=41186444 bytesOnDisk=41381376 {code} was (Author: yzhangal): Thanks [~shahrs87] for the patch and [~jojochuang] for the comment. It'd be nice to include the length of the replica in the report too. > VolumeScanner to report why a block is found bad > - > > Key: HDFS-10625 > URL: https://issues.apache.org/jira/browse/HDFS-10625 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Rushabh S Shah > Labels: supportability > Attachments: HDFS-10625.patch > > > VolumeScanner may report: > {code} > WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad > blk_1170125248_96458336 on /d/dfs/dn > {code} > It would be helpful to report the reason why the block is bad, especially > when the block is corrupt, where is the first corrupted chunk in the block. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
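A sketch of what such a report could look like; {{replica}} is assumed to be the ReplicaInfo backing the scanned block, whose toString() already prints the length fields in the format quoted above:

{code}
// Sketch only: include replica length details when reporting a bad block.
// 'block', 'volume', and 'replica' are assumed to be in scope in the
// scanner's reporting path.
LOG.warn("Reporting bad " + block + " on " + volume
    + "; replica state: " + replica);
{code}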
[jira] [Updated] (HDFS-10627) Volume Scanner mark a block as "suspect" even if the block sender encounters 'Broken pipe' or 'Connection reset by peer' exception
[ https://issues.apache.org/jira/browse/HDFS-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-10627: -- Description: In the BlockSender code, {code:title=BlockSender.java|borderStyle=solid} if (!ioem.startsWith("Broken pipe") && !ioem.startsWith("Connection reset")) { LOG.error("BlockSender.sendChunks() exception: ", e); } datanode.getBlockScanner().markSuspectBlock( volumeRef.getVolume().getStorageID(), block); {code} Before HDFS-7686, the block was marked as suspect only if the exception message doesn't start with Broken pipe or Connection reset. But after HDFS-7686, the block is marked as corrupt irrespective of the exception message. In one of our datanode, it took approximately a whole day (22 hours) to go through all the suspect blocks to scan one corrupt block. was: In the BlockSender code, {code:title=BlockSender.java|borderStyle=solid} if (!ioem.startsWith("Broken pipe") && !ioem.startsWith("Connection reset")) { LOG.error("BlockSender.sendChunks() exception: ", e); } datanode.getBlockScanner().markSuspectBlock( volumeRef.getVolume().getStorageID(), block); {code} Before HDFS-7686, the block was marked as suspect only if the exception message doesn't start with Broken pipe or Connection reset. But after HDFS-7686, the block is marked as corrupt irrespectively of the exception message. In one of our datanode, it took approximately a whole day (22 hours) to go through all the suspect blocks to scan one corrupt block. > Volume Scanner mark a block as "suspect" even if the block sender encounters > 'Broken pipe' or 'Connection reset by peer' exception > -- > > Key: HDFS-10627 > URL: https://issues.apache.org/jira/browse/HDFS-10627 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > > In the BlockSender code, > {code:title=BlockSender.java|borderStyle=solid} > if (!ioem.startsWith("Broken pipe") && !ioem.startsWith("Connection > reset")) { > LOG.error("BlockSender.sendChunks() exception: ", e); > } > datanode.getBlockScanner().markSuspectBlock( > volumeRef.getVolume().getStorageID(), > block); > {code} > Before HDFS-7686, the block was marked as suspect only if the exception > message doesn't start with Broken pipe or Connection reset. > But after HDFS-7686, the block is marked as corrupt irrespective of the > exception message. > In one of our datanode, it took approximately a whole day (22 hours) to go > through all the suspect blocks to scan one corrupt block. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10627) Volume Scanner mark a block as "suspect" even if the block sender encounters 'Broken pipe' or 'Connection reset by peer' exception
Rushabh S Shah created HDFS-10627: - Summary: Volume Scanner mark a block as "suspect" even if the block sender encounters 'Broken pipe' or 'Connection reset by peer' exception Key: HDFS-10627 URL: https://issues.apache.org/jira/browse/HDFS-10627 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 2.7.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah In the BlockSender code, {code:title=BlockSender.java|borderStyle=solid} if (!ioem.startsWith("Broken pipe") && !ioem.startsWith("Connection reset")) { LOG.error("BlockSender.sendChunks() exception: ", e); } datanode.getBlockScanner().markSuspectBlock( volumeRef.getVolume().getStorageID(), block); {code} Before HDFS-7686, the block was marked as suspect only if the exception message doesn't start with Broken pipe or Connection reset. But after HDFS-7686, the block is marked as corrupt irrespectively of the exception message. In one of our datanode, it took approximately a whole day (22 hours) to go through all the suspect blocks to scan one corrupt block. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10625) VolumeScanner to report why a block is found bad
[ https://issues.apache.org/jira/browse/HDFS-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377247#comment-15377247 ] Yongjun Zhang commented on HDFS-10625: -- Thanks [~shahrs87] for the patch and [~jojochuang] for the comment. It'd be nice to include the length of the replica in the report too. > VolumeScanner to report why a block is found bad > - > > Key: HDFS-10625 > URL: https://issues.apache.org/jira/browse/HDFS-10625 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Rushabh S Shah > Labels: supportability > Attachments: HDFS-10625.patch > > > VolumeScanner may report: > {code} > WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad > blk_1170125248_96458336 on /d/dfs/dn > {code} > It would be helpful to report the reason why the block is bad, especially > when the block is corrupt, where is the first corrupted chunk in the block. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10625) VolumeScanner to report why a block is found bad
[ https://issues.apache.org/jira/browse/HDFS-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377228#comment-15377228 ] Wei-Chiu Chuang commented on HDFS-10625: looks good to me. VolumeScanner uses BlockSender and a null stream to verify the integrity of the block. If the block is corrupt, {{BlockSender#verifyChecksum}} throws a ChecksumException, which details the location of the corruption. {code} throw new ChecksumException("Checksum failed at " + failedPos, failedPos); {code} > VolumeScanner to report why a block is found bad > - > > Key: HDFS-10625 > URL: https://issues.apache.org/jira/browse/HDFS-10625 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Rushabh S Shah > Labels: supportability > Attachments: HDFS-10625.patch > > > VolumeScanner may report: > {code} > WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad > blk_1170125248_96458336 on /d/dfs/dn > {code} > It would be helpful to report the reason why the block is bad, especially > when the block is corrupt, where is the first corrupted chunk in the block. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
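Since ChecksumException carries the offset of the first bad chunk, the scanner side can surface it directly. A minimal sketch, with the call site assumed rather than quoted from the patch:

{code}
try {
  // Assumed call site: the scanner driving BlockSender over a null stream.
  blockSender.sendBlock(nullStream, null, throttler);
} catch (ChecksumException ce) {
  // ChecksumException#getPos() is the offset of the first corrupt chunk,
  // which is exactly the detail the bad-block report should include.
  LOG.warn("Checksum failure in " + block + " at offset " + ce.getPos(), ce);
}
{code}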
[jira] [Commented] (HDFS-10596) libhdfs++: Implement hdfsFileIsEncrypted
[ https://issues.apache.org/jira/browse/HDFS-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377213#comment-15377213 ] James Clampffer commented on HDFS-10596: {code} 16/07/05 17:12:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable {code} Usually this warning isn't really a big deal; it just means it's not able to find libhadoop.so, which provides some functionality that needs to be done in native code, e.g. short-circuit reads, optimized CRC implementations (CRC may be back in Java now), etc. I don't know much about the security stuff, but this doesn't look related to the main issue. Regarding the RemoteException: have you tried looking at the namenode logs (turn the logging level to debug if possible) to see if it's actually getting the right info about the key service? Possibly also look at the key service logs, if applicable. There might be some useful warnings in there that could help point you in the right direction. > libhdfs++: Implement hdfsFileIsEncrypted > > > Key: HDFS-10596 > URL: https://issues.apache.org/jira/browse/HDFS-10596 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein > Attachments: HDFS-10596.HDFS-8707.000.patch > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10625) VolumeScanner to report why a block is found bad
[ https://issues.apache.org/jira/browse/HDFS-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377192#comment-15377192 ] Hadoop QA commented on HDFS-10625: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 74m 7s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 96m 43s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817952/HDFS-10625.patch | | JIRA Issue | HDFS-10625 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 8fc773df0d4f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 54bf14f | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16057/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16057/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > VolumeScanner to report why a block is found bad > - > > Key: HDFS-10625 > URL: https://issues.apache.org/jira/browse/HDFS-10625 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Rushabh S Shah > Labels: supportability > Attachments: HDFS-10625.patch > > > VolumeScanner may repor
[jira] [Commented] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377160#comment-15377160 ] Yongjun Zhang commented on HDFS-10587: -- Thanks a lot [~vinayrpet] and [~xupeng]! As Vinay pointed out, the case Xupeng described looks alike, but the corruption position is not like this case. I think HDFS-6937 will help with Xupeng's case. Vinay: About {{recoverRbw}}, since the data the destination DN (the new DN) received is valid data, does not truncating at the new DN hurt? We actually allow different visibleLengths at different replicas, see https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15374480&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15374480 Though I originally hoped that the block transfer would preserve the visibleLength, so that the target DN can have the same visibleLength as the source DN. Assuming it's OK to have a different visibleLength at the new DN, the block transfer seems to have a side effect, such that the new chunk after the block transfer at the new DN appears corrupted. Another thing is, if the pipeline recovery is failing (see https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15376467&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15376467), why do we have more data reaching the new DN (I mean the chunk after the block transfer)? Thanks. > Incorrect offset/length calculation in pipeline recovery causes block > corruption > > > Key: HDFS-10587 > URL: https://issues.apache.org/jira/browse/HDFS-10587 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10587.001.patch > > > We found incorrect offset and length calculation in pipeline recovery may > cause block corruption and results in missing blocks under a very unfortunate > scenario. > (1) A client established pipeline and started writing data to the pipeline. > (2) One of the data node in the pipeline restarted, closing the socket, and > some written data were unacknowledged. > (3) Client replaced the failed data node with a new one, initiating block > transfer to copy existing data in the block to the new datanode. > (4) The block is transferred to the new node. Crucially, the entire block, > including the unacknowledged data, was transferred. > (5) The last chunk (512 bytes) was not a full chunk, but the destination > still reserved the whole chunk in its buffer, and wrote the entire buffer to > disk, therefore some written data is garbage. > (6) When the transfer was done, the destination data node converted the > replica from temporary to rbw, which made its visible length as the length of > bytes on disk. That is to say, it thought whatever was transferred was > acknowledged. However, the visible length of the replica is different (round > up to the next multiple of 512) than the source of transfer. [1] > (7) Client then truncated the block in the attempt to remove unacknowledged > data. However, because the visible length is equivalent of the bytes on disk, > it did not truncate unacknowledged data. > (8) When new data was appended to the destination, it skipped the bytes > already on disk. Therefore, whatever was written as garbage was not replaced. > (9) the volume scanner detected corrupt replica, but due to HDFS-10512, it > wouldn’t tell NameNode to mark the replica as corrupt, so the client > continued to form a pipeline using the corrupt replica. 
> (10) Finally the DN that had the only healthy replica was restarted. NameNode > then update the pipeline to only contain the corrupt replica. > (11) Client continue to write to the corrupt replica, because neither client > nor the data node itself knows the replica is corrupt. When the restarted > datanodes comes back, their replica are stale, despite they are not corrupt. > Therefore, none of the replica is good and up to date. > The sequence of events was reconstructed based on DataNode/NameNode log and > my understanding of code. > Incidentally, we have observed the same sequence of events on two independent > clusters. > [1] > The sender has the replica as follows: > 2016-04-15 22:03:05,066 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW > getNumBytes() = 41381376 > getBytesOnDisk() = 41381376 > getVisibleLength()= 41186444 > getVolume() = /hadoop-i/data/current > getBlockFile()= > /hadoop-i/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324 > bytesAcked=41186444 > bytesOnDisk=41381376 > while the re
[jira] [Commented] (HDFS-10328) Add per-cache-pool default replication num configuration
[ https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377045#comment-15377045 ] xupeng commented on HDFS-10328: --- [~cmccabe] Thanks a lot for the review :) > Add per-cache-pool default replication num configuration > > > Key: HDFS-10328 > URL: https://issues.apache.org/jira/browse/HDFS-10328 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching >Reporter: xupeng >Assignee: xupeng >Priority: Minor > Fix For: 2.9.0 > > Attachments: HDFS-10328.001.patch, HDFS-10328.002.patch, > HDFS-10328.003.patch, HDFS-10328.004.patch > > > For now, hdfs cacheadmin cannot set a default replication num for cache > directives in the same cachepool. Each cache directive added in the same cache > pool has to set its own replication num individually. > Consider this situation: we add daily Hive tables into the cache pool "hive". Each > time I have to set the same replication num for every table directive in the > same cache pool. > I think we should enable setting a default replication num for a cachepool, so > that every cache directive in the pool can inherit the replication configuration > from the pool. A cache directive can still override the replication configuration > explicitly by calling the "add & modify directive -replication" command from the > CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
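The inheritance rule described above amounts to the following; a sketch in which the pool-side default is an assumed setting introduced by this change, not an existing API:

{code}
// Sketch only: a directive's explicit replication wins; otherwise the
// cache pool's default applies (poolDefaultReplication stands in for the
// hypothetical pool-level setting this issue adds).
static short effectiveReplication(Short directiveReplication,
    short poolDefaultReplication) {
  return directiveReplication != null
      ? directiveReplication
      : poolDefaultReplication;
}
{code}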
[jira] [Assigned] (HDFS-10625) VolumeScanner to report why a block is found bad
[ https://issues.apache.org/jira/browse/HDFS-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah reassigned HDFS-10625: - Assignee: Rushabh S Shah > VolumeScanner to report why a block is found bad > - > > Key: HDFS-10625 > URL: https://issues.apache.org/jira/browse/HDFS-10625 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Rushabh S Shah > Labels: supportability > Attachments: HDFS-10625.patch > > > VolumeScanner may report: > {code} > WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad > blk_1170125248_96458336 on /d/dfs/dn > {code} > It would be helpful to report the reason why the block is bad, especially > when the block is corrupt, where is the first corrupted chunk in the block. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10625) VolumeScanner to report why a block is found bad
[ https://issues.apache.org/jira/browse/HDFS-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-10625: -- Attachment: HDFS-10625.patch > VolumeScanner to report why a block is found bad > - > > Key: HDFS-10625 > URL: https://issues.apache.org/jira/browse/HDFS-10625 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Rushabh S Shah > Labels: supportability > Attachments: HDFS-10625.patch > > > VolumeScanner may report: > {code} > WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad > blk_1170125248_96458336 on /d/dfs/dn > {code} > It would be helpful to report the reason why the block is bad, especially > when the block is corrupt, where is the first corrupted chunk in the block. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10625) VolumeScanner to report why a block is found bad
[ https://issues.apache.org/jira/browse/HDFS-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-10625: -- Status: Patch Available (was: Open) Please review. > VolumeScanner to report why a block is found bad > - > > Key: HDFS-10625 > URL: https://issues.apache.org/jira/browse/HDFS-10625 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Rushabh S Shah > Labels: supportability > Attachments: HDFS-10625.patch > > > VolumeScanner may report: > {code} > WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad > blk_1170125248_96458336 on /d/dfs/dn > {code} > It would be helpful to report the reason why the block is bad, especially > when the block is corrupt, where is the first corrupted chunk in the block. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376964#comment-15376964 ] Rushabh S Shah commented on HDFS-10626: --- lgtm, +1 (non-binding) > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Attachments: HDFS-10626.001.patch > > > VolumeScanner logs an incorrect IOException in {{datanode.reportBadBlocks}}. > The related code: > {code} > public void handle(ExtendedBlock block, IOException e) { > FsVolumeSpi volume = scanner.volume; > ... > try { > scanner.datanode.reportBadBlocks(block, volume); > } catch (IOException ie) { > // This is bad, but not bad enough to shut down the scanner. > LOG.warn("Cannot report bad " + block.getBlockId(), e); > } > } > {code} > The IOException printed in the log should be {{ie}} rather than {{e}}, > which was passed into the method {{handle(ExtendedBlock block, IOException e)}}. > It will be important info that can help us to know why datanode > reportBadBlocks failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376940#comment-15376940 ] Vinayakumar B commented on HDFS-10587: -- bq. org.apache.hadoop.fs.ChecksumException: Checksum error: DFSClient_NONMAPREDUCE_2019484565_1 at 81920 exp: 1352119728 got: -1012279895 Here it says the checksum error is at 81920, which is at the very beginning itself. Maybe the disk on 229 has some problem, or some corruption due to the network card happened during the transfer to 77. It is not exactly the same as the current case. > Incorrect offset/length calculation in pipeline recovery causes block > corruption > > > Key: HDFS-10587 > URL: https://issues.apache.org/jira/browse/HDFS-10587 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10587.001.patch > > > We found incorrect offset and length calculation in pipeline recovery may > cause block corruption and results in missing blocks under a very unfortunate > scenario. > (1) A client established pipeline and started writing data to the pipeline. > (2) One of the data node in the pipeline restarted, closing the socket, and > some written data were unacknowledged. > (3) Client replaced the failed data node with a new one, initiating block > transfer to copy existing data in the block to the new datanode. > (4) The block is transferred to the new node. Crucially, the entire block, > including the unacknowledged data, was transferred. > (5) The last chunk (512 bytes) was not a full chunk, but the destination > still reserved the whole chunk in its buffer, and wrote the entire buffer to > disk, therefore some written data is garbage. > (6) When the transfer was done, the destination data node converted the > replica from temporary to rbw, which made its visible length as the length of > bytes on disk. That is to say, it thought whatever was transferred was > acknowledged. However, the visible length of the replica is different (round > up to the next multiple of 512) than the source of transfer. [1] > (7) Client then truncated the block in the attempt to remove unacknowledged > data. However, because the visible length is equivalent of the bytes on disk, > it did not truncate unacknowledged data. > (8) When new data was appended to the destination, it skipped the bytes > already on disk. Therefore, whatever was written as garbage was not replaced. > (9) the volume scanner detected corrupt replica, but due to HDFS-10512, it > wouldn’t tell NameNode to mark the replica as corrupt, so the client > continued to form a pipeline using the corrupt replica. > (10) Finally the DN that had the only healthy replica was restarted. NameNode > then update the pipeline to only contain the corrupt replica. > (11) Client continue to write to the corrupt replica, because neither client > nor the data node itself knows the replica is corrupt. When the restarted > datanodes comes back, their replica are stale, despite they are not corrupt. > Therefore, none of the replica is good and up to date. > The sequence of events was reconstructed based on DataNode/NameNode log and > my understanding of code. > Incidentally, we have observed the same sequence of events on two independent > clusters. 
> [1] > The sender has the replica as follows: > 2016-04-15 22:03:05,066 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW > getNumBytes() = 41381376 > getBytesOnDisk() = 41381376 > getVisibleLength()= 41186444 > getVolume() = /hadoop-i/data/current > getBlockFile()= > /hadoop-i/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324 > bytesAcked=41186444 > bytesOnDisk=41381376 > while the receiver has the replica as follows: > 2016-04-15 22:03:05,068 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW > getNumBytes() = 41186816 > getBytesOnDisk() = 41186816 > getVisibleLength()= 41186816 > getVolume() = /hadoop-g/data/current > getBlockFile()= > /hadoop-g/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324 > bytesAcked=41186816 > bytesOnDisk=41186816 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376941#comment-15376941 ] Hadoop QA commented on HDFS-10626: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 60m 44s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 80m 35s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817934/HDFS-10626.001.patch | | JIRA Issue | HDFS-10626 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux f07ba19e828c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / be26c1b | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16056/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16056/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Attachments: HDFS-10626.001.patch > > > VolumeScanner throws incorrect IOEx
[jira] [Commented] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376897#comment-15376897 ] xupeng commented on HDFS-10587: --- Hi [~vinayrpet], the related logs are listed below. 134.228 {noformat} 2016-07-13 11:48:29,528 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted blk_1116167880_42905642 (numBytes=9911790) to /10.6.134.229:5080 2016-07-13 11:48:29,552 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving blk_1116167880_42905642 src: /10.6.130.44:26319 dest: /10.6.134.228:5080 2016-07-13 11:48:29,552 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica blk_1116167880_42905642 2016-07-13 11:48:29,552 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_1116167880_42905642, RBW getNumBytes() = 9912487 getBytesOnDisk() = 9912487 getVisibleLength()= 9911790 getVolume() = /current getBlockFile()= /current/current/rbw/blk_1116167880 bytesAcked=9911790 bytesOnDisk=9912487 2016-07-13 11:48:29,552 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: truncateBlock: blockFile=/current/current/rbw/blk_1116167880, metaFile=/current/current/rbw/blk_1116167880_42905642.meta, oldlen=9912487, newlen=9911790 2016-07-13 11:49:01,566 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving blk_1116167880_42906656 src: /10.6.130.44:26617 dest: /10.6.134.228:5080 2016-07-13 11:49:01,566 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica blk_1116167880_42906656 2016-07-13 11:49:01,566 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_1116167880_42906656, RBW getNumBytes() = 15104963 getBytesOnDisk() = 15104963 getVisibleLength()= 15102415 getVolume() = /current getBlockFile()= /current/current/rbw/blk_1116167880 bytesAcked=15102415 bytesOnDisk=15104963 2016-07-13 11:49:01,566 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: truncateBlock: blockFile=/current/rbw/blk_1116167880, metaFile=/current/rbw/blk_1116167880_42906656.meta, oldlen=15104963, newlen=15102415 2016-07-13 11:49:01,569 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Datanode 2 got response for connect ack from downstream datanode with firstbadlink as 10.6.129.77:5080 2016-07-13 11:49:01,569 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Datanode 2 forwarding connect ack to upstream firstbadlink is 10.6.129.77:5080 2016-07-13 11:49:01,570 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: blk_1116167880_42907145, type=HAS_DOWNSTREAM_IN_PIPELINE java.io.EOFException: Premature EOF: no length prefix available at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2225) at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1179) at java.lang.Thread.run(Thread.java:745) 2016-07-13 11:49:01,570 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for blk_1116167880_42907145 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) {noformat} 134.229 
{noformat} 2016-07-13 11:48:29,488 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving blk_1116167880_42905642 src: /10.6.134.228:24286 dest: /10.6.134.229:5080 2016-07-13 11:48:29,516 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Convert blk_1116167880_42905642 from Temporary to RBW, visible length=9912320 2016-07-13 11:48:29,552 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving blk_1116167880_42905642 src: /10.6.134.228:24321 dest: /10.6.134.229:5080 2016-07-13 11:48:29,552 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica blk_1116167880_42905642 2016-07-13 11:48:29,552 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_1116167880_42905642, RBW getNumBytes() = 9912320 getBytesOnDisk() = 9912320 getVisibleLength()= 9912320 getVolume() = /current getBlockFile()= /current/rbw/blk_1116167880 bytesAcked=9912320 bytesOnDisk=9912320 2016-07-13 11:49:01,501 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: blk_1116167880_42906656, type=HAS_DOWNSTREAM_IN_PIPELINE java.io.IOException: Connection reset by peer 2016-07-13 11:49:01,505 IN
[jira] [Issue Comment Deleted] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xupeng updated HDFS-10587: -- Comment: was deleted (was: And here are the logs : Hbase log {noformat} 2016-07-13 11:48:29,475 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 from datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909) 2016-07-13 11:48:29,476 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 in pipeline DatanodeInfoWithStorage[10.6.134.228:5080,DS-ad10b254-5803-4109-a550-e07444a129c9,SSD], DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD], DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD]: bad datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD] 2016-07-13 11:49:01,499 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 from datanode DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909) 2016-07-13 11:49:01,500 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 in pipeline DatanodeInfoWithStorage[10.6.134.228:5080,DS-ad10b254-5803-4109-a550-e07444a129c9,SSD], DatanodeInfoWithStorage[10.6.134.229:5080,DS-8c209fca-9b34-4a6b-919b-6b4d24a3e13a,SSD], DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD]: bad datanode DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD] 2016-07-13 11:49:01,566 INFO [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 10.6.129.77:5080 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1472) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1293) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1016) at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:560) {noformat} 10.6.128.215Log {noformat} 2016-07-13 11:48:29,555 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 src: /10.6.134.229:19009 dest: /10.6.128.215:5080 2016-07-13 11:48:29,555 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 2016-07-13 11:48:29,555 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_1116167880_42905642, RBW getNumBytes() = 9912487 getBytesOnDisk() = 9912487 getVisibleLength()= 9911790 getVolume() = /data12/yarn/dndata/current getBlockFile()= /data12/yarn/dndata/current/BP-448958278-10.6.130.96-1457941856632/current/rbw/blk_1116167880 bytesAcked=9911790 bytesOnDisk=9912487 2016-07-13 11:48:29,555 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: truncateBlock: blockFile=/data12/yarn/dndata/current/BP-448958278-10.6.130.96-1457941856632/current/rbw/blk_1116167880, metaFile=/data12/yarn/dndata/current/BP-448958278-10.6.130.96-1457941856632/current/rbw/blk_1116167880_42905642.meta, oldlen=99124
[jira] [Updated] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10626: - Description: VolumeScanner throws incorrect IOException in {{datanode.reportBadBlocks}}. The related code: {code} public void handle(ExtendedBlock block, IOException e) { FsVolumeSpi volume = scanner.volume; ... try { scanner.datanode.reportBadBlocks(block, volume); } catch (IOException ie) { // This is bad, but not bad enough to shut down the scanner. LOG.warn("Cannot report bad " + block.getBlockId(), e); } } {code} The IOException printed in the log should be {{ie}} rather than {{e}}, which was passed into the method {{handle(ExtendedBlock block, IOException e)}}. It will be important info that can help us understand why datanode reportBadBlocks failed. was: VolumeScanner throws incorrect IOException in {{datanode.reportBadBlocks}}. The related codes: {code} public void handle(ExtendedBlock block, IOException e) { FsVolumeSpi volume = scanner.volume; ... try { scanner.datanode.reportBadBlocks(block, volume); } catch (IOException ie) { // This is bad, but not bad enough to shut down the scanner. LOG.warn("Cannot report bad " + block.getBlockId(), e); } } {code} The IOException that printed in the log should be {{ie}} rather than {{e}} which was passed in method {{handle(ExtendedBlock block, IOException e)}}. > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Attachments: HDFS-10626.001.patch > > > VolumeScanner throws incorrect IOException in {{datanode.reportBadBlocks}}. > The related code: > {code} > public void handle(ExtendedBlock block, IOException e) { > FsVolumeSpi volume = scanner.volume; > ... > try { > scanner.datanode.reportBadBlocks(block, volume); > } catch (IOException ie) { > // This is bad, but not bad enough to shut down the scanner. > LOG.warn("Cannot report bad " + block.getBlockId(), e); > } > } > {code} > The IOException printed in the log should be {{ie}} rather than {{e}}, > which was passed into the method {{handle(ExtendedBlock block, IOException e)}}. > It will be important info that can help us understand why datanode > reportBadBlocks failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
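For reference, a sketch of the handler with only the logged variable changed, mirroring the snippet quoted above (the surrounding VolumeScanner fields and the elided body are assumed from that context):
{code}
public void handle(ExtendedBlock block, IOException e) {
  FsVolumeSpi volume = scanner.volume;
  ...
  try {
    scanner.datanode.reportBadBlocks(block, volume);
  } catch (IOException ie) {
    // This is bad, but not bad enough to shut down the scanner.
    // Log the failure of reportBadBlocks itself (ie), not the exception
    // (e) that was passed into handle().
    LOG.warn("Cannot report bad " + block.getBlockId(), ie);
  }
}
{code}
With this change the warning carries the reason the report to the NameNode failed, which is the info the description above asks for.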
[jira] [Updated] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10626: - Attachment: HDFS-10626.001.patch Attaching a simple patch with this minor change. > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Attachments: HDFS-10626.001.patch > > > VolumeScanner throws incorrect IOException in {{datanode.reportBadBlocks}}. > The related code: > {code} > public void handle(ExtendedBlock block, IOException e) { > FsVolumeSpi volume = scanner.volume; > ... > try { > scanner.datanode.reportBadBlocks(block, volume); > } catch (IOException ie) { > // This is bad, but not bad enough to shut down the scanner. > LOG.warn("Cannot report bad " + block.getBlockId(), e); > } > } > {code} > The IOException printed in the log should be {{ie}} rather than {{e}}, > which was passed into the method {{handle(ExtendedBlock block, IOException e)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10626: - Status: Patch Available (was: Open) > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > > VolumeScanner throws incorrect IOException in {{datanode.reportBadBlocks}}. > The related code: > {code} > public void handle(ExtendedBlock block, IOException e) { > FsVolumeSpi volume = scanner.volume; > ... > try { > scanner.datanode.reportBadBlocks(block, volume); > } catch (IOException ie) { > // This is bad, but not bad enough to shut down the scanner. > LOG.warn("Cannot report bad " + block.getBlockId(), e); > } > } > {code} > The IOException printed in the log should be {{ie}} rather than {{e}}, > which was passed into the method {{handle(ExtendedBlock block, IOException e)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10626: - Priority: Minor (was: Major) > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > > VolumeScanner throws incorrect IOException in {{datanode.reportBadBlocks}}. > The related code: > {code} > public void handle(ExtendedBlock block, IOException e) { > FsVolumeSpi volume = scanner.volume; > ... > try { > scanner.datanode.reportBadBlocks(block, volume); > } catch (IOException ie) { > // This is bad, but not bad enough to shut down the scanner. > LOG.warn("Cannot report bad " + block.getBlockId(), e); > } > } > {code} > The IOException printed in the log should be {{ie}} rather than {{e}}, > which was passed into the method {{handle(ExtendedBlock block, IOException e)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
Yiqun Lin created HDFS-10626: Summary: VolumeScanner prints incorrect IOException in reportBadBlocks operation Key: HDFS-10626 URL: https://issues.apache.org/jira/browse/HDFS-10626 Project: Hadoop HDFS Issue Type: Bug Reporter: Yiqun Lin Assignee: Yiqun Lin VolumeScanner throws incorrect IOException in {{datanode.reportBadBlocks}}. The related code: {code} public void handle(ExtendedBlock block, IOException e) { FsVolumeSpi volume = scanner.volume; ... try { scanner.datanode.reportBadBlocks(block, volume); } catch (IOException ie) { // This is bad, but not bad enough to shut down the scanner. LOG.warn("Cannot report bad " + block.getBlockId(), e); } } {code} The IOException printed in the log should be {{ie}} rather than {{e}}, which was passed into the method {{handle(ExtendedBlock block, IOException e)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376794#comment-15376794 ] Vinayakumar B commented on HDFS-10587: -- Hi [~xupeng], can you add the 229 and 77 logs for this block as well, including transfer-related logs? > Incorrect offset/length calculation in pipeline recovery causes block > corruption > > > Key: HDFS-10587 > URL: https://issues.apache.org/jira/browse/HDFS-10587 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10587.001.patch > > > We found that incorrect offset and length calculation in pipeline recovery may > cause block corruption and result in missing blocks under a very unfortunate > scenario. > (1) A client established a pipeline and started writing data to the pipeline. > (2) One of the data nodes in the pipeline restarted, closing the socket, and > some written data were unacknowledged. > (3) Client replaced the failed data node with a new one, initiating a block > transfer to copy existing data in the block to the new datanode. > (4) The block was transferred to the new node. Crucially, the entire block, > including the unacknowledged data, was transferred. > (5) The last chunk (512 bytes) was not a full chunk, but the destination > still reserved the whole chunk in its buffer, and wrote the entire buffer to > disk, therefore some written data was garbage. > (6) When the transfer was done, the destination data node converted the > replica from temporary to rbw, which set its visible length to the length of > bytes on disk. That is to say, it thought whatever was transferred was > acknowledged. However, the visible length of the replica is different (rounded > up to the next multiple of 512) from the source of transfer. [1] > (7) Client then truncated the block in an attempt to remove unacknowledged > data. However, because the visible length is equivalent to the bytes on disk, > it did not truncate unacknowledged data. > (8) When new data was appended to the destination, it skipped the bytes > already on disk. Therefore, whatever was written as garbage was not replaced. > (9) The volume scanner detected the corrupt replica, but due to HDFS-10512, it > wouldn’t tell the NameNode to mark the replica as corrupt, so the client > continued to form a pipeline using the corrupt replica. > (10) Finally the DN that had the only healthy replica was restarted. NameNode > then updated the pipeline to only contain the corrupt replica. > (11) Client continued to write to the corrupt replica, because neither the client > nor the data node itself knew the replica was corrupt. When the restarted > datanodes came back, their replicas were stale, despite not being corrupt. > Therefore, none of the replicas was good and up to date. > The sequence of events was reconstructed based on DataNode/NameNode logs and > my understanding of the code. > Incidentally, we have observed the same sequence of events on two independent > clusters.
> [1] > The sender has the replica as follows: > 2016-04-15 22:03:05,066 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW > getNumBytes() = 41381376 > getBytesOnDisk() = 41381376 > getVisibleLength()= 41186444 > getVolume() = /hadoop-i/data/current > getBlockFile()= > /hadoop-i/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324 > bytesAcked=41186444 > bytesOnDisk=41381376 > while the receiver has the replica as follows: > 2016-04-15 22:03:05,068 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW > getNumBytes() = 41186816 > getBytesOnDisk() = 41186816 > getVisibleLength()= 41186816 > getVolume() = /hadoop-g/data/current > getBlockFile()= > /hadoop-g/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324 > bytesAcked=41186816 > bytesOnDisk=41186816 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
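The [1] replica dumps make the rounding in (5) and (6) concrete: the sender's bytesAcked is 41186444, and the receiver's visible length is that value rounded up to the next multiple of the 512-byte chunk. A small sketch of the arithmetic (illustrative only; the chunk size and lengths come from the dumps above):
{code}
// Illustrative only: why the transfer destination's visible length can
// exceed the sender's acknowledged length by up to one checksum chunk.
class ChunkRoundingSketch {
  static final long CHUNK = 512; // bytes per checksum chunk

  static long roundUpToChunk(long len) {
    return ((len + CHUNK - 1) / CHUNK) * CHUNK;
  }

  public static void main(String[] args) {
    long senderBytesAcked = 41186444L;
    // Prints 41186816, matching the receiver's getVisibleLength() in [1].
    System.out.println(roundUpToChunk(senderBytesAcked));
  }
}
{code}
Those 372 padding bytes past the acknowledged length are exactly the garbage that the truncation in (7) then fails to remove.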
[jira] [Comment Edited] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376739#comment-15376739 ] xupeng edited comment on HDFS-10587 at 7/14/16 10:54 AM: - Hi all: I encountered the same issue; here is the scenario: - hbase writing a block to pipeline 10.6.134.228, 10.6.128.215, 10.6.128.208 - dn 10.6.128.208 restarted - pipeline recovery, added new datanode 10.6.134.229 to the pipeline. - client sent the transfer_block command, and 10.6.134.228 copied the block file to the new datanode 10.6.134.229 - client writing data - datanode 10.6.128.215 restarted - pipeline recovery, added new datanode 10.6.129.77 to the pipeline. - client sent the transfer_block command, and 10.6.134.229 copied the block file to the new datanode 10.6.129.77 - 129.77 throws "error.java.io.IOException: Unexpected checksum mismatch" was (Author: xupener): Hi all : I encountered the same issue, here is the scenario : a. hbase writing block to pipeline 10.6.134.228 , 10.6.128.215, 10.6.128.208 b. dn - 10.6.128.208 restarted c. pipeline recovery, add new datanode - 10.6.134.229 to pipeline. d. client send transfer_block command , and 10.6.134.228 copy the block file to new data node 10.6.134.229 e. client writing data f. datanode 10.6.128.215 restarted g. pipeline recovery, add new datanode - 10.6.129.77 to pipeline. h. 129.77 throws "error.java.io.IOException: Unexpected checksum mismatch" > Incorrect offset/length calculation in pipeline recovery causes block > corruption > > > Key: HDFS-10587 > URL: https://issues.apache.org/jira/browse/HDFS-10587 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10587.001.patch > > > We found that incorrect offset and length calculation in pipeline recovery may > cause block corruption and result in missing blocks under a very unfortunate > scenario. > (1) A client established a pipeline and started writing data to the pipeline. > (2) One of the data nodes in the pipeline restarted, closing the socket, and > some written data were unacknowledged. > (3) Client replaced the failed data node with a new one, initiating a block > transfer to copy existing data in the block to the new datanode. > (4) The block was transferred to the new node. Crucially, the entire block, > including the unacknowledged data, was transferred. > (5) The last chunk (512 bytes) was not a full chunk, but the destination > still reserved the whole chunk in its buffer, and wrote the entire buffer to > disk, therefore some written data was garbage. > (6) When the transfer was done, the destination data node converted the > replica from temporary to rbw, which set its visible length to the length of > bytes on disk. That is to say, it thought whatever was transferred was > acknowledged. However, the visible length of the replica is different (rounded > up to the next multiple of 512) from the source of transfer. [1] > (7) Client then truncated the block in an attempt to remove unacknowledged > data. However, because the visible length is equivalent to the bytes on disk, > it did not truncate unacknowledged data. > (8) When new data was appended to the destination, it skipped the bytes > already on disk. Therefore, whatever was written as garbage was not replaced. > (9) The volume scanner detected the corrupt replica, but due to HDFS-10512, it > wouldn’t tell the NameNode to mark the replica as corrupt, so the client > continued to form a pipeline using the corrupt replica.
> (10) Finally the DN that had the only healthy replica was restarted. NameNode > then updated the pipeline to only contain the corrupt replica. > (11) Client continued to write to the corrupt replica, because neither the client > nor the data node itself knew the replica was corrupt. When the restarted > datanodes came back, their replicas were stale, despite not being corrupt. > Therefore, none of the replicas was good and up to date. > The sequence of events was reconstructed based on DataNode/NameNode logs and > my understanding of the code. > Incidentally, we have observed the same sequence of events on two independent > clusters. > [1] > The sender has the replica as follows: > 2016-04-15 22:03:05,066 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW > getNumBytes() = 41381376 > getBytesOnDisk() = 41381376 > getVisibleLength()= 41186444 > getVolume() = /hadoop-i/data/current > getBlockFile()= > /hadoop-i/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324 > bytesAcked=41186444 > bytesOnDisk=41381376 > while the receiver
[jira] [Comment Edited] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376750#comment-15376750 ] xupeng edited comment on HDFS-10587 at 7/14/16 10:52 AM: - And here are the logs : Hbase log {noformat} 2016-07-13 11:48:29,475 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 from datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909) 2016-07-13 11:48:29,476 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 in pipeline DatanodeInfoWithStorage[10.6.134.228:5080,DS-ad10b254-5803-4109-a550-e07444a129c9,SSD], DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD], DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD]: bad datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD] 2016-07-13 11:49:01,499 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 from datanode DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909) 2016-07-13 11:49:01,500 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 in pipeline DatanodeInfoWithStorage[10.6.134.228:5080,DS-ad10b254-5803-4109-a550-e07444a129c9,SSD], DatanodeInfoWithStorage[10.6.134.229:5080,DS-8c209fca-9b34-4a6b-919b-6b4d24a3e13a,SSD], DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD]: bad datanode DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD] 2016-07-13 11:49:01,566 INFO [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 10.6.129.77:5080 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1472) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1293) at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1016) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:560) {noformat} 10.6.128.215Log {noformat} 2016-07-13 11:48:29,555 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 src: /10.6.134.229:19009 dest: /10.6.128.215:5080 2016-07-13 11:48:29,555 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 2016-07-13 11:48:29,555 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_1116167880_42905642, RBW getNumBytes() = 9912487 getBytesOnDisk() = 9912487 getVisibleLength()= 9911790 getVolume() = /data12/yarn/dndata/current getBlockFile()= /data12/yarn/dndata/current/BP-448958278-10.6.130.96-1457941856632/current/rbw/blk_1116167880 bytesAcked=9911790 bytesOnDisk=9912487 2016-07-13 11:48:29,555 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: truncateBlock: blockFile=/data12/yarn/dndata/current/BP-448958278-10.6.130.96-1457941856632/current/rbw/blk_1116167880, metaFile=/data12/yarn/dndata/current/BP-448958278-10.6
[jira] [Comment Edited] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376750#comment-15376750 ] xupeng edited comment on HDFS-10587 at 7/14/16 10:49 AM: - And here are the logs : Hbase log {noformat} 2016-07-13 11:48:29,475 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 from datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909) 2016-07-13 11:48:29,476 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 in pipeline DatanodeInfoWithStorage[10.6.134.228:5080,DS-ad10b254-5803-4109-a550-e07444a129c9,SSD], DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD], DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD]: bad datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD] 2016-07-13 11:49:01,499 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 from datanode DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909) 2016-07-13 11:49:01,500 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 in pipeline DatanodeInfoWithStorage[10.6.134.228:5080,DS-ad10b254-5803-4109-a550-e07444a129c9,SSD], DatanodeInfoWithStorage[10.6.134.229:5080,DS-8c209fca-9b34-4a6b-919b-6b4d24a3e13a,SSD], DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD]: bad datanode DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD] 2016-07-13 11:49:01,566 INFO [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 10.6.129.77:5080 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1472) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1293) at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1016) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:560) {noformat} was (Author: xupener): And here are the logs : Hbase log -- 2016-07-13 11:48:29,475 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 from datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909) 2016-07-13 11:48:29,476 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 in pipeline DatanodeInfoWithStorage[10.6.134.228:508
[jira] [Comment Edited] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376750#comment-15376750 ] xupeng edited comment on HDFS-10587 at 7/14/16 10:47 AM: - And here are the logs : Hbase log -- 2016-07-13 11:48:29,475 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 from datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909) 2016-07-13 11:48:29,476 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 in pipeline DatanodeInfoWithStorage[10.6.134.228:5080,DS-ad10b254-5803-4109-a550-e07444a129c9,SSD], DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD], DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD]: bad datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD] 2016-07-13 11:49:01,499 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 from datanode DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909) 2016-07-13 11:49:01,500 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 in pipeline DatanodeInfoWithStorage[10.6.134.228:5080,DS-ad10b254-5803-4109-a550-e07444a129c9,SSD], DatanodeInfoWithStorage[10.6.134.229:5080,DS-8c209fca-9b34-4a6b-919b-6b4d24a3e13a,SSD], DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD]: bad datanode DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD] 2016-07-13 11:49:01,566 INFO [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 10.6.129.77:5080 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1472) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1293) at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1016) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:560) was (Author: xupener): And here are the logs : Hbase log -- 2016-07-13 11:48:29,475 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 from datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909) 2016-07-13 11:48:29,476 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880
[jira] [Commented] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376750#comment-15376750 ] xupeng commented on HDFS-10587: --- And here are the logs: Hbase log -- 2016-07-13 11:48:29,475 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 from datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909) 2016-07-13 11:48:29,476 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 in pipeline DatanodeInfoWithStorage[10.6.134.228:5080,DS-ad10b254-5803-4109-a550-e07444a129c9,SSD], DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD], DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD]: bad datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD] 2016-07-13 11:49:01,499 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 from datanode DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909) 2016-07-13 11:49:01,500 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 in pipeline DatanodeInfoWithStorage[10.6.134.228:5080,DS-ad10b254-5803-4109-a550-e07444a129c9,SSD], DatanodeInfoWithStorage[10.6.134.229:5080,DS-8c209fca-9b34-4a6b-919b-6b4d24a3e13a,SSD], DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD]: bad datanode DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD] 2016-07-13 11:49:01,566 INFO [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 10.6.129.77:5080 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1472) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1293) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1016) at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:560) > Incorrect offset/length calculation in pipeline recovery causes block > corruption > > > Key: HDFS-10587 > URL: https://issues.apache.org/jira/browse/HDFS-10587 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10587.001.patch > > > We found that incorrect offset and length calculation in pipeline recovery may > cause block corruption and result in missing blocks under a very unfortunate > scenario. > (1) A client established a pipeline and started writing data to the pipeline. > (2) One of the data nodes in the pipeline restarted, closing the socket, and > some written data were unacknowledged. > (3) Client replaced the failed data node with a new one, initiating a block > transfer to copy existing data in the block to the new datanode. > (4) The block was transferred to the new node. Crucially, the entire block, > including the unacknowledged data, was transferred. >
[jira] [Commented] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption
[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376739#comment-15376739 ] xupeng commented on HDFS-10587: --- Hi all: I encountered the same issue; here is the scenario: a. hbase writing a block to pipeline 10.6.134.228, 10.6.128.215, 10.6.128.208 b. dn 10.6.128.208 restarted c. pipeline recovery, added new datanode 10.6.134.229 to the pipeline. d. client sent the transfer_block command, and 10.6.134.228 copied the block file to the new datanode 10.6.134.229 e. client writing data f. datanode 10.6.128.215 restarted g. pipeline recovery, added new datanode 10.6.129.77 to the pipeline. h. 129.77 throws "error.java.io.IOException: Unexpected checksum mismatch" > Incorrect offset/length calculation in pipeline recovery causes block > corruption > > > Key: HDFS-10587 > URL: https://issues.apache.org/jira/browse/HDFS-10587 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10587.001.patch > > > We found that incorrect offset and length calculation in pipeline recovery may > cause block corruption and result in missing blocks under a very unfortunate > scenario. > (1) A client established a pipeline and started writing data to the pipeline. > (2) One of the data nodes in the pipeline restarted, closing the socket, and > some written data were unacknowledged. > (3) Client replaced the failed data node with a new one, initiating a block > transfer to copy existing data in the block to the new datanode. > (4) The block was transferred to the new node. Crucially, the entire block, > including the unacknowledged data, was transferred. > (5) The last chunk (512 bytes) was not a full chunk, but the destination > still reserved the whole chunk in its buffer, and wrote the entire buffer to > disk, therefore some written data was garbage. > (6) When the transfer was done, the destination data node converted the > replica from temporary to rbw, which set its visible length to the length of > bytes on disk. That is to say, it thought whatever was transferred was > acknowledged. However, the visible length of the replica is different (rounded > up to the next multiple of 512) from the source of transfer. [1] > (7) Client then truncated the block in an attempt to remove unacknowledged > data. However, because the visible length is equivalent to the bytes on disk, > it did not truncate unacknowledged data. > (8) When new data was appended to the destination, it skipped the bytes > already on disk. Therefore, whatever was written as garbage was not replaced. > (9) The volume scanner detected the corrupt replica, but due to HDFS-10512, it > wouldn’t tell the NameNode to mark the replica as corrupt, so the client > continued to form a pipeline using the corrupt replica. > (10) Finally the DN that had the only healthy replica was restarted. NameNode > then updated the pipeline to only contain the corrupt replica. > (11) Client continued to write to the corrupt replica, because neither the client > nor the data node itself knew the replica was corrupt. When the restarted > datanodes came back, their replicas were stale, despite not being corrupt. > Therefore, none of the replicas was good and up to date. > The sequence of events was reconstructed based on DataNode/NameNode logs and > my understanding of the code. > Incidentally, we have observed the same sequence of events on two independent > clusters.
> [1] > The sender has the replica as follows: > 2016-04-15 22:03:05,066 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW > getNumBytes() = 41381376 > getBytesOnDisk() = 41381376 > getVisibleLength()= 41186444 > getVolume() = /hadoop-i/data/current > getBlockFile()= > /hadoop-i/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324 > bytesAcked=41186444 > bytesOnDisk=41381376 > while the receiver has the replica as follows: > 2016-04-15 22:03:05,068 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW > getNumBytes() = 41186816 > getBytesOnDisk() = 41186816 > getVisibleLength()= 41186816 > getVolume() = /hadoop-g/data/current > getBlockFile()= > /hadoop-g/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324 > bytesAcked=41186816 > bytesOnDisk=41186816 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands,
[jira] [Commented] (HDFS-10570) Remove classpath conflicts of netty-all jar in hadoop-hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-10570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376706#comment-15376706 ] Hadoop QA commented on HDFS-10570: --
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 55s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 5s{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_101. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 3s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:5af2af1 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817915/HDFS-10570-branch-2.8-02.patch |
| JIRA Issue | HDFS-10570 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml |
| uname | Linux 817aa2a2a808 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.8 / cbd885b |
| Default Java | 1.7.0_101 |
| Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_91 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 |
| JDK v1.7.0_101 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16055/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16055/console |
| Powered by
[jira] [Commented] (HDFS-10570) Remove classpath conflicts of netty-all jar in hadoop-hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-10570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376679#comment-15376679 ] Hudson commented on HDFS-10570: --- SUCCESS: Integrated in Hadoop-trunk-Commit #10097 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10097/]) HDFS-10570. Remove classpath conflicts of netty-all jar in (stevel: rev 1fa1fab695355fadb7898efb6d0d03fc88513466) * hadoop-hdfs-project/hadoop-hdfs-client/pom.xml > Remove classpath conflicts of netty-all jar in hadoop-hdfs-client > - > > Key: HDFS-10570 > URL: https://issues.apache.org/jira/browse/HDFS-10570 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Vinayakumar B >Assignee: Vinayakumar B >Priority: Minor > Attachments: HDFS-10570-01.patch, HDFS-10570-02.patch, > HDFS-10570-branch-2.8-02.patch > > > While debugging tests in eclipse, Cannot access DN http url. > Also WebHdfs tests cannot run in eclipse due to classes loading from old > version of netty jars instead of netty-all jar. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10570) Remove classpath conflicts of netty-all jar in hadoop-hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-10570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376650#comment-15376650 ] Steve Loughran commented on HDFS-10570: --- +1, committed to trunk. I haven't yet cherry-picked it to branch-2; adding a patch for Yetus to test there before doing that. > Remove classpath conflicts of netty-all jar in hadoop-hdfs-client > - > > Key: HDFS-10570 > URL: https://issues.apache.org/jira/browse/HDFS-10570 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Vinayakumar B >Assignee: Vinayakumar B >Priority: Minor > Attachments: HDFS-10570-01.patch, HDFS-10570-02.patch, > HDFS-10570-branch-2.8-02.patch > > > While debugging tests in eclipse, Cannot access DN http url. > Also WebHdfs tests cannot run in eclipse due to classes loading from old > version of netty jars instead of netty-all jar. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org