[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails
[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403807#comment-16403807 ]

Konstantin Shvachko commented on HDFS-12422:

For the record, this piece of code was [introduced way back|http://svn.apache.org/viewvc?view=revision=1091515] by HDFS-1606.

I think the current code is actually correct. We are in the {{BlockConstructionStage.PIPELINE_CLOSE}} state. Adding nodes when the pipeline is closing doesn't make sense to me, because something went wrong and the client should just salvage whatever remains and let NN recover the block. And it seems the client does just that: I see that in {{processDatanodeOrExternalError()}}, if the stage is {{PIPELINE_CLOSE}}, it closes the block. I also see that this block replica is complete and good.

Besides, adding DNs as you propose only makes the case rarer; it doesn't fully solve it. If adding DNs fails, you get the same problem again.

So it seems that you should look at why NN does not replicate such a block. I did not check the current code base, but here is how it should work.
# The pipeline failed with only one last replica, so NN will not allow the client to close the file. The write fails.
# NN will not replicate the block because it is still under construction.
# One hour later the file lease will expire and NN starts lease recovery, which triggers replica recovery.
# Once finished, NN closes the file, and the block becomes under-replicated.
# Replication monitor starts replication.

So eventually the block should be recovered; it just takes more than an hour. If that doesn't happen, then we have a problem.
LMK

> Replace DataNode in Pipeline when waiting for Last Packet fails
> ---------------------------------------------------------------
>
> Key: HDFS-12422
> URL: https://issues.apache.org/jira/browse/HDFS-12422
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, hdfs-client
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Major
> Labels: hdfs
> Attachments: HDFS-12422.001.patch, HDFS-12422.002.patch
>
> # Create a file with replicationFactor = 4, minReplicas = 2
> # Fail waiting for the last packet, followed by 2 exceptions when recovering the leftover pipeline
> # The leftover pipeline will only have one DN and NN will never close such block, resulting in failure to write
>
> The block will stay there forever, unable to be replicated, ultimately going missing.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
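The five-step recovery path in the comment above can be read as a small timeline. The following is only a toy sketch of that expectation, not NameNode code: the class `LeaseRecoveryTimeline`, the enum `BlockState`, the method `eventsUpTo`, and the 60-minute constant are hypothetical names chosen for illustration, with the one-hour figure taken from the comment rather than from any configuration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the recovery path described in the comment; NOT NameNode code. */
class LeaseRecoveryTimeline {
  /** Hypothetical block states, named for this sketch only. */
  enum BlockState { UNDER_CONSTRUCTION, UNDER_RECOVERY, COMPLETE }

  /** "One hour later" as stated in the comment; an assumption, not a config value. */
  static final long LEASE_EXPIRY_MINUTES = 60;

  /** Lists the events the comment expects to have happened by nowMinutes. */
  static List<String> eventsUpTo(long nowMinutes, int liveReplicas, int replication) {
    List<String> events = new ArrayList<>();
    BlockState state = BlockState.UNDER_CONSTRUCTION;
    // Step 1: the client cannot close the file, so the write fails.
    events.add("write fails; block stays " + state);
    if (nowMinutes < LEASE_EXPIRY_MINUTES) {
      // Step 2: a block under construction is not replicated.
      events.add("no replication: block is under construction");
      return events;
    }
    // Step 3: lease expiry starts lease recovery and replica recovery.
    state = BlockState.UNDER_RECOVERY;
    events.add("lease expired; block is " + state);
    // Step 4: the file is closed once recovery finishes.
    state = BlockState.COMPLETE;
    events.add("file closed; block is " + state);
    // Step 5: the replication monitor tops up missing replicas.
    if (liveReplicas < replication) {
      events.add("under-replicated: schedule " + (replication - liveReplicas) + " copies");
    }
    return events;
  }

  public static void main(String[] args) {
    // Scenario from the issue: replication factor 4, one surviving replica.
    eventsUpTo(61, 1, 4).forEach(System.out::println);
  }
}
```

If a cluster does not follow this sequence after the lease expires, that would point at the NameNode side rather than at the client pipeline, which is the distinction the comment draws.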
[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails
[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401170#comment-16401170 ]

Chris Douglas commented on HDFS-12422:

bq. do you know anybody fit for reviewing this?

[~shv], if he has cycles.
[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails
[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399703#comment-16399703 ]

Íñigo Goiri commented on HDFS-12422:

Another piece of code that comes from a long time ago.
[~chris.douglas], do you know anybody fit for reviewing this?
We had this issue a while back from customers not closing their streams.
[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails
[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399612#comment-16399612 ]

genericqa commented on HDFS-12422:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 31s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 7s | Maven dependency ordering for branch |
| +1 | mvninstall | 15m 29s | trunk passed |
| +1 | compile | 1m 29s | trunk passed |
| +1 | checkstyle | 0m 47s | trunk passed |
| +1 | mvnsite | 1m 31s | trunk passed |
| +1 | shadedclient | 10m 49s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 9s | trunk passed |
| +1 | javadoc | 1m 12s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 8s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 30s | the patch passed |
| +1 | compile | 1m 23s | the patch passed |
| +1 | javac | 1m 23s | the patch passed |
| +1 | checkstyle | 0m 47s | the patch passed |
| +1 | mvnsite | 1m 27s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 9m 52s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 25s | the patch passed |
| +1 | javadoc | 1m 10s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 22s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 100m 45s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 156m 24s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HDFS-12422 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12914569/HDFS-12422.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 24e3bcb7fb6e 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails
[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399235#comment-16399235 ]

Lukas Majercak commented on HDFS-12422:

Added patch 002 after rebasing.
[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails
[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398975#comment-16398975 ]

genericqa commented on HDFS-12422:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 5s | HDFS-12422 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12422 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886501/HDFS-12422.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23481/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails
[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398950#comment-16398950 ]

Íñigo Goiri commented on HDFS-12422:

Thanks [~lukmajercak] for the ping.
The whole fix is to remove the if that skips the recovery:
{code}
} else if (stage == BlockConstructionStage.PIPELINE_CLOSE
    || stage == BlockConstructionStage.PIPELINE_CLOSE_RECOVERY) {
  //pipeline is closing
  return;
}
{code}
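To make the effect of removing that branch concrete, here is a minimal stand-alone sketch. It models only the quoted branch and assumes everything else: `PipelineGuardSketch`, its local `Stage` enum, and both methods are hypothetical names for illustration and are not the Hadoop client classes or the actual patch.

```java
/** Simplified model of the guard quoted above; NOT the Hadoop client code. */
class PipelineGuardSketch {
  /** Local stand-in for the stages mentioned in this thread, plus one streaming stage. */
  enum Stage { DATA_STREAMING, PIPELINE_CLOSE, PIPELINE_CLOSE_RECOVERY }

  /** Behaviour with the guard: no replacement DataNode is added while closing. */
  static boolean addsReplacementWithGuard(Stage stage) {
    if (stage == Stage.PIPELINE_CLOSE || stage == Stage.PIPELINE_CLOSE_RECOVERY) {
      return false; // pipeline is closing, so the replacement is skipped
    }
    return true;
  }

  /** Behaviour proposed in the comment: the branch is gone, so a replacement is attempted. */
  static boolean addsReplacementWithoutGuard(Stage stage) {
    return true;
  }

  public static void main(String[] args) {
    for (Stage s : Stage.values()) {
      System.out.println(s + ": withGuard=" + addsReplacementWithGuard(s)
          + " withoutGuard=" + addsReplacementWithoutGuard(s));
    }
  }
}
```

The two methods differ only for the two closing stages, which is exactly the situation in the issue description where the last packet fails and the pipeline is left with a single DN.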
[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails
[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398940#comment-16398940 ]

Lukas Majercak commented on HDFS-12422:

Ping to trigger build.
[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails
[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162244#comment-16162244 ]

Hadoop QA commented on HDFS-12422:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 31s | Maven dependency ordering for branch |
| +1 | mvninstall | 14m 30s | trunk passed |
| +1 | compile | 1m 36s | trunk passed |
| +1 | checkstyle | 0m 43s | trunk passed |
| +1 | mvnsite | 1m 36s | trunk passed |
| +1 | findbugs | 3m 17s | trunk passed |
| +1 | javadoc | 1m 4s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 7s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 22s | the patch passed |
| +1 | compile | 1m 27s | the patch passed |
| +1 | javac | 1m 27s | the patch passed |
| +1 | checkstyle | 0m 40s | the patch passed |
| +1 | mvnsite | 1m 29s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 3m 18s | the patch passed |
| +1 | javadoc | 0m 58s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 11s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 97m 9s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 133m 3s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure130 |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140 |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure100 |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
| | hadoop.hdfs.TestReconstructStripedFile |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
| | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure040 |
| | hadoop.hdfs.TestLeaseRecoveryStriped |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure180 |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure120 |
| | hadoop.hdfs.TestClientProtocolForPipelineRecovery |
| | hadoop.hdfs.TestPipelines |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure170 |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 |
| Timed out junit tests | org.apache.hadoop.hdfs.TestWriteReadStripedFile |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:71bbb86 |
| JIRA Issue | HDFS-12422 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886501/HDFS-12422.001.patch |
| Optional Tests | asflicense