[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails

2018-03-17 Thread Konstantin Shvachko (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403807#comment-16403807 ]

Konstantin Shvachko commented on HDFS-12422:


For the record, this piece of code was [introduced way back|http://svn.apache.org/viewvc?view=revision=1091515] by HDFS-1606.

I think the current code is actually correct. We are in the {{BlockConstructionStage.PIPELINE_CLOSE}} state. Adding nodes while the pipeline is closing doesn't make sense to me: something went wrong, and the client should just salvage whatever remains and let the NN recover the block. And it seems the client does just that: I see that {{processDatanodeOrExternalError()}} closes the block if the stage is {{PIPELINE_CLOSE}}. I also see that this block replica is complete and good.
Besides, adding DNs as you propose only makes the case rarer; it doesn't fully solve it. What if adding DNs fails? Then you get the same problem again.
So it seems you should look into why the NN does not replicate such a block. I did not check the current code base, but here is how it should work:
# The pipeline failed with only one last replica, so the NN will not allow the client to close the file. The write fails.
# The NN will not replicate the block because it is still under construction.
# One hour later, the file lease expires and the NN starts lease recovery, which triggers replica recovery.
# Once recovery finishes, the NN closes the file, and the block becomes under-replicated.
# The replication monitor starts replication.

So the block should eventually be recovered; it just takes more than one hour. If that doesn't happen, then we have a problem. LMK.
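The recovery path in the numbered steps above can be sketched as a timeline. This is a hypothetical Java model, not NameNode code: `LeaseRecoveryTimeline`, `BlockState`, `stateAt` and `replicationDelay` are illustrative names, the one-hour hard limit is taken from the comment, the lease is assumed to have been last renewed at the moment of the failure, and lease and replica recovery are treated as instantaneous at the one-hour mark.

```java
import java.time.Duration;

/**
 * Hypothetical timeline of the recovery path in the numbered steps.
 * Not NameNode code: all names are illustrative, and replicationDelay is an
 * assumed time from lease expiry until re-replication completes.
 */
class LeaseRecoveryTimeline {
    enum BlockState { UNDER_CONSTRUCTION, COMPLETE_UNDER_REPLICATED, FULLY_REPLICATED }

    // One-hour hard lease limit, as stated in the comment.
    static final Duration HARD_LEASE_LIMIT = Duration.ofHours(1);

    /**
     * State of the last block {@code elapsed} after the failed write, assuming
     * the lease is not renewed after the failure and recovery is instantaneous.
     */
    static BlockState stateAt(Duration elapsed, Duration replicationDelay) {
        if (elapsed.compareTo(HARD_LEASE_LIMIT) < 0) {
            // Steps 1-2: the file is still open, so the block is under
            // construction and the NN does not replicate it.
            return BlockState.UNDER_CONSTRUCTION;
        }
        if (elapsed.compareTo(HARD_LEASE_LIMIT.plus(replicationDelay)) < 0) {
            // Steps 3-4: lease and replica recovery ran, the file is closed,
            // and the block is complete but under-replicated.
            return BlockState.COMPLETE_UNDER_REPLICATED;
        }
        // Step 5: the replication monitor has restored the replication factor.
        return BlockState.FULLY_REPLICATED;
    }

    public static void main(String[] args) {
        Duration delay = Duration.ofMinutes(5); // assumed value, for illustration only
        for (long minutes : new long[] {30, 62, 70}) {
            System.out.println(minutes + " min: " + stateAt(Duration.ofMinutes(minutes), delay));
        }
    }
}
```

With a five-minute `replicationDelay`, the block is still under construction at 30 minutes, complete but under-replicated at 62 minutes, and fully replicated at 70 minutes, which is the "more than one hour" behaviour described in the comment.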

> Replace DataNode in Pipeline when waiting for Last Packet fails
> ---
>
> Key: HDFS-12422
> URL: https://issues.apache.org/jira/browse/HDFS-12422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Major
>  Labels: hdfs
> Attachments: HDFS-12422.001.patch, HDFS-12422.002.patch
>
>
> # Create a file with replicationFactor = 4 and minReplicas = 2.
> # Fail waiting for the last packet, followed by 2 exceptions while recovering the leftover pipeline.
> # The leftover pipeline has only one DN, and the NN never closes such a block, so the write fails.
> The block stays there forever, unable to be replicated, and ultimately goes missing.
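The arithmetic of the reproduction steps can be sketched in Java. This is a hypothetical model, not Hadoop code: `PipelineCloseModel`, `afterFailures` and `canCompleteBlock` are illustrative names, each failure is assumed to remove exactly one DataNode, and the NameNode is assumed to complete a block only when at least `minReplicas` replicas remain (on a real cluster that threshold is set by `dfs.namenode.replication.min`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical model of the reproduction steps. Not Hadoop code: each failure
 * is assumed to remove exactly one DataNode, and the NameNode is assumed to
 * complete a block only when at least minReplicas replicas remain.
 */
class PipelineCloseModel {
    /** Returns the DataNodes left after removing one node per failure. */
    static List<String> afterFailures(List<String> pipeline, int failures) {
        List<String> remaining = new ArrayList<>(pipeline);
        for (int i = 0; i < failures && !remaining.isEmpty(); i++) {
            // Which node is reported bad does not matter for the count.
            remaining.remove(remaining.size() - 1);
        }
        return remaining;
    }

    /** The block can be completed only with at least minReplicas live replicas. */
    static boolean canCompleteBlock(int liveReplicas, int minReplicas) {
        return liveReplicas >= minReplicas;
    }

    public static void main(String[] args) {
        List<String> pipeline = Arrays.asList("dn1", "dn2", "dn3", "dn4"); // replicationFactor = 4
        int minReplicas = 2;
        // 1 failure waiting for the last packet + 2 during pipeline recovery.
        List<String> left = afterFailures(pipeline, 3);
        System.out.println("remaining=" + left + " canComplete="
                + canCompleteBlock(left.size(), minReplicas));
    }
}
```

Running `main` prints `remaining=[dn1] canComplete=false`: three failures leave a single replica, below the minimum of two.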



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails

2018-03-15 Thread Chris Douglas (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401170#comment-16401170 ]

Chris Douglas commented on HDFS-12422:
--

bq. do you know anybody fit for reviewing this?
[~shv], if he has cycles.



[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails

2018-03-14 Thread JIRA

[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399703#comment-16399703 ]

Íñigo Goiri commented on HDFS-12422:


Another piece of code that comes from a long time ago.
[~chris.douglas], do you know anybody fit for reviewing this?
We hit this issue a while back with customers who were not closing their streams.



[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails

2018-03-14 Thread genericqa (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399612#comment-16399612 ]

genericqa commented on HDFS-12422:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  7s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 49s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 12s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 27s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  9m 52s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 22s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}100m 45s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}156m 24s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HDFS-12422 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12914569/HDFS-12422.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 24e3bcb7fb6e 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails

2018-03-14 Thread Lukas Majercak (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399235#comment-16399235 ]

Lukas Majercak commented on HDFS-12422:
---

Added patch 002 after rebasing.



[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails

2018-03-14 Thread genericqa (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398975#comment-16398975 ]

genericqa commented on HDFS-12422:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} | {color:red} HDFS-12422 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12422 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886501/HDFS-12422.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23481/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails

2018-03-14 Thread JIRA

[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398950#comment-16398950 ]

Íñigo Goiri commented on HDFS-12422:


Thanks [~lukmajercak] for the ping.

The whole fix is to remove the {{if}} branch that skips the recovery:
{code}
} else if (stage == BlockConstructionStage.PIPELINE_CLOSE
    || stage == BlockConstructionStage.PIPELINE_CLOSE_RECOVERY) {
  // pipeline is closing
  return;
}
{code}
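A hypothetical Java sketch of what removing this branch changes. `CloseStageGuard`, `addsDatanodeBefore` and `addsDatanodeAfter` are illustrative names, and `Stage` is a local stand-in that copies four constant names from Hadoop's `BlockConstructionStage`; it is not the real enum. The branch that precedes this `else if` in the real method, the replace-datanode-on-failure policy check, and the replica transfer itself are not modeled.

```java
/**
 * Hypothetical model of the guard quoted above, before and after the proposed
 * change. Stage is a local stand-in, not Hadoop's BlockConstructionStage.
 */
class CloseStageGuard {
    enum Stage {
        DATA_STREAMING,
        PIPELINE_SETUP_STREAMING_RECOVERY,
        PIPELINE_CLOSE,
        PIPELINE_CLOSE_RECOVERY
    }

    /** Current code: a closing pipeline returns early, so no DataNode is added. */
    static boolean addsDatanodeBefore(Stage stage) {
        if (stage == Stage.PIPELINE_CLOSE || stage == Stage.PIPELINE_CLOSE_RECOVERY) {
            return false; // pipeline is closing
        }
        return true;
    }

    /** Proposed patch: the closing-stage branch is deleted, so every modeled stage proceeds. */
    static boolean addsDatanodeAfter(Stage stage) {
        return true;
    }

    public static void main(String[] args) {
        for (Stage s : Stage.values()) {
            System.out.printf("%-35s before=%-5b after=%b%n",
                    s, addsDatanodeBefore(s), addsDatanodeAfter(s));
        }
    }
}
```

Only the two closing stages change. As Konstantin Shvachko notes in his comment on this issue, a replacement DataNode can itself fail, so the change narrows the window rather than eliminating it.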



[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails

2018-03-14 Thread Lukas Majercak (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398940#comment-16398940 ]

Lukas Majercak commented on HDFS-12422:
---

Ping to trigger build.



[jira] [Commented] (HDFS-12422) Replace DataNode in Pipeline when waiting for Last Packet fails

2017-09-11 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162244#comment-16162244 ]

Hadoop QA commented on HDFS-12422:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  4s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  7s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 11s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m  9s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}133m  3s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure130 |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure100 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure040 |
|   | hadoop.hdfs.TestLeaseRecoveryStriped |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure180 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure120 |
|   | hadoop.hdfs.TestClientProtocolForPipelineRecovery |
|   | hadoop.hdfs.TestPipelines |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure170 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 |
| Timed out junit tests | org.apache.hadoop.hdfs.TestWriteReadStripedFile |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:71bbb86 |
| JIRA Issue | HDFS-12422 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886501/HDFS-12422.001.patch |
| Optional Tests |  asflicense