[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one
[ https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-9275:
Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
Status: Resolved (was: Patch Available)

Committed to trunk.

> Wait previous ErasureCodingWork to finish before schedule another one
>
> Key: HDFS-9275
> URL: https://issues.apache.org/jira/browse/HDFS-9275
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 3.0.0
> Reporter: Walter Su
> Assignee: Walter Su
> Fix For: 3.0.0
>
> Attachments: HDFS-9275.01.patch, HDFS-9275.02.patch,
> HDFS-9275.03.patch, HDFS-9275.04.patch, HDFS-9275.05.patch
>
> In {{ErasureCodingWorker}}, for the same block group, one task doesn't know
> which internal blocks are being recovered by other tasks. We could end up
> recovering two identical blocks with the same index. So {{ReplicationMonitor}}
> should wait for the previous work to finish before scheduling another one.
> This is related to the occasional failure of {{TestRecoverStripedFile}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one
Walter Su updated HDFS-9275:
Attachment: HDFS-9275.05.patch

Thanks, [~hitliuyi].

bq. Also check this in scheduleRecovery to avoid unnecessarily choosing targets.

Good idea. I moved the check to {{scheduleRecovery}}.

bq. Move the block group to the end of the queue of the same priority in neededReplications; otherwise it's chosen first again next time.

Not necessary. {{UnderReplicatedBlocks}} keeps an internal bookmark, so the next round resumes where the previous one stopped instead of starting from the same block group again.

Uploaded 05 patch.
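Here is a minimal sketch of why a bookmark makes re-queueing unnecessary. It assumes a single priority level and a simple wrap-around. {{BookmarkedQueue}} and {{chooseBlocks}} are hypothetical stand-ins, not the actual {{UnderReplicatedBlocks}} implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical single-priority queue with a bookmark (not the real
 * UnderReplicatedBlocks). Each call resumes at the bookmark, so a block
 * group handed out in one round is not the first candidate in the next.
 */
public class BookmarkedQueue {
    private final List<String> blocks = new ArrayList<>();
    private int bookmark = 0; // position of the next block to hand out

    void add(String block) {
        blocks.add(block);
    }

    /** Returns up to n blocks starting at the bookmark, wrapping around at the end. */
    List<String> chooseBlocks(int n) {
        List<String> chosen = new ArrayList<>();
        int size = blocks.size();
        for (int i = 0; i < Math.min(n, size); i++) {
            chosen.add(blocks.get(bookmark));
            bookmark = (bookmark + 1) % size;
        }
        return chosen;
    }

    public static void main(String[] args) {
        BookmarkedQueue q = new BookmarkedQueue();
        q.add("bg1");
        q.add("bg2");
        q.add("bg3");
        // Round 1 hands out bg1; suppose it is skipped because work is pending.
        System.out.println(q.chooseBlocks(1)); // [bg1]
        // Round 2 resumes after bg1 without any re-queueing.
        System.out.println(q.chooseBlocks(2)); // [bg2, bg3]
    }
}
```

The skipped group only comes up again after the rest of the queue has been visited, which is the effect the suggested "move to end of queue" step was meant to achieve.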
[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one
Walter Su updated HDFS-9275:
Affects Version/s: 3.0.0
[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one
Walter Su updated HDFS-9275:
Summary: Wait previous ErasureCodingWork to finish before schedule another one (was: Fix TestRecoverStripedFile)
[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one
Walter Su updated HDFS-9275:
Description: In {{ErasureCodingWorker}}, for the same block group, one task doesn't know which internal blocks are being recovered by other tasks. We could end up recovering two identical blocks with the same index. So {{ReplicationMonitor}} should wait for the previous work to finish before scheduling another one. This is related to the occasional failure of {{TestRecoverStripedFile}}.
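This minimal Java sketch shows the race in concrete terms. For illustration only, it assumes that each reconstruction task rebuilds the lowest missing internal-block index in the snapshot it was scheduled with. {{DuplicateIndexRace}} and {{chooseIndexToRecover}} are hypothetical names, not HDFS code.

```java
import java.util.BitSet;

/**
 * Hypothetical illustration of the duplicate-index race (not HDFS code).
 * Assumption: each reconstruction task rebuilds the lowest missing
 * internal-block index in the snapshot it was scheduled with, and knows
 * nothing about other in-flight tasks for the same block group.
 */
public class DuplicateIndexRace {

    /** Returns the lowest missing internal-block index, or -1 if none is missing. */
    static int chooseIndexToRecover(BitSet live, int groupWidth) {
        int idx = live.nextClearBit(0);
        return idx < groupWidth ? idx : -1;
    }

    public static void main(String[] args) {
        int groupWidth = 9;            // e.g. RS(6,3): 6 data + 3 parity internal blocks
        BitSet live = new BitSet(groupWidth);
        live.set(0, groupWidth);       // all internal blocks present

        live.clear(2);                 // first loss: index 2
        int task1 = chooseIndexToRecover(live, groupWidth);

        live.clear(5);                 // second loss while task 1 is still running
        // Task 2 still sees index 2 as missing because task 1 has not
        // finished, so it picks the same index again.
        int task2 = chooseIndexToRecover(live, groupWidth);

        System.out.println("task1=" + task1 + " task2=" + task2);
    }
}
```

When you run it, it prints task1=2 task2=2. Both tasks rebuild index 2, and index 5 remains missing.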
[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one
Walter Su updated HDFS-9275:
Attachment: HDFS-9275.04.patch

For a non-EC block, if it loses 1 replica, we schedule 1 task. If it loses another replica, we schedule a 2nd task. The two tasks start at different times and can run simultaneously. They are *independent*, because both recover the same block content.

Currently, for an EC block group, if it loses 1 (internal) block, we schedule 1 task. If it loses another block, we schedule a 2nd task. The two tasks run simultaneously, but they are *related*: they should recover different blocks with different indices. In fact, both tasks recover the same block with the same index. That's the problem.

bq. I think we can do a simple improvement for striped blocks: if there is one in PendingReplicationBlocks, we don't schedule new reconstruction work, instead of comparing the number of missing striped internal blocks.

Yes. We can wait for the 1st task to finish, then start the 2nd task (for the same block group).

Uploaded 04 patch. It's a simple fix, and it also includes some test cleanup from the 02 patch.
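This is a minimal sketch of the gate described above. It assumes a per-block counter of in-flight tasks stands in for {{PendingReplicationBlocks}}. {{RecoveryScheduler}}, {{trySchedule}} and {{taskFinished}} are hypothetical names, not the real {{BlockManager}} API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the scheduling gate (not the real BlockManager).
 * A per-block counter of in-flight tasks stands in for PendingReplicationBlocks.
 */
public class RecoveryScheduler {
    private final Map<Long, Integer> pending = new HashMap<>();

    /**
     * Returns true if a new recovery task is scheduled.
     * Replicated blocks may have several tasks in flight, because every task
     * copies the same content. A striped block group gets a new task only
     * after the previous one has finished.
     */
    boolean trySchedule(long blockId, boolean striped) {
        int inFlight = pending.getOrDefault(blockId, 0);
        if (striped && inFlight > 0) {
            return false; // wait for the previous task; skip target selection
        }
        // Target selection and task creation would happen here.
        pending.put(blockId, inFlight + 1);
        return true;
    }

    /** Records that one in-flight task for the block has finished. */
    void taskFinished(long blockId) {
        // Decrement the counter; returning null removes the entry at zero.
        pending.computeIfPresent(blockId, (id, n) -> n > 1 ? n - 1 : null);
    }

    public static void main(String[] args) {
        RecoveryScheduler s = new RecoveryScheduler();
        System.out.println(s.trySchedule(1L, true));  // true: first task for the group
        System.out.println(s.trySchedule(1L, true));  // false: previous task still pending
        s.taskFinished(1L);
        System.out.println(s.trySchedule(1L, true));  // true: previous task finished
        System.out.println(s.trySchedule(2L, false)); // true
        System.out.println(s.trySchedule(2L, false)); // true: replicated tasks are independent
    }
}
```

The pending check runs before any target selection. That keeps it cheap, which matches the reason for placing it in {{scheduleRecovery}}.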
[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one
Walter Su updated HDFS-9275:
Attachment: (was: HDFS-9275.04.patch)
[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one
Walter Su updated HDFS-9275:
Attachment: HDFS-9275.04.patch
[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one
Walter Su updated HDFS-9275:
Attachment: (was: HDFS-9275.04.patch)
[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one
Walter Su updated HDFS-9275:
Attachment: HDFS-9275.04.patch
[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one
Walter Su updated HDFS-9275:
Status: Patch Available (was: Open)