[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one

2015-11-02 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-9275:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

> Wait previous ErasureCodingWork to finish before schedule another one
> -
>
> Key: HDFS-9275
> URL: https://issues.apache.org/jira/browse/HDFS-9275
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 3.0.0
> Reporter: Walter Su
> Assignee: Walter Su
> Fix For: 3.0.0
>
> Attachments: HDFS-9275.01.patch, HDFS-9275.02.patch, 
> HDFS-9275.03.patch, HDFS-9275.04.patch, HDFS-9275.05.patch
>
>
> In {{ErasureCodingWorker}}, for the same block group, one task doesn't know 
> which internal blocks are being recovered by other tasks. We could end up 
> recovering 2 identical blocks with the same index. So, {{ReplicationMonitor}} 
> should wait for the previous work to finish before scheduling another one.
> This is related to the occasional failure of {{TestRecoverStripedFile}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one

2015-10-29 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9275:

Attachment: HDFS-9275.05.patch

Thanks, [~hitliuyi].
bq. Also check this in scheduleRecovery to avoid unnecessary choose targets.
Good idea. I moved it to {{scheduleRecovery}}.
bq. move the block group to end of queue of same priority in 
neededReplications, otherwise it's chosen first again next time.
No need. {{UnderReplicatedBlocks}} already keeps an internal bookmark.

Uploaded the 05 patch.






[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one

2015-10-29 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9275:

Affects Version/s: 3.0.0






[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one

2015-10-28 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9275:

Summary: Wait previous ErasureCodingWork to finish before schedule another 
one  (was: Fix TestRecoverStripedFile)







[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one

2015-10-28 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9275:

Description: 
In {{ErasureCodingWorker}}, for the same block group, one task doesn't know 
which internal blocks are being recovered by other tasks. We could end up 
recovering 2 identical blocks with the same index. So, {{ReplicationMonitor}} 
should wait for the previous work to finish before scheduling another one.
This is related to the occasional failure of {{TestRecoverStripedFile}}.






[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one

2015-10-28 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9275:

Attachment: HDFS-9275.04.patch

For a non-EC block, if it loses 1 replica, we schedule 1 task. If it loses 
another replica, we schedule a 2nd task. The two tasks start at different 
times and can run simultaneously. They are *independent*, because they recover 
the same block content.

Currently, for an EC block group, if it loses 1 (internal) block, we schedule 1 
task. If it loses another block, we schedule a 2nd task. The two tasks run 
simultaneously, but they are *related*: they should recover different blocks 
with different indices. In fact, the two tasks recover the same block with the 
same index. That's the problem.

bq. I think we can do a simple improvement for striped block, if there is one 
in PendingReplicationBlocks, then we don't schedule new reconstruction work 
instead of comparing the number of missed striped internal blocks.
Yes. We can wait for the 1st task to finish before we start the 2nd task (for 
the same block group).

Uploaded the 04 patch. It is a simple fix and includes some test cleanup from 
the 02 patch.
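
To illustrate the idea of waiting for the previous work before scheduling 
another one, here is a minimal sketch. It is not the code from the patch: the 
class {{ReconstructionGate}} and its methods are hypothetical names, and block 
groups are identified by a plain long ID for simplicity. It only shows the 
"at most one in-flight task per block group" rule.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch, not HDFS code: allows at most one in-flight
 * reconstruction task per block group.
 */
class ReconstructionGate {
  // IDs of block groups that currently have a reconstruction task in flight.
  private final Set<Long> pending = new HashSet<>();

  /**
   * Returns true if a task may be scheduled now and records it as pending.
   * Returns false if earlier work for the same block group has not finished.
   */
  synchronized boolean trySchedule(long blockGroupId) {
    // Set.add returns false when the ID is already present.
    return pending.add(blockGroupId);
  }

  /** Called when the task for this block group finishes or times out. */
  synchronized void finished(long blockGroupId) {
    pending.remove(blockGroupId);
  }

  public static void main(String[] args) {
    ReconstructionGate gate = new ReconstructionGate();
    System.out.println(gate.trySchedule(1L)); // first loss: task is scheduled
    System.out.println(gate.trySchedule(1L)); // second loss, task pending: skipped
    gate.finished(1L);
    System.out.println(gate.trySchedule(1L)); // previous work done: scheduled
  }
}
```

With a gate like this, the second loss in the same block group is picked up 
only after the first task completes, so the next task can see which internal 
blocks are still missing instead of recovering the same index twice.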






[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one

2015-10-28 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9275:

Attachment: (was: HDFS-9275.04.patch)






[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one

2015-10-28 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9275:

Attachment: HDFS-9275.04.patch






[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one

2015-10-28 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9275:

Attachment: (was: HDFS-9275.04.patch)






[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one

2015-10-28 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9275:

Attachment: HDFS-9275.04.patch






[jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to finish before schedule another one

2015-10-28 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9275:

Status: Patch Available  (was: Open)



