[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-07-31 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-7587:
--
Attachment: HDFS-7587-branch-2.6.patch

For the 2.6.1 release effort, the backport isn't straightforward because of 
differences between 2.6 and 2.7. The branch-2.6 patch differs from the 
original patch as follows.

* Include part of HDFS-7509 so that prepareFileForWrite has the expected 
function signature.
* Use Quota.Counts instead of QuotaCounts, which was introduced in HDFS-7584.
* Skip the check for the storage-type-specific quota introduced in HDFS-7584.
* Add the necessary definitions for INodesPath#length and 
FSDirectory#shouldSkipQuotaChecks.

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Jing Zhao
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: HDFS-7587-branch-2.6.patch, HDFS-7587.001.patch, 
> HDFS-7587.002.patch, HDFS-7587.003.patch, HDFS-7587.patch
>
>
> We have seen a standby namenode crash due to edit log corruption. It 
> complained that {{OP_CLOSE}} could not be applied because the file was not 
> under construction.
> When a client tried to append to the file, the remaining space quota was 
> very small. This caused a failure in {{prepareFileForWrite()}}, but only 
> after the inode had already been converted for writing and a lease added. 
> Since these changes were not undone when the quota violation was detected, 
> the file was left under construction with an active lease, and no {{OP_ADD}} 
> was written to the edit log.
> A subsequent {{append()}} eventually caused a lease recovery after the soft 
> limit period. This resulted in {{commitBlockSynchronization()}}, which closed 
> the file and logged {{OP_CLOSE}}. Since there was no corresponding 
> {{OP_ADD}}, edit log replay could not apply it.
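
The sequence described above can be sketched with a toy model. This is not HDFS code: the class, fields, and methods below are hypothetical stand-ins, and lease recovery is reduced to a single call. It only illustrates how mutating state before the quota check, without undoing it, can leave an {{OP_CLOSE}} in the log with no matching {{OP_ADD}}.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the failure sequence; hypothetical names, not HDFS code. */
final class AppendQuotaBugSketch {
    enum Op { OP_ADD, OP_CLOSE }

    static final class QuotaExceeded extends RuntimeException {
        QuotaExceeded(String msg) { super(msg); }
    }

    // State of a single file on the active namenode.
    boolean underConstruction = false;
    boolean leaseHeld = false;
    long remainingQuota;
    final List<Op> editLog = new ArrayList<>();

    AppendQuotaBugSketch(long remainingQuota) { this.remainingQuota = remainingQuota; }

    /** Buggy ordering: state is mutated before the quota check and never undone. */
    void appendBuggy(long bytesNeeded) {
        underConstruction = true;   // inode converted for writing
        leaseHeld = true;           // lease added
        if (bytesNeeded > remainingQuota) {
            // The exception escapes before OP_ADD is logged.
            throw new QuotaExceeded("space quota exceeded");
        }
        remainingQuota -= bytesNeeded;
        editLog.add(Op.OP_ADD);
    }

    /** Lease recovery closes a file still under construction and logs OP_CLOSE. */
    void recoverLease() {
        if (underConstruction) {
            underConstruction = false;
            leaseHeld = false;
            editLog.add(Op.OP_CLOSE);
        }
    }

    /** Standby-side replay: OP_CLOSE is only valid after an OP_ADD opened the file. */
    static void replay(List<Op> ops) {
        boolean uc = false;
        for (Op op : ops) {
            if (op == Op.OP_ADD) {
                uc = true;
            } else if (!uc) {
                throw new IllegalStateException("OP_CLOSE: file is not under-construction");
            } else {
                uc = false;
            }
        }
    }

    public static void main(String[] args) {
        AppendQuotaBugSketch nn = new AppendQuotaBugSketch(10);
        try {
            nn.appendBuggy(1024);
        } catch (QuotaExceeded e) {
            System.out.println("append failed: " + e.getMessage());
        }
        nn.recoverLease();
        System.out.println("edit log: " + nn.editLog);
        try {
            replay(nn.editLog);
        } catch (IllegalStateException e) {
            System.out.println("replay failed: " + e.getMessage());
        }
    }
}
```

In this model the edit log ends up holding a lone OP_CLOSE, which the replay loop rejects, mirroring the standby crash in the report.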



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-09-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-7587:
--
Fix Version/s: 2.6.1

[~sjlee0] backported this to 2.6.1. I just pushed the commit to 2.6.1 after 
compiling and running TestDiskspaceQuotaUpdate, which the patch changes.

[~mingma], I didn't actually see a diff between the branch-2 patch and yours / 
Sangjin's. I'd appreciate any cross-verification on the 2.6.1 branch of whether 
I got it right. Thanks.

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Jing Zhao
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7587-branch-2.6.patch, HDFS-7587.001.patch, 
> HDFS-7587.002.patch, HDFS-7587.003.patch, HDFS-7587.patch
>
>





[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-01-06 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-7587:
-
Assignee: Daryn Sharp

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Daryn Sharp
>Priority: Blocker
>





[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-01-06 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-7587:
--
Component/s: namenode

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Daryn Sharp
>Priority: Blocker
>





[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-01-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-7587:
-
Attachment: HDFS-7587.patch

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: HDFS-7587.patch
>
>





[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-01-13 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-7587:
-
Status: Patch Available  (was: Open)

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: HDFS-7587.patch
>
>





[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-12 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-7587:
--
Assignee: (was: Daryn Sharp)

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-7587.patch
>
>





[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7587:

Attachment: HDFS-7587.001.patch

Rebased Daryn's patch. Also made changes based on Nicholas's comments, i.e., 
verify the quota first and update the quota after the action.
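
A minimal sketch of that ordering, using a toy model rather than the actual patch: the class, fields, and method names are hypothetical. The point it illustrates is that the quota check has no side effects, so a violation leaves the file closed, without a lease, and with nothing logged.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "verify first, update after"; hypothetical names, not HDFS code. */
final class VerifyThenUpdateSketch {
    boolean underConstruction = false;
    boolean leaseHeld = false;
    long remainingQuota;
    final List<String> editLog = new ArrayList<>();

    VerifyThenUpdateSketch(long remainingQuota) { this.remainingQuota = remainingQuota; }

    /** Side-effect-free check; throwing here leaves the file untouched. */
    void verifyQuota(long bytesNeeded) {
        if (bytesNeeded > remainingQuota) {
            throw new IllegalStateException("space quota exceeded");
        }
    }

    void append(long bytesNeeded) {
        verifyQuota(bytesNeeded);      // 1. verify before any state changes
        underConstruction = true;      // 2. the action: convert inode, add lease, log
        leaseHeld = true;
        editLog.add("OP_ADD");
        remainingQuota -= bytesNeeded; // 3. update quota usage after the action
    }

    public static void main(String[] args) {
        VerifyThenUpdateSketch nn = new VerifyThenUpdateSketch(10);
        try {
            nn.append(1024);
        } catch (IllegalStateException e) {
            System.out.println("append rejected: " + e.getMessage());
        }
        System.out.println("under construction: " + nn.underConstruction);
        System.out.println("edit log: " + nn.editLog);
    }
}
```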

With the fix from HDFS-7943 we will not have blocks larger than the 
preferred block size, so we can avoid "earning back" quota scenarios.

Truncate may have a similar issue when the data to truncate is only part of the 
original last block. I will update the patch later to fix this part.



> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-7587.001.patch, HDFS-7587.patch
>
>





[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7587:

Attachment: HDFS-7587.002.patch

Add fix for truncate.

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Jing Zhao
>Priority: Blocker
> Attachments: HDFS-7587.001.patch, HDFS-7587.002.patch, HDFS-7587.patch
>
>





[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-18 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7587:

Attachment: HDFS-7587.003.patch

Thanks for the review, Nicholas! I updated the patch to address your comments. 
I will move the truncate fix into a separate jira.

bq. Non-copy-on-truncate OR Copy-on-truncate for upgrade but not snapshot: 
Quota usage count is decreased. No quota check is needed.

We may also need to check/update the quota here, since the current logic 
counts a UC (under-construction) block's storage usage using the preferred 
block size.
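
To make the concern concrete, here is a small arithmetic sketch under a simplified accounting model. It assumes a complete block is charged at its actual length, a UC block is charged at the preferred block size, and an in-place truncate leaves the last block under construction. The helper name and the numbers are illustrative, not taken from the patch.

```java
/** Toy quota arithmetic; simplified accounting, not HDFS code. */
final class UcBlockQuotaSketch {
    /** Storage charged for the last block: preferred size while UC, actual length otherwise. */
    static long chargedBytes(long blockLength, long preferredBlockSize,
                             short replication, boolean underConstruction) {
        long perReplica = underConstruction ? preferredBlockSize : blockLength;
        return perReplica * replication;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        long preferred = 128 * mb;
        short repl = 3;
        // Complete 100 MB last block before the truncate.
        long before = chargedBytes(100 * mb, preferred, repl, false);
        // Same block truncated to 40 MB and now under construction.
        long after = chargedBytes(40 * mb, preferred, repl, true);
        System.out.println("delta MB = " + (after - before) / mb); // prints "delta MB = 84"
    }
}
```

Under these assumptions the charged usage rises from 300 MB to 384 MB even though the file shrank, which is why a quota check may still be needed on this path.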

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Jing Zhao
>Priority: Blocker
> Attachments: HDFS-7587.001.patch, HDFS-7587.002.patch, 
> HDFS-7587.003.patch, HDFS-7587.patch
>
>





[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-18 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7587:

   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks again for the review, Nicholas. I've committed this to 2.7.

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Jing Zhao
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: HDFS-7587.001.patch, HDFS-7587.002.patch, 
> HDFS-7587.003.patch, HDFS-7587.patch
>
>


