[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-01-06 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266694#comment-14266694
 ] 

Kihwal Lee commented on HDFS-7587:
--

This is a side-effect of HDFS-6423. [~daryn] has suggested that the quota check 
be done before converting inode/block. If something goes wrong, undoing the 
quota update is easier.

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kihwal Lee
> Priority: Blocker
>
> We have seen a standby namenode crash due to edit log corruption. It 
> complained that {{OP_CLOSE}} could not be applied because the file was not 
> under construction.
> When a client tried to append to the file, the remaining space quota was 
> very small. This caused a failure in {{prepareFileForWrite()}}, but only after 
> the inode had already been converted for writing and a lease added. Since 
> these changes were not undone when the quota violation was detected, the file 
> was left under construction with an active lease, and no {{OP_ADD}} was logged.
> A subsequent {{append()}} eventually caused a lease recovery after the soft 
> limit period. This resulted in {{commitBlockSynchronization()}}, which closed 
> the file and logged {{OP_CLOSE}}.  Since there was no corresponding 
> {{OP_ADD}}, edit log replay could not apply it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
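
The failure described in the issue can be pictured with a toy replay loop. This is only a sketch under stated assumptions: the class `EditReplaySketch`, its `Edit` type, and the `replay` method are hypothetical and are not HDFS code. The only thing taken from the report is the invariant that {{OP_CLOSE}} can be applied only to a file that a preceding {{OP_ADD}} put under construction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of edit replay; not HDFS code. All names here are hypothetical. */
class EditReplaySketch {
  enum Op { OP_ADD, OP_CLOSE }

  /** One logged edit: an opcode applied to a path. */
  static final class Edit {
    final Op op;
    final String path;
    Edit(Op op, String path) { this.op = op; this.path = path; }
  }

  /** Replays edits against a map of path -> under-construction flag. */
  static Map<String, Boolean> replay(List<Edit> edits) {
    Map<String, Boolean> underConstruction = new HashMap<>();
    for (Edit e : edits) {
      boolean uc = underConstruction.getOrDefault(e.path, false);
      if (e.op == Op.OP_ADD) {
        // Opening (or reopening for append) puts the file under construction.
        underConstruction.put(e.path, true);
      } else {
        // OP_CLOSE is only valid for a file that is under construction.
        if (!uc) {
          throw new IllegalStateException(
              "OP_CLOSE on " + e.path + ": file is not under construction");
        }
        underConstruction.put(e.path, false);
      }
    }
    return underConstruction;
  }

  public static void main(String[] args) {
    List<Edit> good = new ArrayList<>();
    good.add(new Edit(Op.OP_ADD, "/f"));
    good.add(new Edit(Op.OP_CLOSE, "/f"));
    replay(good); // replays cleanly

    List<Edit> bad = new ArrayList<>();
    bad.add(new Edit(Op.OP_CLOSE, "/f")); // the OP_ADD was never logged
    try {
      replay(bad);
    } catch (IllegalStateException ex) {
      System.out.println("replay failed: " + ex.getMessage());
    }
  }
}
```

In this model the second log corresponds to the reported case: the in-memory state on the active namenode allowed the close, but a node that only sees the log cannot apply it.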


[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-01-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267227#comment-14267227
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7587:
---

> ... . Daryn Sharp has suggested that the quota check be done before 
> converting inode/block. ...

Sounds good.  All the checks (quota, permission, etc.) should be performed 
before any change to the namespace.

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Kihwal Lee
> Assignee: Daryn Sharp
> Priority: Blocker


[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276269#comment-14276269
 ] 

Hadoop QA commented on HDFS-7587:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692046/HDFS-7587.patch
  against trunk revision 10ac5ab.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9200//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9200//console

This message is automatically generated.

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Kihwal Lee
> Assignee: Daryn Sharp
> Priority: Blocker
> Attachments: HDFS-7587.patch


[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-01-14 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277943#comment-14277943
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7587:
---

{code}
+// MUST attempt quota update before changing in-memory states
+updateQuotaForAppend(iip, file);
...
+// may fail if block token creation fails, but we're still in a
+// consistent state if the edit is logged first
+return blockManager.convertLastBlockToUnderConstruction(file, 0);
{code}
I think we should use FSDirectory.verifyQuota(..) (instead of updating the quota) at 
the beginning and then update the quota at the end.  Otherwise, the quota counts 
will be incorrect if there is an exception thrown later on.
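
The ordering concern in this comment can be illustrated with a small model. It is a sketch only: `QuotaOrderingSketch`, `Quota`, `Step`, `updateFirst`, and `verifyFirst` are hypothetical names, and the failing step merely stands in for any later call that can throw {{IOException}}. It does not reproduce the patch or FSDirectory.

```java
import java.io.IOException;

/** Toy model of the ordering question; not HDFS code. Names are hypothetical. */
class QuotaOrderingSketch {
  static final class Quota {
    final long limit;
    long used;
    Quota(long limit) { this.limit = limit; }

    /** Throws if charging delta would exceed the limit; changes nothing. */
    void verify(long delta) throws IOException {
      if (used + delta > limit) {
        throw new IOException("quota exceeded: " + (used + delta) + " > " + limit);
      }
    }

    /** Charges delta without checking. */
    void add(long delta) { used += delta; }
  }

  /** Stand-in for a later step that can throw IOException. */
  interface Step { void run() throws IOException; }

  /** Charge the quota up front, then run the later step. */
  static void updateFirst(Quota q, long delta, Step later) throws IOException {
    q.verify(delta);
    q.add(delta);
    later.run(); // if this throws, the charge above is not undone
  }

  /** Verify up front, run the later step, charge only at the end. */
  static void verifyFirst(Quota q, long delta, Step later) throws IOException {
    q.verify(delta);
    later.run(); // if this throws, nothing has been charged yet
    q.add(delta);
  }

  public static void main(String[] args) {
    Step failing = () -> { throw new IOException("simulated failure"); };
    Quota a = new Quota(100);
    Quota b = new Quota(100);
    try { updateFirst(a, 10, failing); } catch (IOException ignored) { }
    try { verifyFirst(b, 10, failing); } catch (IOException ignored) { }
    System.out.println("update-first used=" + a.used + ", verify-first used=" + b.used);
  }
}
```

With the simulated failure, the update-first variant keeps the 10 units charged while the verify-first variant charges nothing. The sketch says nothing about which in-memory changes the later step may already have made, which is a separate question.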



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-01-20 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284272#comment-14284272
 ] 

Xiaoyu Yao commented on HDFS-7587:
--

Agree with [~szetszwo] that we should use verifyQuota() instead of 
updateSpaceConsumed(). Also, can we add a unit test to verify the correctness 
of the quota usage after the exception is thrown in this case?



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-01-20 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284357#comment-14284357
 ] 

Daryn Sharp commented on HDFS-7587:
---

{{verifyQuota}} is already invoked so the quota counts shouldn't go out of 
sync.  {{updateSpaceConsumed}} calls {{updateCount}}, which calls 
{{verifyQuota}} prior to invoking {{unprotectedUpdateCount}}.  The quotas 
aren't going to change so it seems calling {{verifyQuota}} explicitly is wasted 
processing time.

bq.  Otherwise, the quota counts will be incorrect if there is an exception 
thrown later on.

Do you have a scenario in mind?  I.e., what is "later on"?  Moving the file to UC 
and associating the lease aren't going to throw checked exceptions.  They might 
throw a runtime exception.  The NN has no concept of a transaction (no 
rollback), so we're fully committed to finishing the op once we start updating 
data structures.  In this patch, once the quota update is successful, we're 
committed to moving the file to UC and assigning a lease.  If we think those 
final steps will throw, then we're in trouble because we can't roll back.  Even 
if that were to happen, an out of sync quota is better than a corrupted 
in-memory state and edit logs caused by the NN throwing runtime exceptions that 
don't cause an abort.
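
The call chain described at the start of this comment can be sketched as follows. The method names come from the comment; everything else (the class `UpdateCountSketch`, the single `long` counter, the signatures, and the exception type) is invented for illustration and does not match the real FSDirectory code.

```java
/** Toy model of the call chain described above; signatures are invented. */
class UpdateCountSketch {
  /** Hypothetical stand-in for a quota violation. */
  static class QuotaExceededException extends Exception {
    QuotaExceededException(String msg) { super(msg); }
  }

  private final long spaceQuota;
  private long spaceUsed;

  UpdateCountSketch(long spaceQuota) { this.spaceQuota = spaceQuota; }

  long getSpaceUsed() { return spaceUsed; }

  /** Entry point: delegates to updateCount. */
  void updateSpaceConsumed(long delta) throws QuotaExceededException {
    updateCount(delta);
  }

  /** Verifies first, then applies the change. */
  void updateCount(long delta) throws QuotaExceededException {
    verifyQuota(delta);            // throws before anything is modified
    unprotectedUpdateCount(delta); // only reached if the check passed
  }

  /** Pure check: never modifies the counter. */
  void verifyQuota(long delta) throws QuotaExceededException {
    if (delta > 0 && spaceUsed + delta > spaceQuota) {
      throw new QuotaExceededException(
          "space quota exceeded: " + (spaceUsed + delta) + " > " + spaceQuota);
    }
  }

  /** Applies the change without any check. */
  void unprotectedUpdateCount(long delta) { spaceUsed += delta; }

  public static void main(String[] args) {
    UpdateCountSketch dir = new UpdateCountSketch(100);
    try {
      dir.updateSpaceConsumed(60);
      dir.updateSpaceConsumed(50); // rejected by verifyQuota
    } catch (QuotaExceededException e) {
      System.out.println(e.getMessage() + "; used stays at " + dir.getSpaceUsed());
    }
  }
}
```

In this model a rejected update leaves the counter untouched, which is the property the comment relies on when it says an explicit {{verifyQuota}} call beforehand would be redundant.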



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-01-20 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284594#comment-14284594
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7587:
---

> ... The quotas aren't going to change so it seems calling verifyQuota 
> explicitly is wasted processing time.

We may call verifyQuota in the beginning and update quota without checking at 
the end.

> Do you have a scenario in mind? I.e., what is "later on"? Moving the file to UC 
> and associating the lease aren't going to throw checked exceptions. ...

convertLastBlockToUnderConstruction does throw IOException.

> ... Even if that were to happen, an out of sync quota is better than a 
> corrupted in-memory state and edit logs caused by the NN throwing runtime 
> exceptions that don't cause an abort.

Agree.  An out-of-sync quota is better than a corrupted in-memory state.  Also, 
an in-sync quota is better than an out-of-sync quota.  We could have both an 
in-sync quota and an uncorrupted in-memory state.  I think no one is saying that 
we prefer an in-sync quota to an uncorrupted in-memory state.



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-02-05 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308229#comment-14308229
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7587:
---

Hi [~daryn], are you still working on this?



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357500#comment-14357500
 ] 

Vinod Kumar Vavilapalli commented on HDFS-7587:
---

[~kihwal] / [~daryn] / [~szetszwo], is this still a blocker for 2.7? Can 
progress be made in the next few days? I plan to cut an RC end of this week. 
Please update. Tx.



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-12 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358717#comment-14358717
 ] 

Kihwal Lee commented on HDFS-7587:
--

It is still valid. We will get to it today.



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-12 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359109#comment-14359109
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7587:
---

I am fine if we commit the current patch and then file a follow-up JIRA for 
addressing [this 
comment|https://issues.apache.org/jira/browse/HDFS-7587?focusedCommentId=14277943&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14277943].



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-12 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359128#comment-14359128
 ] 

Kihwal Lee commented on HDFS-7587:
--

We have been running with this version of the patch at scale for some time now. 
We will address any remaining concerns in a separate jira.
+1.



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-12 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359300#comment-14359300
 ] 

Kihwal Lee commented on HDFS-7587:
--

The patch does not apply anymore.



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-12 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359318#comment-14359318
 ] 

Daryn Sharp commented on HDFS-7587:
---

The patch doesn't apply because the logic is very different due to truncate and 
variable length blocks.

At first glance, the new code looks buggy.  It sometimes bills quota and 
sometimes does not, and if the block exceeds the preferred size it appears you 
"earn" back quota.  I don't have the familiarity with all this new code to 
provide a timely patch.  Un-assigning myself.  [~jingzhao], want to take a look?



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-17 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366016#comment-14366016
 ] 

Kihwal Lee commented on HDFS-7587:
--

Is it possible for the last block size to be greater than the preferred block 
size?
{code}
+  final long diff = (file.getPreferredBlockSize() - lastBlock.getNumBytes())
+      * file.getBlockReplication();
{code}
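
To make the question concrete, here is a worked instance of that expression. It is a sketch with made-up numbers: `AppendQuotaDiffSketch` and its `diff` method are hypothetical, and reading the result as the space still to be charged for filling the last block across all replicas is an interpretation, not something the thread states.

```java
/** Worked arithmetic for the quoted expression; the class and numbers are illustrative. */
class AppendQuotaDiffSketch {
  static final long MB = 1024L * 1024L;

  /** Same shape as the quoted line: (preferred size - last block bytes) * replication. */
  static long diff(long preferredBlockSize, long lastBlockBytes, short replication) {
    return (preferredBlockSize - lastBlockBytes) * replication;
  }

  public static void main(String[] args) {
    // Last block smaller than the preferred size: (128 MB - 100 MB) * 3 = 84 MB.
    System.out.println(diff(128 * MB, 100 * MB, (short) 3));
    // Last block larger than the preferred size: (128 MB - 130 MB) * 3 = -6 MB.
    System.out.println(diff(128 * MB, 130 * MB, (short) 3));
  }
}
```

The second case is where the question matters: if a last block could be larger than the preferred size, the value would go negative.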

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Kihwal Lee
> Assignee: Jing Zhao
> Priority: Blocker
> Attachments: HDFS-7587.001.patch, HDFS-7587.002.patch, HDFS-7587.patch


[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-17 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366135#comment-14366135
 ] 

Jing Zhao commented on HDFS-7587:
-

Thanks for the comment, Kihwal. Actually I mentioned this in my previous 
comment: with the fix from HDFS-7943 we will not have blocks larger than 
the preferred block size. But I would appreciate it if you could also confirm.



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-17 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366182#comment-14366182
 ] 

Kihwal Lee commented on HDFS-7587:
--

Sorry, I missed the comment. Should have refreshed.



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366301#comment-14366301
 ] 

Hadoop QA commented on HDFS-7587:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705151/HDFS-7587.001.patch
  against trunk revision 32b4330.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9942//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9942//console

This message is automatically generated.



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366383#comment-14366383
 ] 

Hadoop QA commented on HDFS-7587:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705176/HDFS-7587.002.patch
  against trunk revision 968425e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9946//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9946//console

This message is automatically generated.



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-17 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366499#comment-14366499
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7587:
---

- In verifyQuotaForUCBlock,
-* let's add a precondition check to make sure delta >= 0 (for append).
-* It should check if (!fsd.getFSNamesystem().isImageLoaded() || 
fsd.shouldSkipQuotaChecks()); see FSDirConcatOp.verifyQuota and 
FSDirRenameOp.verifyQuotaForRename.

- We should also verify quota by storage types; see 
FSDirConcatOp.computeQuotaDeltas.

- For truncate, the quota should be updated differently, as follows:
-* Copy-on-truncate for snapshot: need quota check for creating the new block.  
Quota usage count is increased.
-* Non-copy-on-truncate OR Copy-on-truncate for upgrade but not snapshot: Quota 
usage count is decreased.  No quota check is needed.

Do you want to fix truncate in a different JIRA?
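
To make the suggested ordering for append concrete, here is a minimal sketch, 
not the actual patch. Apart from the method name verifyQuotaForUCBlock, every 
class and field below is a hypothetical stand-in for namenode state, and the 
exception is unchecked only to keep the example short. It assumes the 
isImageLoaded()/shouldSkipQuotaChecks() condition is meant as an early return 
that skips the check.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for namenode state. All names are hypothetical except
// verifyQuotaForUCBlock, which is borrowed from the review comment.
class AppendQuotaSketch {
    // Unchecked only to keep the sketch short.
    static class QuotaExceededException extends RuntimeException {
        QuotaExceededException(String message) { super(message); }
    }

    static class FileState {
        boolean underConstruction;
        boolean hasLease;
    }

    long remainingSpaceQuota;
    boolean imageLoaded = true;       // assumed analogue of isImageLoaded()
    boolean skipQuotaChecks = false;  // assumed analogue of shouldSkipQuotaChecks()
    final List<String> editLog = new ArrayList<>();

    AppendQuotaSketch(long remainingSpaceQuota) {
        this.remainingSpaceQuota = remainingSpaceQuota;
    }

    // Throws if the extra space does not fit; changes no state.
    void verifyQuotaForUCBlock(long delta) {
        // Precondition: an append can only add to the space consumed.
        if (delta < 0) {
            throw new IllegalArgumentException("delta must be >= 0 for append: " + delta);
        }
        // Skip the check while the image is loading or when checks are disabled.
        if (!imageLoaded || skipQuotaChecks) {
            return;
        }
        if (delta > remainingSpaceQuota) {
            throw new QuotaExceededException(
                "need " + delta + ", have " + remainingSpaceQuota);
        }
    }

    // Check the quota first; convert the file and log only if the check passes.
    void prepareForAppend(FileState file, long delta) {
        verifyQuotaForUCBlock(delta);   // may throw; nothing has been modified yet
        file.underConstruction = true;  // convert the inode for writing
        file.hasLease = true;           // add the lease
        remainingSpaceQuota -= delta;   // update quota usage
        editLog.add("OP_ADD");          // log the open-for-append
    }

    public static void main(String[] args) {
        AppendQuotaSketch nn = new AppendQuotaSketch(10);
        FileState file = new FileState();
        try {
            nn.prepareForAppend(file, 100);
        } catch (QuotaExceededException e) {
            System.out.println("rejected, underConstruction=" + file.underConstruction
                + ", editLog=" + nn.editLog);
        }
    }
}
```

Because the check runs before any state changes, a quota violation leaves the 
file closed, with no lease and no OP_ADD, so there is nothing to undo.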




[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-18 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368320#comment-14368320
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7587:
---

+1 the new patch looks good.

> We may also need to check/update the quota here since the current logic is to 
> count UC block's storage usage using the preferred size.

After truncate, there are two types of UC blocks: one for creating/appending a 
block and one for truncating a block.  The quota counting for these two cases 
should be different.  Let's discuss it in the truncate JIRA.
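
As a rough illustration of how the truncate cases listed earlier in this 
thread differ, the sketch below encodes only what was stated there: whether a 
quota check is needed and whether usage goes up or down. The enum and method 
names are hypothetical, and the actual accounting is left to the truncate 
JIRA.

```java
// Hypothetical classification of truncate cases; not an HDFS API.
class TruncateQuotaSketch {
    enum TruncateKind {
        IN_PLACE,           // non-copy-on-truncate
        COPY_FOR_UPGRADE,   // copy-on-truncate for upgrade, no snapshot
        COPY_FOR_SNAPSHOT   // copy-on-truncate because of a snapshot
    }

    // Only the snapshot case creates a new block that must fit in the quota.
    static boolean needsQuotaCheck(TruncateKind kind) {
        return kind == TruncateKind.COPY_FOR_SNAPSHOT;
    }

    // +1 if quota usage increases, -1 if it decreases.
    static int usageDeltaSign(TruncateKind kind) {
        switch (kind) {
            case COPY_FOR_SNAPSHOT:
                return 1;
            case IN_PLACE:
            case COPY_FOR_UPGRADE:
            default:
                return -1;
        }
    }

    public static void main(String[] args) {
        for (TruncateKind kind : TruncateKind.values()) {
            System.out.println(kind + ": check=" + needsQuotaCheck(kind)
                + ", sign=" + usageDeltaSign(kind));
        }
    }
}
```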

> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Kihwal Lee
> Assignee: Jing Zhao
> Priority: Blocker
> Attachments: HDFS-7587.001.patch, HDFS-7587.002.patch, 
> HDFS-7587.003.patch, HDFS-7587.patch
>
>
> We have seen a standby namenode crashing due to edit log corruption. It was 
> complaining that {{OP_CLOSE}} cannot be applied because the file is not 
> under-construction.
> When a client was trying to append to the file, the remaining space quota was 
> very small. This caused a failure in {{prepareFileForWrite()}}, but after the 
> inode was already converted for writing and a lease added. Since these were 
> not undone when the quota violation was detected, the file was left in 
> under-construction with an active lease without edit logging {{OP_ADD}}.
> A subsequent {{append()}} eventually caused a lease recovery after the soft 
> limit period. This resulted in {{commitBlockSynchronization()}}, which closed 
> the file with {{OP_CLOSE}} being logged.  Since there was no corresponding 
> {{OP_ADD}}, edit replaying could not apply this.





[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368342#comment-14368342
 ] 

Hadoop QA commented on HDFS-7587:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705452/HDFS-7587.003.patch
  against trunk revision c239b6d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9966//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9966//console

This message is automatically generated.



[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368351#comment-14368351
 ] 

Hudson commented on HDFS-7587:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7368 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7368/])
HDFS-7587. Edit log corruption can happen if append fails with a quota 
violation. Contributed by Jing Zhao. (jing9: rev 
c7c71cdba50cb7d8282622cd496cc913c80cff54)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDiskspaceQuotaUpdate.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


> Edit log corruption can happen if append fails with a quota violation
> -
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Kihwal Lee
> Assignee: Jing Zhao
> Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: HDFS-7587.001.patch, HDFS-7587.002.patch, 
> HDFS-7587.003.patch, HDFS-7587.patch
>
>
> We have seen a standby namenode crashing due to edit log corruption. It was 
> complaining that {{OP_CLOSE}} cannot be applied because the file is not 
> under-construction.
> When a client was trying to append to the file, the remaining space quota was 
> very small. This caused a failure in {{prepareFileForWrite()}}, but after the 
> inode was already converted for writing and a lease added. Since these were 
> not undone when the quota violation was detected, the file was left in 
> under-construction with an active lease without edit logging {{OP_ADD}}.
> A subsequent {{append()}} eventually caused a lease recovery after the soft 
> limit period. This resulted in {{commitBlockSynchronization()}}, which closed 
> the file with {{OP_CLOSE}} being logged.  Since there was no corresponding 
> {{OP_ADD}}, edit replaying could not apply this.





[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368914#comment-14368914
 ] 

Hudson commented on HDFS-7587:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #137 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/137/])
HDFS-7587. Edit log corruption can happen if append fails with a quota 
violation. Contributed by Jing Zhao. (jing9: rev 
c7c71cdba50cb7d8282622cd496cc913c80cff54)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDiskspaceQuotaUpdate.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java




[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368936#comment-14368936
 ] 

Hudson commented on HDFS-7587:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #871 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/871/])
HDFS-7587. Edit log corruption can happen if append fails with a quota 
violation. Contributed by Jing Zhao. (jing9: rev 
c7c71cdba50cb7d8282622cd496cc913c80cff54)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDiskspaceQuotaUpdate.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt




[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369405#comment-14369405
 ] 

Hudson commented on HDFS-7587:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2069 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2069/])
HDFS-7587. Edit log corruption can happen if append fails with a quota 
violation. Contributed by Jing Zhao. (jing9: rev 
c7c71cdba50cb7d8282622cd496cc913c80cff54)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDiskspaceQuotaUpdate.java




[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369426#comment-14369426
 ] 

Hudson commented on HDFS-7587:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #128 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/128/])
HDFS-7587. Edit log corruption can happen if append fails with a quota 
violation. Contributed by Jing Zhao. (jing9: rev 
c7c71cdba50cb7d8282622cd496cc913c80cff54)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDiskspaceQuotaUpdate.java




[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369493#comment-14369493
 ] 

Hudson commented on HDFS-7587:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #137 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/137/])
HDFS-7587. Edit log corruption can happen if append fails with a quota 
violation. Contributed by Jing Zhao. (jing9: rev 
c7c71cdba50cb7d8282622cd496cc913c80cff54)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDiskspaceQuotaUpdate.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt




[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation

2015-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369531#comment-14369531
 ] 

Hudson commented on HDFS-7587:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2087 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2087/])
HDFS-7587. Edit log corruption can happen if append fails with a quota 
violation. Contributed by Jing Zhao. (jing9: rev 
c7c71cdba50cb7d8282622cd496cc913c80cff54)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDiskspaceQuotaUpdate.java

