[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-30 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047893#comment-14047893
 ] 

Jing Zhao commented on HDFS-6527:
-

Copied from release vote mailing thread:

{code}
That's fine by me. Like I said, assuming that rc1 does indeed include the fix 
in HDFS-6527, and not the revert, then rc1 should be functionally correct. 
What's in branch-2.4.1 doesn't currently match what's in this RC, but if that 
doesn't bother anyone else then I won't lose any sleep over it.
--
Aaron T. Myers
Software Engineer, Cloudera

 On Jun 27, 2014, at 3:04 PM, Arun C. Murthy a...@hortonworks.com wrote:

 Aaron,

 Since the amend was just to the test, I'll keep this RC as-is.

 I'll also comment on jira.

 thanks,
 Arun
{code}

In this way, I plan not to change branch-2.4.1 and leave it as it is. What do 
you think [~atm] and [~acmurthy]?

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6527-addendum-test.patch, 
 HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, 
 HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-30 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047895#comment-14047895
 ] 

Aaron T. Myers commented on HDFS-6527:
--

I'd say we should still probably do it, but put it only in branch-2.4, not 
branch-2.4.1, in case we end up later doing a 2.4.2 release.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6527-addendum-test.patch, 
 HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, 
 HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-30 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047907#comment-14047907
 ] 

Jing Zhao commented on HDFS-6527:
-

Thanks for the quick reply, [~atm]. I will merge it to branch-2.4 then.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6527-addendum-test.patch, 
 HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, 
 HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-30 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048258#comment-14048258
 ] 

Jing Zhao commented on HDFS-6527:
-

Actually the patch has already been merged into 2.4. I will leave it there as 
it is now.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6527-addendum-test.patch, 
 HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, 
 HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-27 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046173#comment-14046173
 ] 

Jing Zhao commented on HDFS-6527:
-

Thanks [~atm]! The analysis makes sense to me. Let's reopen the jira and fix it 
in 2.4.1.

In the meanwhile, in FSNamesystem#delete, maybe we should move 
dir.removeFromInodeMap(removedINodes) into the fsnamesystem write lock? I 
guess this will prevent similar issue in the future.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6527-addendum-test.patch, 
 HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, 
 HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-27 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046194#comment-14046194
 ] 

Aaron T. Myers commented on HDFS-6527:
--

bq. In the meanwhile, in FSNamesystem#delete, maybe we should move 
dir.removeFromInodeMap(removedINodes) into the fsnamesystem write lock? I 
guess this will prevent similar issue in the future.

This currently is done under the write lock, but it's done after having dropped 
the write lock briefly, so presumably you're proposing to make all of the 
contents of {{deleteInternal}} happen under a single write lock? That seems 
reasonable to me, but to be honest I'm not sure of the history of why this 
deferred inode removal was done in the first place.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6527-addendum-test.patch, 
 HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, 
 HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-27 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046198#comment-14046198
 ] 

Jing Zhao commented on HDFS-6527:
-

Yeah, let's merge the fix back to 2.4.1 with unit test changes first. We can 
revisit the deferred removal in a separate jira.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6527-addendum-test.patch, 
 HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, 
 HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-27 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046441#comment-14046441
 ] 

Jing Zhao commented on HDFS-6527:
-

[~atm], do you already have a updated patch for 2.4.1? Otherwise I will try it.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6527-addendum-test.patch, 
 HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, 
 HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-27 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046521#comment-14046521
 ] 

Aaron T. Myers commented on HDFS-6527:
--

Hi Jing, I haven't currently create that patch, no. Please do feel free to go 
ahead and do that. Thanks a lot.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6527-addendum-test.patch, 
 HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, 
 HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-25 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043586#comment-14043586
 ] 

Kihwal Lee commented on HDFS-6527:
--

[~jingzhao] You are right. Since it reresolves inside the write lock, it will 
detect the deletion.  I will revert it from 2.4.1.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 2.4.1

 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-25 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043596#comment-14043596
 ] 

Kihwal Lee commented on HDFS-6527:
--

Reverted it from branch-2.4.1 and also updated the release note.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-24 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042720#comment-14042720
 ] 

Jing Zhao commented on HDFS-6527:
-

Thanks for the report, [~l201514]! Actually the unit test failure can be 
reproduced in branch-2.4.1.

I think the patch should not be included in branch-2.4.1. In branch-2.4.1, we 
have the following code in the beginning of analyzeFileState:
{code}
final INodesInPath iip = dir.getINodesInPath4Write(src);
final INodeFile pendingFile
= checkLease(src, fileId, clientName, iip.getLastINode());
{code}
I.e., the inode is retrieved by resolving the path. Since we remove the inode 
from the children list while holding the fsnamesystem lock, the race condition 
can actually be avoided I guess.

HDFS-6294 changed this part to load the INode from the inode map, while in 
FSNamesystem#deleteInternal, the inode is removed from the inode map without 
holding fsnamesystem lock. I guess this is the cause of the issue, and the 
issue is only in 2.5 and trunk.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 2.4.1

 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-23 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041456#comment-14041456
 ] 

Siqi Li commented on HDFS-6527:
---

When running unit tests in this patch v5, I get the following errors

2014-06-23 13:36:09,516 ERROR hdfs.DFSClient 
(DFSClient.java:closeAllFilesBeingWritten(873)) - Failed to close file 
/testDeleteAddBlockRace
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on /testDeleteAddBlockRace: File does not exist. Holder 
DFSClient_NONMAPREDUCE_1652233532_1 does not have any open files.
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2941)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2762)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2706)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:585)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1547)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2008)

at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1443)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1265)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:529)
 

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 2.4.1

 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-21 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039716#comment-14039716
 ] 

Arun C Murthy commented on HDFS-6527:
-

Changed fix version to 2.4.1 since it got merged in for rc1.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 2.4.1

 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14035586#comment-14035586
 ] 

Hudson commented on HDFS-6527:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #587 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/587/])
HDFS-6527. Edit log corruption due to defered INode removal. Contributed by 
Kihwal Lee and Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603340)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java


 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14035704#comment-14035704
 ] 

Hudson commented on HDFS-6527:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1778 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1778/])
HDFS-6527. Edit log corruption due to defered INode removal. Contributed by 
Kihwal Lee and Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603340)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java


 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14035813#comment-14035813
 ] 

Hudson commented on HDFS-6527:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1805 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1805/])
HDFS-6527. Edit log corruption due to defered INode removal. Contributed by 
Kihwal Lee and Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603340)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java


 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-17 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033814#comment-14033814
 ] 

Kihwal Lee commented on HDFS-6527:
--

+1 for the test improvement.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-17 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034097#comment-14034097
 ] 

Jing Zhao commented on HDFS-6527:
-

Thanks for reviewing the test change, [~kihwal]! I will commit this late today 
if no further comments.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034689#comment-14034689
 ] 

Hudson commented on HDFS-6527:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5720 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5720/])
HDFS-6527. Edit log corruption due to defered INode removal. Contributed by 
Kihwal Lee and Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603340)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java


 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032747#comment-14032747
 ] 

Hadoop QA commented on HDFS-6527:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650582/HDFS-6527.v4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7133//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7133//console

This message is automatically generated.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033322#comment-14033322
 ] 

Hadoop QA commented on HDFS-6527:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650667/HDFS-6527.v5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7141//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7141//console

This message is automatically generated.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-15 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032063#comment-14032063
 ] 

Jing Zhao commented on HDFS-6527:
-

The v3 may not work when the file is contained in a snapshot. The new unit test 
can fail if we create a snapshot on root after the file creation:
{code}
  FSDataOutputStream out = null;
  out = fs.create(filePath);
  SnapshotTestHelper.createSnapshot(fs, new Path(/), s1);
  Thread deleteThread = new DeleteThread(fs, filePath, true);
{code}

Instead of the changes made in v3 patch, I guess the v2 patch may work with the 
following change:
{code}
@@ -3018,6 +3036,13 @@ private INodeFile checkLease(String src, String holder, 
INode inode,
   + (lease != null ? lease.toString()
   : Holder  + holder +  does not have any open files.));
 }
+// If parent is not there or we mark the file as deleted in its snapshot
+// feature, it must have been deleted.
+if (file.getParent() == null
+|| (file.isWithSnapshot()  file.getFileWithSnapshotFeature()
+.isCurrentFileDeleted())) {
+  throw new FileNotFoundException(src);
+}
 String clientName = file.getFileUnderConstructionFeature().getClientName();
 if (holder != null  !clientName.equals(holder)) {
   throw new LeaseExpiredException(Lease mismatch on  + ident +
{code}

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-15 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032066#comment-14032066
 ] 

Jing Zhao commented on HDFS-6527:
-

Besides, is it possible that we can hide the sleepForTesting part inside a 
customized BlockPlacementPolicy? The customized BlockPlacementPolicy 
implementation uses the same policy as BlockPlacementPolicyDefault, but make 
the thread sleep 1s before returning. In this way we do not need to inject 
testing code into FSNamesystem.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030657#comment-14030657
 ] 

Kihwal Lee commented on HDFS-6527:
--

This bug has been there since 2.1.0-beta. It involves two client threads 
creating and deleting the same file.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since startFile() acquires FSN read lock and then write lock, a deletion can 
 happen in between. Because of deferred inode removal outside FSN write lock, 
 startFile() can get the deleted inode from the inode map with FSN write lock 
 held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030701#comment-14030701
 ] 

Daryn Sharp commented on HDFS-6527:
---

Is it possible to avoid having fsdir up-call to the fsn, apparently to clear 
leases, but rather fsdir removes the inodes from the map and fsn clears the 
leases?

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030707#comment-14030707
 ] 

Kihwal Lee commented on HDFS-6527:
--

Instead of moving inode removal inside lock, we could have getAdditionalBlock() 
to do additional check after acquiring the FSN write lock.  One possibility is 
to have it acquire FSDirectory read lock and check the parent of the inode. If 
deleted, it should be null.  Or we could make checkLease() do more than just 
checking the clientName recorded in the inode.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030944#comment-14030944
 ] 

Jing Zhao commented on HDFS-6527:
-

Thanks for the fix, [~kihwal]! 

bq. The new patch simply checks the parent of inode against null
Currently if the inode is in a snapshot then its parent will not be set to null 
after deletion. In that case can we run into a scenario where a block is added 
to a deleted file that is in our read-only snapshot? Maybe we also need to 
check FileWithSnapshotFeature#isCurrentFileDeleted?

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031114#comment-14031114
 ] 

Kihwal Lee commented on HDFS-6527:
--

Thanks for the comment, [~jingzhao]. We can null out the client name while 
deleting files. Then lease check is guaranteed to fail.

In {{INodeFile#destroyAndCollectBlocks()}}, we can delete the client name.
{code}
 if (sf != null) {
   sf.clearDiffs();
 }
+
+// Delete client name if under construction. This destroys a half of
+// the lease. The other half will be removed later from LeaseManager.
+FileUnderConstructionFeature uc = getFileUnderConstructionFeature();
+if (uc != null) {
+  uc.setClientName(null);
+}
   }
{code}

And in {{FSNamesystem#checkLease()}}, we can have the following check instead 
of the parent == null check.
{code}
 String clientName = file.getFileUnderConstructionFeature().getClientName();
+if (clientName == null) {
+  // clientName is removed when the file is deleted.
+  throw new FileNotFoundException(src);
+}
{code}

This will make lease checks to fail once the real file is deleted, whether it 
is in a snapshot or not.  Do you think it is reasonable?

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031142#comment-14031142
 ] 

Hadoop QA commented on HDFS-6527:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650347/HDFS-6527.v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1260 javac 
compiler warnings (more than the trunk's current 1259 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7115//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7115//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7115//console

This message is automatically generated.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal

2014-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031346#comment-14031346
 ] 

Hadoop QA commented on HDFS-6527:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650386/HDFS-6527.v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7118//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7118//console

This message is automatically generated.

 Edit log corruption due to defered INode removal
 

 Key: HDFS-6527
 URL: https://issues.apache.org/jira/browse/HDFS-6527
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, 
 HDFS-6527.v2.patch, HDFS-6527.v3.patch


 We have seen a SBN crashing with the following error:
 {panel}
 \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
 Encountered exception on operation AddBlockOp
 [path=/xxx,
 penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
 RpcCallId=-2]
 java.io.FileNotFoundException: File does not exist: /xxx
 {panel}
 This was caused by the deferred removal of deleted inodes from the inode map. 
 Since getAdditionalBlock() acquires FSN read lock and then write lock, a 
 deletion can happen in between. Because of deferred inode removal outside FSN 
 write lock, getAdditionalBlock() can get the deleted inode from the inode map 
 with FSN write lock held. This allow addition of a block to a deleted file.
 As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
  OP_ADD_BLOCK.  This cannot be replayed by NN, so NN doesn't start up or SBN 
 crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)