[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048215#comment-13048215 ]

Todd Lipcon commented on HDFS-988:
--

+1 on the trunk patch, once you've run the full test suite through jcarder (with the "lockclasses" branch that detects rwlock issues). It also looks like it needs a rebase.

> saveNamespace can corrupt edits log, apparently due to race conditions
> --
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20-append, 0.21.0, 0.22.0
> Reporter: dhruba borthakur
> Assignee: Eli Collins
> Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: 988-fixups.txt, HDFS-988_fix_synchs.patch,
> hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch,
> hdfs-988-6.patch, hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt,
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the
> saveNamespace command. This can corrupt the edits log. The problem is that
> when the NN enters safemode, there could still be pending logSyncs occurring
> from other threads. The saveNamespace command, when executed, would then save
> an edits log containing partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
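The race in the issue description can be sketched in Java. In this hypothetical model (SketchEditLog and its methods are illustrative, not HDFS's FSEditLog API), edits are buffered under a lock while the slow sync runs outside it. A checkpoint that only takes the lock could therefore capture a batch that is still being written. One way to close that window, assumed here and not necessarily what the committed patch does, is to wait until no sync is in flight before snapshotting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical edit log whose expensive sync runs outside the lock.
// Illustrative only; this is not HDFS's FSEditLog.
class SketchEditLog {
    private final List<String> buffered = new ArrayList<>(); // logged, not yet synced
    private final List<String> durable = new ArrayList<>();  // stands in for the on-disk log
    private boolean syncRunning = false;

    synchronized void logEdit(String op) {
        buffered.add(op);
    }

    // Caller must hold this object's monitor.
    private void awaitNoSyncInFlight() {
        boolean interrupted = false;
        while (syncRunning) {
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting, restore the flag afterwards
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    void logSync() {
        List<String> batch;
        synchronized (this) {
            awaitNoSyncInFlight();             // one sync at a time
            syncRunning = true;
            batch = new ArrayList<>(buffered); // swap out the pending edits
            buffered.clear();
        }
        try {
            // Slow I/O happens without the lock, so other threads keep logging.
            durable.addAll(batch);
        } finally {
            synchronized (this) {
                syncRunning = false;
                notifyAll();
            }
        }
    }

    // Avoids the race: never snapshot while a sync is half done.
    synchronized List<String> saveNamespace() {
        awaitNoSyncInFlight();
        List<String> image = new ArrayList<>(durable);
        image.addAll(buffered);
        return image;
    }
}
```

Releasing the lock during I/O is what makes the extra wait necessary: holding it for the whole sync would be simpler but would serialize every namespace operation behind disk latency.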
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048213#comment-13048213 ]

Todd Lipcon commented on HDFS-988:
--

+1 on the 0.22 patch
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046397#comment-13046397 ]

Hadoop QA commented on HDFS-988:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12481892/hdfs-988-6.patch
against trunk revision 1133476.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 33 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.TestHDFSTrash
+1 contrib tests. The patch passed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/747//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/747//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/747//console

This message is automatically generated.
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046389#comment-13046389 ]

Hadoop QA commented on HDFS-988:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12481892/hdfs-988-6.patch
against trunk revision 1133476.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 33 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
+1 contrib tests. The patch passed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/746//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/746//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/746//console

This message is automatically generated.
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046369#comment-13046369 ]

Eli Collins commented on HDFS-988:
--

Btw, test-patch is +1 for hdfs-988-b22-1.patch on branch 22, and all the tests pass. I think it's good to go; mind reviewing it? Its scope is limited to just the issue in the description (unlike the latest patch for trunk).
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042615#comment-13042615 ]

Hadoop QA commented on HDFS-988:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12481197/988-fixups.txt
against trunk revision 1130381.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/684//console

This message is automatically generated.
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042600#comment-13042600 ]

Hadoop QA commented on HDFS-988:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12481173/hdfs-988-5.patch
against trunk revision 1130339.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 18 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.server.namenode.TestCheckPointForSecurityTokens
  org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
  org.apache.hadoop.hdfs.server.namenode.TestEditLogRace
  org.apache.hadoop.hdfs.server.namenode.TestParallelImageWrite
  org.apache.hadoop.hdfs.server.namenode.TestSaveNamespace
  org.apache.hadoop.hdfs.server.namenode.TestStartup
  org.apache.hadoop.hdfs.TestDFSFinalize
  org.apache.hadoop.hdfs.TestDFSRollback
  org.apache.hadoop.hdfs.TestDFSStartupVersions
  org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
  org.apache.hadoop.hdfs.TestDFSUpgrade
  org.apache.hadoop.hdfs.TestListFilesInDFS
  org.apache.hadoop.hdfs.TestListFilesInFileContext
  org.apache.hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer
+1 contrib tests. The patch passed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/679//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/679//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/679//console

This message is automatically generated.
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042591#comment-13042591 ]

Hadoop QA commented on HDFS-988:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12481191/hdfs-988-b22-1.patch
against trunk revision 1130381.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/683//console

This message is automatically generated.
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042445#comment-13042445 ]

Eli Collins commented on HDFS-988:
--

ELOS#flush (EditLogOutputStream) calls ELFOS#flushAndSync (EditLogFileOutputStream), which does a force on the underlying file channel.
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042432#comment-13042432 ]

Bharath Mundlapudi commented on HDFS-988:
-

I am just wondering whether we are calling an OS sync at all on this code path. All I see is a flush call, which flushes from EditLogOutputStream (Java buffers) to kernel buffers. Shouldn't we be doing the following?

    eStream.flush();
    eStream.getFileOutputStream().getFD().sync();

This will make sure the edits are actually written to disk. Is there any reason for not doing this?
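Bharath's question and Eli's reply (that flushAndSync forces the underlying file channel) hinge on the difference between flushing Java buffers and forcing data to the storage device. A minimal standalone sketch using only the JDK; DurableAppend is a hypothetical helper, not HDFS code.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of HDFS.
class DurableAppend {
    static void appendDurably(File file, String record) {
        try (FileOutputStream fos = new FileOutputStream(file, true);
             BufferedOutputStream buf = new BufferedOutputStream(fos)) {
            buf.write(record.getBytes(StandardCharsets.UTF_8));
            // flush(): Java buffer -> kernel page cache. A crash or power
            // loss can still lose these bytes.
            buf.flush();
            // force(false): kernel -> storage device for the file's contents
            // (metadata not necessarily included). fos.getFD().sync(), the
            // call Bharath proposes, is at least as strong.
            fos.getChannel().force(false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Forcing on every record is expensive, which is why edit logs batch many edits per sync.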
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042351#comment-13042351 ]

Eli Collins commented on HDFS-988:
--

It looks like most of the unprotected* methods take the rwlock but don't need to, either because their caller takes the lock or because they are called while loading the edit log (which is why we originally had unprotected versions). Do people mind if I fix that up (remove the locking from these methods, and make sure the unprotected versions are only called when loading the log) in this change, or would people rather that be done in a separate change?
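The locking convention under discussion, in which entry points take the rwlock once and a helper never locks on its own but relies on its caller (a client call or edit-log replay) holding the lock, can be sketched with the JDK's ReentrantReadWriteLock. SketchNamespace and its methods are hypothetical; the `assert hasWriteLock()` check follows Todd's suggestion in his review.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical namespace; names are illustrative, not HDFS's FSDirectory.
class SketchNamespace {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final Map<String, Integer> replication = new HashMap<>();

    boolean hasWriteLock() {
        return rwLock.isWriteLockedByCurrentThread();
    }

    // Client-facing entry point: takes the write lock exactly once.
    void setReplication(String path, int repl) {
        rwLock.writeLock().lock();
        try {
            unprotectedSetReplication(path, repl);
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    // Edit-log replay: one write lock around the whole batch.
    void replay(Map<String, Integer> edits) {
        rwLock.writeLock().lock();
        try {
            for (Map.Entry<String, Integer> e : edits.entrySet()) {
                unprotectedSetReplication(e.getKey(), e.getValue());
            }
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    // Never locks by itself; documents (and, with -ea, checks) that the
    // caller already holds the write lock.
    private void unprotectedSetReplication(String path, int repl) {
        assert hasWriteLock() : "caller must hold the write lock";
        replication.put(path, repl);
    }

    int getReplication(String path) {
        rwLock.readLock().lock();
        try {
            return replication.getOrDefault(path, 0);
        } finally {
            rwLock.readLock().unlock();
        }
    }
}
```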
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041962#comment-13041962 ]

dhruba borthakur commented on HDFS-988:
---

We should not logSync after nextGenerationStamp(), because it is invoked for every file creation, isn't it? And before the file-create call returns from the NN, we do invoke logSync(), so we should be safe. The other time we invoke nextGenerationStamp() is for pipeline error recovery, and that code path should also be OK; let me think.
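dhruba's argument relies on logSync() covering every edit logged earlier, not just the most recent one. A hypothetical transaction-id model (TxidEditLog and SketchCreate are illustrative names, and the sync is simulated rather than real I/O) shows why one sync at the end of a create also covers the genstamp edit logged earlier in the same call, and why a genstamp bump with no later sync, which is Todd's concern, is not yet durable.

```java
// Hypothetical txid-based edit log; the sync is simulated, no real I/O.
class TxidEditLog {
    private long lastWrittenTxid = 0;
    private long syncedTxid = 0;
    // Highest txid logged by the current thread.
    private final ThreadLocal<Long> myLastTxid = ThreadLocal.withInitial(() -> 0L);

    // op is not stored in this sketch; only its txid matters here.
    synchronized long logEdit(String op) {
        lastWrittenTxid++;
        myLastTxid.set(lastWrittenTxid);
        return lastWrittenTxid;
    }

    // Makes everything up to the latest logged txid durable, so it also
    // covers this thread's earlier edits.
    synchronized void logSync() {
        if (myLastTxid.get() <= syncedTxid) {
            return; // another thread's sync already covered our edits
        }
        syncedTxid = lastWrittenTxid;
    }

    synchronized boolean isDurable(long txid) {
        return txid <= syncedTxid;
    }
}

class SketchCreate {
    // Hypothetical create RPC: log the genstamp bump (no sync), log the
    // create, then sync once before returning to the client.
    static long[] createFile(TxidEditLog log, String path, long newGenStamp) {
        long gsTxid = log.logEdit("OP_SET_GENSTAMP " + newGenStamp);
        long addTxid = log.logEdit("OP_ADD " + path);
        log.logSync();
        return new long[] {gsTxid, addTxid};
    }
}
```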
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041934#comment-13041934 ]

Todd Lipcon commented on HDFS-988:
--

Didn't go through the new tests yet, but here are some comments from a first pass through FSN:
- checks for if (auditLog.isInfoEnabled()) should probably now be (auditLog.isInfoEnabled() && isExternalInvocation()); otherwise we're doing a needless directory traversal for fsck
- The following methods currently do logSync() while holding the writeLock, which is expensive:
-- setPermission
-- setOwner
-- commitBlockSynchronization (in some exit paths)
-- updatePipeline
- These methods should probably just set a local boolean within the synchronized section, then logSync() in the finally clause if it's flagged
- It seems strange that some of the xInternal() methods take the write lock themselves (e.g. setReplicationInternal) whereas others assume the caller takes the write lock (e.g. createSymlinkInternal). We should be consistent
- for those methods that don't explicitly take the write lock, we should either add an {{assert hasWriteLock()}} or a comment explaining why the lock is not necessary (e.g. internalReleaseLease, reassignLease, finalizeINodeFileUnderConstruction)
- why doesn't getListing need the read lock?
- the comment for endCheckpoint says "not started" but should say "not ended"
- same with updatePipeline
- I noticed that nextGenerationStamp() doesn't logSync(); that seems dangerous, since after a restart we might hand out a duplicate genstamp.
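Todd's suggestion for setPermission, setOwner and the other methods he lists (note that an edit was logged while holding the write lock, then call logSync() after releasing it) can be sketched as follows. DeferredSyncNamesystem and its counters are hypothetical, and the edit log is simulated so the example runs on its own.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical namesystem; the edit log is simulated by counters.
class DeferredSyncNamesystem {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private String owner = "hdfs";

    // Simulated edit log, guarded by this object's monitor (not fsLock).
    private int unsyncedEdits = 0;
    private int syncCalls = 0;
    private boolean syncedUnderLock = false;

    private synchronized void logEdit(String op) {
        unsyncedEdits++;
    }

    private synchronized void logSync() {
        if (fsLock.isWriteLockedByCurrentThread()) {
            syncedUnderLock = true; // the pattern below should never hit this
        }
        syncCalls++;
        unsyncedEdits = 0;
    }

    void setOwner(String newOwner) {
        boolean needSync = false;
        fsLock.writeLock().lock();
        try {
            if (!newOwner.equals(owner)) {
                owner = newOwner;
                logEdit("OP_SET_OWNER " + newOwner);
                needSync = true; // remember, but don't sync under the lock
            }
        } finally {
            fsLock.writeLock().unlock();
            if (needSync) {
                logSync(); // other RPCs can take the lock during the sync
            }
        }
    }

    String getOwner() {
        fsLock.readLock().lock();
        try {
            return owner;
        } finally {
            fsLock.readLock().unlock();
        }
    }

    synchronized int syncCalls() { return syncCalls; }
    synchronized int unsyncedEdits() { return unsyncedEdits; }
    synchronized boolean syncedUnderLock() { return syncedUnderLock; }
}
```

Placing logSync() in the finally block also means an edit that was logged before an exception is still synced before the call returns.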
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035690#comment-13035690 ]

Matt Foley commented on HDFS-988:
-

Of the seven test failures, the only one that might have to do with this patch is
* org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage.testInjection

But I think it's unlikely.

In case it wasn't clear, I'm offering this patch file as a possibly useful portion of the solution for this bug, not as a solution in its own right. Feel free to incorporate all or parts of it. Or not. :-)
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035653#comment-13035653 ]

Hadoop QA commented on HDFS-988:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12479629/HDFS-988_fix_synchs.patch
against trunk revision 1124364.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery
  org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
  org.apache.hadoop.hdfs.TestFileConcurrentReader
  org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage
  org.apache.hadoop.tools.TestJMXGet
+1 contrib tests. The patch passed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/559//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/559//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/559//console

This message is automatically generated.
> saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Fix For: 0.20-append, 0.22.0 > > Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, > hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035233#comment-13035233 ] dhruba borthakur commented on HDFS-988: --- yes, they should all be using RWLock, absolutely. > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Fix For: 0.20-append, 0.22.0 > > Attachments: hdfs-988-2.patch, hdfs-988.txt, saveNamespace.txt, > saveNamespace_20-append.patch > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035201#comment-13035201 ] Todd Lipcon commented on HDFS-988: -- Looking at trunk I see a lot of other strange synchronization issues. The following methods all are synchronized on the FSNamesystem instance: - getNamespaceInfo - setQuota - renewLease - isInSafeMode - isInStartupSafeMode - isPopulatingReplQueues - nextGenerationStamp I think all of these should probably be using the new rwlock... Dhruba, what do you think? Maybe we need something more like a stress/fuzz test against FSNamesystem rather than trying to target the specific cases mentioned above? > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Fix For: 0.20-append, 0.22.0 > > Attachments: hdfs-988-2.patch, hdfs-988.txt, saveNamespace.txt, > saveNamespace_20-append.patch > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
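Todd's suggestion could look roughly like the following minimal sketch. It assumes a single fair ReentrantReadWriteLock guards the namesystem state. NamesystemLockSketch and its methods are hypothetical simplifications, not the actual FSNamesystem code. Read-only queries such as isInSafeMode() take the shared lock. Mutators such as nextGenerationStamp() take the exclusive lock, so they cannot interleave with a safe-mode transition that also holds the write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, heavily simplified stand-in for FSNamesystem state (not the real class).
class NamesystemLockSketch {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
  private boolean safeMode = false;
  private long generationStamp = 1000L;

  // Read-only query: shared lock instead of 'synchronized' on the whole object.
  boolean isInSafeMode() {
    fsLock.readLock().lock();
    try {
      return safeMode;
    } finally {
      fsLock.readLock().unlock();
    }
  }

  // State transition: exclusive lock, so no mutator can be mid-flight.
  void enterSafeMode() {
    fsLock.writeLock().lock();
    try {
      safeMode = true;
    } finally {
      fsLock.writeLock().unlock();
    }
  }

  // Mutator: exclusive lock; the safe-mode check and the update are atomic.
  long nextGenerationStamp() {
    fsLock.writeLock().lock();
    try {
      if (safeMode) {
        // IllegalStateException stands in for HDFS's SafeModeException here.
        throw new IllegalStateException("Cannot get next generation stamp: in safe mode");
      }
      return ++generationStamp;
    } finally {
      fsLock.writeLock().unlock();
    }
  }
}
```

Using a read/write lock rather than `synchronized` lets many concurrent read-only queries proceed together. Anything that changes namespace state remains mutually exclusive with entering safe mode.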
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029803#comment-13029803 ] Hadoop QA commented on HDFS-988: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478366/hdfs-988-2.patch against trunk revision 1100054. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/462//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/462//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/462//console This message is automatically generated. 
> saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Fix For: 0.20-append, 0.22.0 > > Attachments: hdfs-988-2.patch, hdfs-988.txt, saveNamespace.txt, > saveNamespace_20-append.patch > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029783#comment-13029783 ] dhruba borthakur commented on HDFS-988: --- 1. We should not throw an exception if we are unable to update the access time because the NN is in safemode. That would prevent an administrator from running read-only operations (dfs -cat) on a cluster that is just starting up and is still in safemode. It should just log a warning message and continue. 2. Unit test: Create a cluster, open a file for write, and put the cluster in safemode. Verify that getAdditionalBlock, close, commitBlockSync, etc. fail when the NN is in safemode. > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Fix For: 0.20-append, 0.22.0 > > Attachments: hdfs-988-2.patch, hdfs-988.txt, saveNamespace.txt, > saveNamespace_20-append.patch > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
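Point 1 can be illustrated with a minimal sketch. It assumes a hypothetical AccessTimeSketch class, not the real FSNamesystem.getBlockLocations(). The read path always returns locations, and only the access-time update depends on safe mode. In safe mode the update is skipped with a logged warning instead of an exception, so a read-only `dfs -cat` still succeeds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical simplification of a read path that also records access time.
class AccessTimeSketch {
  private static final Logger LOG = Logger.getLogger(AccessTimeSketch.class.getName());
  private final boolean safeMode;
  private long accessTime;                  // 0 means "never updated"
  private final List<String> warnings = new ArrayList<>();

  AccessTimeSketch(boolean safeMode) {
    this.safeMode = safeMode;
  }

  // Always serves the read; only the access-time update depends on safe mode.
  synchronized String getBlockLocations(String src, long now) {
    if (safeMode) {
      String msg = "Not updating access time of " + src + ": namenode is in safe mode";
      LOG.warning(msg);                     // warn and continue instead of throwing
      warnings.add(msg);
    } else {
      accessTime = now;                     // the real NN would also log this edit
    }
    return "locations-of-" + src;           // placeholder for real block locations
  }

  synchronized long getAccessTime() { return accessTime; }

  synchronized int warningCount() { return warnings.size(); }
}
```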
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017997#comment-13017997 ] Nigel Daley commented on HDFS-988: -- Todd, any update on this for 0.22? > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Fix For: 0.20-append, 0.22.0 > > Attachments: hdfs-988.txt, saveNamespace.txt, > saveNamespace_20-append.patch > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884403#action_12884403 ] dhruba borthakur commented on HDFS-988: --- It is not an easily repeatable scenario: it is a race condition. > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Fix For: 0.20-append > > Attachments: hdfs-988.txt, saveNamespace.txt, > saveNamespace_20-append.patch > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
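Because the failure depends on timing, a stress test may be more useful than a targeted repro. Such a test hammers the namesystem from many threads while one thread enters safe mode. Below is a minimal, self-contained sketch under that assumption. SafeModeStressSketch is hypothetical and models the edit log as an in-memory list. The invariant it checks is the one the fix aims for: once saveNamespace has taken its snapshot, no further edit reaches the log. The invariant holds here because mutate() checks safe mode and appends under the same write lock that enterSafeModeAndSave() takes. Appending after releasing that lock would reopen the window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model: mutators append to an in-memory "edit log", and
// saveNamespace snapshots it; both run under the same write lock.
class SafeModeStressSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<Integer> editLog = new ArrayList<>();
  private boolean safeMode;

  // Check safe mode and append in one critical section; returns false if rejected.
  boolean mutate(int txid) {
    lock.writeLock().lock();
    try {
      if (safeMode) {
        return false;
      }
      editLog.add(txid);
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Enter safe mode and record how many edits the saved image would cover.
  int enterSafeModeAndSave() {
    lock.writeLock().lock();
    try {
      safeMode = true;
      return editLog.size();
    } finally {
      lock.writeLock().unlock();
    }
  }

  int editCount() {
    lock.readLock().lock();
    try {
      return editLog.size();
    } finally {
      lock.readLock().unlock();
    }
  }

  // One stress round: returns true if no edit was appended after the snapshot.
  static boolean runOnce(int writers, int opsPerWriter) {
    SafeModeStressSketch ns = new SafeModeStressSketch();
    ExecutorService pool = Executors.newFixedThreadPool(writers);
    for (int w = 0; w < writers; w++) {
      final int base = w * opsPerWriter;
      pool.execute(() -> {
        for (int i = 0; i < opsPerWriter; i++) {
          ns.mutate(base + i);
        }
      });
    }
    int saved = ns.enterSafeModeAndSave();  // races with the writers above
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        return false;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return ns.editCount() == saved;
  }
}
```

Running runOnce() many times gives the race repeated chances to show up. A real test against FSNamesystem would replace the list with the actual edits log and compare the saved image against it.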
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884380#action_12884380 ] Brahma Reddy Battula commented on HDFS-988: --- Hi dhruba borthakur, I am using the 20.1 version. I put the NameNode in safemode and then executed saveNamespace, but the edit logs were not corrupted. Can you please give the exact scenario to reproduce this? > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Fix For: 0.20-append > > Attachments: hdfs-988.txt, saveNamespace.txt, > saveNamespace_20-append.patch > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880597#action_12880597 ] dhruba borthakur commented on HDFS-988: --- Hi Todd, I would appreciate it if you could write some sort of unit test for this one; that will help get this into trunk. > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Fix For: 0.20-append > > Attachments: hdfs-988.txt, saveNamespace.txt, > saveNamespace_20-append.patch > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876798#action_12876798 ] Todd Lipcon commented on HDFS-988: -- Nicolas: by seqnum you're referring to the generation stamp on the block being written, as the client continues to initiate recovery during shutdown? Do we also need to pull in HDFS-1145 then? (I have this patch applied on my append branch, but not 1145, and have not seen errors; it just seems to make sense.) > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0, 0.22.0, 0.20-append >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Attachments: hdfs-988.txt, saveNamespace.txt, > saveNamespace_20-append.patch > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866081#action_12866081 ] dhruba borthakur commented on HDFS-988: --- Code looks excellent. Can we add a unit test that does the following: * Create a cluster, open a file for write, and put the cluster in safemode. Verify that getAdditionalBlock, close, etc. fail when the NN is in safemode. > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Attachments: hdfs-988.txt, saveNamespace.txt > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
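The proposed unit test could be structured roughly as follows. This is a self-contained sketch rather than a MiniDFSCluster test. GuardedNamesystemSketch and SafeModeSketchException are hypothetical stand-ins for FSNamesystem and SafeModeException, and the operations only record strings. The test opens a file and enters safe mode. It then checks that each mutating call is rejected and that nothing new reaches the edit list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HDFS's SafeModeException.
class SafeModeSketchException extends RuntimeException {
  SafeModeSketchException(String msg) {
    super(msg);
  }
}

// Hypothetical namesystem exposing only the operations the proposed test touches;
// each mutation records a string instead of writing a real edits-log entry.
class GuardedNamesystemSketch {
  private boolean safeMode;
  private final List<String> edits = new ArrayList<>();

  synchronized void setSafeMode(boolean on) { safeMode = on; }

  synchronized int editCount() { return edits.size(); }

  private void checkNotInSafeMode(String op) {
    if (safeMode) {
      throw new SafeModeSketchException("Cannot " + op + ": name node is in safe mode");
    }
  }

  synchronized void startFile(String src) {
    checkNotInSafeMode("create " + src);
    edits.add("create " + src);
  }

  synchronized void getAdditionalBlock(String src) {
    checkNotInSafeMode("add block to " + src);
    edits.add("addBlock " + src);
  }

  synchronized void completeFile(String src) {
    checkNotInSafeMode("close " + src);
    edits.add("close " + src);
  }

  synchronized void commitBlockSynchronization(String src) {
    checkNotInSafeMode("commitBlockSynchronization for " + src);
    edits.add("commitBlockSync " + src);
  }
}

// The test flow from the comment: open a file, enter safe mode, and verify that
// each mutating call fails and no new edit is recorded.
class SafeModeTestSketch {
  static boolean failsInSafeMode(Runnable action) {
    try {
      action.run();
      return false;
    } catch (SafeModeSketchException e) {
      return true;
    }
  }

  static boolean run() {
    GuardedNamesystemSketch ns = new GuardedNamesystemSketch();
    String file = "/test/open-file";
    ns.startFile(file);
    ns.getAdditionalBlock(file);
    int editsBefore = ns.editCount();
    ns.setSafeMode(true);
    boolean allRejected = failsInSafeMode(() -> ns.getAdditionalBlock(file))
        && failsInSafeMode(() -> ns.completeFile(file))
        && failsInSafeMode(() -> ns.commitBlockSynchronization(file));
    return allRejected && ns.editCount() == editsBefore;
  }
}
```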
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866070#action_12866070 ] Todd Lipcon commented on HDFS-988: -- dhruba, can you take a look at this blocker please? > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Attachments: hdfs-988.txt, saveNamespace.txt > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862838#action_12862838 ] Todd Lipcon commented on HDFS-988: -- The failed test is unrelated (been failing all patch builds lately) Can anyone suggest a unit test for this issue? I can't think of any to add since functionality wasn't changed, we just cleaned up some potential races. > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Attachments: hdfs-988.txt, saveNamespace.txt > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861756#action_12861756 ] Hadoop QA commented on HDFS-988: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12443047/hdfs-988.txt against trunk revision 938791. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/331/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/331/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/331/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/331/console This message is automatically generated. 
> saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Todd Lipcon >Priority: Blocker > Attachments: hdfs-988.txt, saveNamespace.txt > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860918#action_12860918 ] Todd Lipcon commented on HDFS-988: -- Hey Dhruba, yep, I am planning on doing this this week so it gets in the release. Also, I agree re commitBlockSynchronization, though I think we should probably add a couple more tests in this area. > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: dhruba borthakur >Assignee: Todd Lipcon > Attachments: saveNamespace.txt > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860870#action_12860870 ] dhruba borthakur commented on HDFS-988: --- Hi Todd, is it possible to get this one into the re-release of 0.21? I do not think we should allow FSNamesystem.commitBlockSynchronization() in safe mode. Can you envision a case where this could cause a problem? It is safer if I can say that no new transactions can make it to the edits log while the namenode is in safemode. > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: dhruba borthakur >Assignee: Todd Lipcon > Attachments: saveNamespace.txt > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
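Dhruba's invariant is that no new transactions reach the edits log while in safemode. It could also be enforced as a backstop at the edit-log layer. A missed per-operation check would then fail loudly instead of silently appending to a log that saveNamespace is about to roll. Below is a minimal sketch, assuming a hypothetical FreezableEditLogSketch. This is not how FSEditLog is actually structured.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical edit log that refuses writes while "frozen" (safe mode or an
// in-progress saveNamespace), turning a missed safe-mode check into a loud error.
class FreezableEditLogSketch {
  private final List<String> txns = new ArrayList<>();
  private boolean frozen;

  synchronized void freeze() { frozen = true; }

  synchronized void unfreeze() { frozen = false; }

  synchronized void logEdit(String op) {
    if (frozen) {
      throw new IllegalStateException("Refusing to log " + op + ": namespace is frozen");
    }
    txns.add(op);
  }

  synchronized int size() { return txns.size(); }
}
```

The per-operation safe-mode checks remain the primary mechanism. A guard like this only catches paths, such as commitBlockSynchronization(), that were overlooked.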
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856961#action_12856961 ] Todd Lipcon commented on HDFS-988: -- Once HDFS-909 is committed I'll rebase this on trunk and take care of Konstantin's review comments > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: dhruba borthakur >Assignee: Todd Lipcon > Attachments: saveNamespace.txt > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837060#action_12837060 ] Konstantin Shvachko commented on HDFS-988: -- # {{enterSafeMode()}} looks good. # {{FSNamesystem.getAdditionalBlock()}} checking {{isInSafeMode()}} should be before calling {{chooseTargets()}}. I would not change {{getAdditionalBlock()}} at all. # I think it is fine to allow {{FSNamesystem.commitBlockSynchronization()}} in safe mode. In any case it probably deserves a separate issue and discussion. # {{renewLease()}} shouldn't be under the FSNamesystem lock? {{leaseManager}} has its own lock. # Your changes to the permission methods incorporate HDFS-133. # Dhruba, are you going to promote the changes to trunk and 0.21? > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: dhruba borthakur > Attachments: saveNamespace.txt > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
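Point 2 is about ordering: the cheap safe-mode check should come before the comparatively expensive chooseTargets() call. Below is a minimal sketch, assuming a hypothetical AddBlockOrderingSketch in which placement runs outside the lock. The second check after placement is an added assumption. It covers safe mode being entered while placement was running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of check-before-placement ordering in getAdditionalBlock().
class AddBlockOrderingSketch {
  private boolean safeMode;
  final AtomicInteger chooseTargetsCalls = new AtomicInteger();

  synchronized void setSafeMode(boolean on) { safeMode = on; }

  private synchronized void checkNotInSafeMode(String src) {
    if (safeMode) {
      throw new IllegalStateException("Cannot add block to " + src + ": in safe mode");
    }
  }

  // Stand-in for the comparatively expensive replica placement step.
  private List<String> chooseTargets(int replication) {
    chooseTargetsCalls.incrementAndGet();
    List<String> targets = new ArrayList<>();
    for (int i = 0; i < replication; i++) {
      targets.add("datanode-" + i);
    }
    return targets;
  }

  List<String> getAdditionalBlock(String src, int replication) {
    checkNotInSafeMode(src);                            // cheap check first
    List<String> targets = chooseTargets(replication);  // runs outside the lock
    synchronized (this) {
      checkNotInSafeMode(src);  // re-check: safe mode may have been entered meanwhile
      return targets;           // real code would allocate and log the block here
    }
  }
}
```

In safe mode the call fails without ever reaching chooseTargets(). The final check and the (elided) block allocation happen in one critical section.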
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836021#action_12836021 ] Todd Lipcon commented on HDFS-988: -- Hi Dhruba, I still think we should fix this in the other issues and then backport to 20. But I'll do a review of this patch here since you've already uploaded it: - in setPermission, the audit logging has moved outside the synchronized block. Thus dir.getFileInfo may actually return incorrect info (or even fail if it races with someone deleting the file) - same goes for setOwner - I think it's OK, but can you verify that the top synchronized block in getAdditionalBlock can never have side effects? I don't know the lease management code well enough - checkLease is guaranteed side-effect free? > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: dhruba borthakur > Attachments: saveNamespace.txt > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
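The first review point is about building the audit record from dir.getFileInfo() after the lock is released. In that case a concurrent delete can make the record wrong or make the lookup fail. One way to keep audit I/O outside the lock while avoiding that race is to snapshot the status inside the critical section. Below is a minimal sketch, assuming a hypothetical AuditLogOrderingSketch with a map standing in for the directory tree. It is not the actual setPermission() code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: a map stands in for the directory tree, and auditLog
// stands in for the (potentially slow) audit logger.
class AuditLogOrderingSketch {
  private final Map<String, String> perms = new HashMap<>();
  private final Consumer<String> auditLog;

  AuditLogOrderingSketch(Consumer<String> auditLog) {
    this.auditLog = auditLog;
  }

  synchronized void create(String src, String perm) { perms.put(src, perm); }

  synchronized void delete(String src) { perms.remove(src); }

  void setPermission(String src, String perm) {
    String statusForAudit;
    synchronized (this) {
      if (!perms.containsKey(src)) {
        throw new IllegalArgumentException("File not found: " + src);
      }
      perms.put(src, perm);
      // Capture the status while still holding the lock, so a concurrent
      // delete() cannot change or remove it before the audit record is built.
      statusForAudit = src + " perm=" + perms.get(src);
    }
    // Audit output happens outside the lock, using only the snapshot.
    auditLog.accept("setPermission " + statusForAudit);
  }
}
```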
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835488#action_12835488 ] Todd Lipcon commented on HDFS-988: -- HDFS-909's fixes cover two issues. One of the two is in 20, and the other doesn't appear to be. But we may as well just have one JIRA number. The patch to 20 will just be a subset of the changes from trunk - no sense in diverging fix implementations. > saveNamespace can corrupt edits log > --- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: dhruba borthakur > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835486#action_12835486 ] dhruba borthakur commented on HDFS-988: --- I agree that HDFS-956 could be fixed. But HDFS-909 does not apply to 0.20, whereas this one does, doesn't it?
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835472#action_12835472 ] Todd Lipcon commented on HDFS-988: -- Does this need to be a separate JIRA from HDFS-909? This code is covered by that patch. Also I think HDFS-956 needs to be fixed for this to truly catch all cases.
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835406#action_12835406 ] dhruba borthakur commented on HDFS-988: --- I think the wait time is theoretically unbounded, but waiting for 5 minutes seems like a pretty fool-proof idea to me.
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835402#action_12835402 ] Brian Bockelman commented on HDFS-988: -- How long can logSyncs be pending for? Is this corruption still possible if the sysadmin waits, say, 5 minutes?
[jira] Commented: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835403#action_12835403 ] dhruba borthakur commented on HDFS-988: --- My proposal is to make the enterSafeMode method wait for all pending transactions to be flushed.
{code}
// FSNamesystem
synchronized void enterSafeMode() throws IOException {
  // Wait until every transaction logged so far (by any thread) is synced,
  // so a later saveNamespace cannot capture a partially written edits log.
  // This must run before the early return below, which is the path taken
  // when the NN is not yet in safe mode.
  getEditLog().logSyncAll();   // <=== new code here
  if (!isInSafeMode()) {
    safeMode = new SafeModeInfo();
    return;
  }
  safeMode.setManual();
  NameNode.stateChangeLog.info("STATE* Safe mode is ON. "
      + safeMode.getTurnOffTip());
}

// FSEditLog
void logSyncAll() throws IOException {
  // Adopt the most recent txid as this thread's own transaction id...
  synchronized (this) {
    TransactionId id = myTransactionId.get();
    id.txid = txid;
  }
  // ...then wait (or flush) until everything up to that id is on disk.
  // logSync() is called outside the lock so that it can release the
  // FSEditLog lock during the flush, as it normally does.
  logSync();
}
{code}
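To see why adopting the latest txid is enough, here is a heavily simplified, hypothetical model of the group-commit bookkeeping the proposal relies on: a per-thread TransactionId, a global txid, and the id of the last synced transaction. The class EditLogSketch and its methods are illustrative stand-ins, not FSEditLog itself. The real logSync also deals with multiple edit streams and I/O errors, and here the "disk" is just a list. The demo shows that a thread's plain logSync() only covers its own edits, whereas logSyncAll() waits for everything logged so far by any thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of FSEditLog-style group commit (not the real class).
 * txid     = id of the last edit appended to the in-memory buffer
 * synctxid = id of the last edit known to be durable
 */
class EditLogSketch {
    /** Per-thread holder, mirroring the TransactionId used in the proposal. */
    static final class TransactionId {
        long txid;
    }

    private final ThreadLocal<TransactionId> myTransactionId =
            ThreadLocal.withInitial(TransactionId::new);
    private long txid = 0;
    private long synctxid = 0;
    private boolean isSyncRunning = false;
    private List<String> buffer = new ArrayList<>();
    private final List<String> durable = new ArrayList<>(); // stands in for the on-disk log

    /** Appends an edit to the buffer and remembers its id for this thread. */
    synchronized void logEdit(String op) {
        buffer.add(op);
        txid++;
        myTransactionId.get().txid = txid;
    }

    /** Blocks until this thread's most recent edit is durable. */
    void logSync() throws InterruptedException {
        List<String> toFlush;
        long syncStart;
        synchronized (this) {
            long mytxid = myTransactionId.get().txid;
            // Another thread is flushing: wait for it to finish.
            while (mytxid > synctxid && isSyncRunning) {
                wait(1000);
            }
            if (mytxid <= synctxid) {
                return; // an earlier sync already made our edit durable
            }
            syncStart = txid;
            isSyncRunning = true;
            toFlush = buffer;            // swap buffers so writers can continue
            buffer = new ArrayList<>();
        }
        synchronized (durable) {         // the "disk write", done without the log lock
            durable.addAll(toFlush);
        }
        synchronized (this) {
            synctxid = syncStart;
            isSyncRunning = false;
            notifyAll();
        }
    }

    /** Blocks until every edit logged so far, by any thread, is durable. */
    void logSyncAll() throws InterruptedException {
        synchronized (this) {
            myTransactionId.get().txid = txid; // adopt the latest id as our own
        }
        logSync();
    }

    synchronized long syncedTxid() {
        return synctxid;
    }

    int durableCount() {
        synchronized (durable) {
            return durable.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        EditLogSketch log = new EditLogSketch();
        // A writer thread logs edits but has not reached its own logSync yet.
        Thread writer = new Thread(() -> {
            log.logEdit("mkdir /a");
            log.logEdit("mkdir /b");
        });
        writer.start();
        writer.join();
        log.logSync();    // main thread logged nothing itself: no-op
        System.out.println(log.durableCount());   // 0
        log.logSyncAll(); // flushes the writer's pending edits
        System.out.println(log.durableCount());   // 2
    }
}
```

Because logSyncAll only borrows the existing per-thread wait/flush logic, it cannot return while any transaction numbered up to the txid it observed is still unsynced. This is the property saveNamespace needs once safe mode blocks new edits.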