[jira] [Created] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.
Tsz Wo (Nicholas), SZE created HDFS-5889:

 Summary: When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.
 Key: HDFS-5889
 URL: https://issues.apache.org/jira/browse/HDFS-5889
 Project: Hadoop HDFS
 Issue Type: Sub-task
 Components: namenode
 Reporter: Tsz Wo (Nicholas), SZE
 Assignee: Tsz Wo (Nicholas), SZE

After a rolling upgrade is started and checkpointing is disabled, the edit log may grow to a huge size. This is not a problem if the rolling upgrade is finalized normally, since the NN keeps the current state in memory and writes a new checkpoint during finalize. However, it is a problem if the admin decides to downgrade: it could take a long time to apply the edit log. Rollback does not have this problem. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
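The tradeoff described above can be illustrated with a toy model. This is a sketch under stated assumptions, not NameNode code, and every class and method name here is hypothetical: restarting from an old image means replaying every logged edit one by one, while restarting from a fresh checkpoint just loads the saved state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (not HDFS code; all names hypothetical) of why downgrade is
// slow without a recent checkpoint: state rebuilt from an old image must
// replay every logged edit, while a checkpoint captures the state directly.
public class CheckpointSketch {

    /** One logged mutation of the namespace. */
    static class Edit {
        final String path;
        final long size;
        Edit(String path, long size) { this.path = path; this.size = size; }
    }

    /** Restart from an old image: O(number of edits) replay work. */
    static Map<String, Long> replay(Map<String, Long> image, List<Edit> editLog) {
        Map<String, Long> state = new HashMap<>(image);
        for (Edit e : editLog) {
            state.put(e.path, e.size);  // apply each transaction in order
        }
        return state;
    }

    /** Restart from a checkpoint: no replay, just load the saved state. */
    static Map<String, Long> loadCheckpoint(Map<String, Long> saved) {
        return new HashMap<>(saved);
    }

    public static void main(String[] args) {
        Map<String, Long> image = new HashMap<>();
        List<Edit> editLog = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            // the log keeps growing while checkpointing is disabled
            editLog.add(new Edit("/f" + i, i));
        }
        Map<String, Long> replayed = replay(image, editLog);    // slow path
        Map<String, Long> restored = loadCheckpoint(replayed);  // fast path
        System.out.println("edits to replay without checkpoint: " + editLog.size());
        System.out.println("states equal: " + restored.equals(replayed));
    }
}
```

A standby-side checkpoint, as this JIRA proposes, bounds the replay work a downgrade has to do.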
[jira] [Updated] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5709: - Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Andrew. Thanks also to Jing and Suresh for the reviews and discussion. > Improve NameNode upgrade with existing reserved paths and path components > - > > Key: HDFS-5709 > URL: https://issues.apache.org/jira/browse/HDFS-5709 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: snapshots, upgrade > Fix For: 2.4.0 > > Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, > hdfs-5709-4.patch, hdfs-5709-5.patch, hdfs-5709-6.patch, hdfs-5709-7.patch > > > Right now in trunk, upgrade fails messily if the old fsimage or edits refer > to a directory named ".snapshot". We should at least print a better error > message (which I believe was the original intention in HDFS-4666), and [~atm] > proposed automatically renaming these files and directories. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891872#comment-13891872 ] Aaron T. Myers commented on HDFS-5709: -- The javadoc issue is unrelated and is tracked by HADOOP-10325. The TestAuditLogs failure is spurious and is tracked by HDFS-5882. The TestDFSUpgradeFromImage failure is because we need to include the new binary file in order for that to pass. Given that, I'm going to commit this momentarily.
[jira] [Commented] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891867#comment-13891867 ] Hudson commented on HDFS-5709: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5109 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5109/]) HDFS-5709. Improve NameNode upgrade with existing reserved paths and path components. Contributed by Andrew Wang. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1564645) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/xdoc/HdfsSnapshots.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgradeFromImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeOptionParsing.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-2-reserved.tgz
[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker
[ https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891854#comment-13891854 ] Vinay commented on HDFS-5869: - startCheckpoint() will be used only with the BackupNode. As Jing pointed out, we should disable the checkpointing from the StandbyCheckpointer in the standby NN when RollingUpgrade is in progress. > When rolling upgrade is in progress, NN should only create checkpoint right > before the upgrade marker > - > > Key: HDFS-5869 > URL: https://issues.apache.org/jira/browse/HDFS-5869 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: h5869_20140204.patch, h5869_20140204b.patch, > h5869_20140205.patch > > > - When starting rolling upgrade, NN should create a checkpoint before it > writes the upgrade marker edit log transaction. > - When rolling upgrade is in progress, NN should reject saveNamespace rpc > calls. Further, if NN restarts, it should create a checkpoint only right > before the upgrade marker. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker
[ https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5869: - Attachment: h5869_20140205.patch h5869_20140205.patch: updates the test and removes checkRollingUpgrade in startCheckpoint. On second thought, we should only disallow saveNamespace but still allow checkpoints; otherwise, the edit log may become huge.
[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker
[ https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891839#comment-13891839 ] Tsz Wo (Nicholas), SZE commented on HDFS-5869: -- When totalEdits > 1, the first two transactions must be OP_START_LOG_SEGMENT and OP_UPGRADE_MARKER, so it won't save the namespace. These two transactions won't be lost. Let me check the test to verify this.
[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker
[ https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891837#comment-13891837 ] Vinay commented on HDFS-5869: - Got it. saveNamespace() while loading OP_UPGRADE_MARKER will not include the current edit log segment, because {{lastAppliedTxId}} of {{FSImage}} will still be pointing to the previous segment/checkpoint's last txn. Also, {{totalEdits}} will be 1, so it won't try to save again.
[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker
[ https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891824#comment-13891824 ] Vinay commented on HDFS-5869: - One more thing:
{code}
} else if (rollingUpgradeOpt == RollingUpgradeStartupOption.STARTED) {
  if (totalEdits > 1) {
    // save namespace if this is not the second edit transaction
    // (the first must be OP_START_LOG_SEGMENT)
    fsNamesys.getFSImage().saveNamespace(fsNamesys);
  }
{code}
When the standby NN is restarted twice with the RollingUpgradeStartupOption.STARTED option, we will lose the OP_UPGRADE_MARKER, and hence rollingUpgradeInfo will also be lost. Am I missing something here?
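Vinay's restart-twice concern can be sketched as a toy model (hypothetical names and simplified semantics, not the real FSEditLogLoader): once the segment containing OP_UPGRADE_MARKER has been checkpointed and a new segment rolled, a second restart replays a segment that no longer contains the marker, so the in-memory rolling-upgrade status would not be rebuilt unless it is persisted somewhere else.

```java
import java.util.List;

// Toy model (assumption-laden; not HDFS code) of the restart-twice scenario:
// rollingUpgradeInfo is rebuilt only from opcodes seen during replay.
public class MarkerRestartSketch {
    enum Op { START_LOG_SEGMENT, UPGRADE_MARKER, ADD }

    /** Replay one segment; report whether the upgrade marker was seen. */
    static boolean replaySeesMarker(List<Op> segment) {
        return segment.contains(Op.UPGRADE_MARKER);
    }

    public static void main(String[] args) {
        // First restart: the replayed segment still contains the marker.
        List<Op> first = List.of(Op.START_LOG_SEGMENT, Op.UPGRADE_MARKER, Op.ADD);
        System.out.println("first restart sees marker: " + replaySeesMarker(first));

        // After saveNamespace + roll, the new segment starts past the marker.
        List<Op> second = List.of(Op.START_LOG_SEGMENT, Op.ADD);
        // Second restart: the marker is gone from what gets replayed.
        System.out.println("second restart sees marker: " + replaySeesMarker(second));
    }
}
```

This is only an illustration of the failure mode being discussed; the actual fix depends on where the marker and checkpoint land in the committed patch.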
[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker
[ https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891821#comment-13891821 ] Vinay commented on HDFS-5869: - Oops, I had forgotten that. Thanks for the update.
[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker
[ https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891816#comment-13891816 ] Tsz Wo (Nicholas), SZE commented on HDFS-5869: -- Hi Vinay, thanks for looking at the patch. FSImage.saveNamespace(..) already has endCurrentLogSegment, startLogSegmentAndWriteHeaderTxn and writeTransactionIdFileToStorage, which do the same things as rolling the edit log.
[jira] [Commented] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891812#comment-13891812 ] Hadoop QA commented on HDFS-5709: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627049/hdfs-5709-7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestAuditLogs org.apache.hadoop.hdfs.TestDFSUpgradeFromImage {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6034//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6034//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6034//console This message is automatically generated. 
[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker
[ https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891808#comment-13891808 ] Vinay commented on HDFS-5869: - Patch looks good, Nicholas. I think it would be better to roll the edits after saveNamespace() during {{startRollingUpgrade()}}; it would also give a clean separation of the edits.
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891700#comment-13891700 ] Hudson commented on HDFS-5399: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5106 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5106/]) Correct CHANGES.txt entry for HDFS-5399 (contributed by Jing, not Haohui) (todd: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1564632) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt HDFS-5399. Revisit SafeModeException and corresponding retry policies. Contributed by Haohui Mai. (todd: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1564629) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryPolicies.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/NameNodeProxies.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHASafeMode.java > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.3.0 > > Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, > HDFS-5399.003.patch, hdfs-5399.002.patch > > > Currently for 
NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
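The "always wrap SafeModeException in RetriableException" proposal at the end of the description above can be sketched as follows. This is a minimal illustration with hypothetical class names, not the actual Hadoop RPC or io.retry code: the server wraps the exception only for automatic safe mode, and the client retries only when it sees the retriable wrapper, so a manual safe mode surfaces immediately.

```java
// Sketch (hypothetical names; not Hadoop's RetryPolicies/RPC plumbing) of a
// single generic strategy: server wraps automatic-safe-mode failures in a
// RetriableException; the client retries only on that wrapper type.
public class SafeModeRetrySketch {
    static class SafeModeException extends Exception {}
    static class RetriableException extends Exception {
        RetriableException(Throwable cause) { super(cause); }
    }

    interface Rpc { String call() throws Exception; }

    /** Client side: retry a bounded number of times on RetriableException only. */
    static String invokeWithRetry(Rpc rpc, int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return rpc.call();
            } catch (RetriableException e) {
                if (attempt >= maxRetries) throw e;  // give up eventually
            }
            // any other exception (e.g. manual safe mode) propagates at once
        }
    }

    public static void main(String[] args) throws Exception {
        int[] remaining = {2};  // NN leaves automatic safe mode after 2 calls
        Rpc create = () -> {
            if (remaining[0]-- > 0) {
                // server side: automatic safe mode -> retriable for the client
                throw new RetriableException(new SafeModeException());
            }
            return "created";
        };
        System.out.println(invokeWithRetry(create, 5));
    }
}
```

The point of the wrapper is that the client no longer needs HA-specific or API-specific knowledge of safe mode; it just honors the server's retriability hint.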
[jira] [Commented] (HDFS-5881) Fix skip() of the short-circuit local reader in 0.23.
[ https://issues.apache.org/jira/browse/HDFS-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891695#comment-13891695 ] Hadoop QA commented on HDFS-5881: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627066/HDFS-5881.branch-0.23.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6035//console This message is automatically generated. > Fix skip() of the short-circuit local reader in 0.23. > - > > Key: HDFS-5881 > URL: https://issues.apache.org/jira/browse/HDFS-5881 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.10 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5881.branch-0.23.patch > > > It looks like a bug in skip() was introduced by HDFS-2356 and got fixed as a > part of HDFS-2834, which is an API change JIRA. This bug causes skip() to skip > more data (as much as the new offsetFromChunkBoundary) in certain cases. > It only affects branch-0.23. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
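As a rough illustration of the class of bug the HDFS-5881 description names (simplified arithmetic; this is not the actual BlockReaderLocal code and the method names are hypothetical): a skip() that rewinds to the chunk boundary but then accounts for offsetFromChunkBoundary twice ends up positioned too far forward by exactly that offset.

```java
// Toy model of an offsetFromChunkBoundary double-count in skip()
// (hypothetical; the real reader also refills checksum-aligned buffers).
public class SkipSketch {
    static final int CHUNK = 512;  // bytes per checksum chunk

    /** Buggy: rewinds to the chunk boundary, then adds the offset back twice. */
    static long buggySkip(long pos, long n) {
        long offsetFromChunkBoundary = pos % CHUNK;
        long boundary = pos - offsetFromChunkBoundary;  // rewind for checksums
        return boundary + offsetFromChunkBoundary + n + offsetFromChunkBoundary;
    }

    /** Fixed: the offset is accounted for exactly once. */
    static long fixedSkip(long pos, long n) {
        return pos + n;
    }

    public static void main(String[] args) {
        long pos = 600;                     // 88 bytes past the 512 boundary
        long wanted = fixedSkip(pos, 100);  // 700
        long got = buggySkip(pos, 100);     // 788: overshoots by 88 bytes
        System.out.println("extra bytes skipped: " + (got - wanted));
        System.out.println("equals offsetFromChunkBoundary: "
            + (got - wanted == pos % CHUNK));
    }
}
```

The overshoot is data silently lost to the caller, which matches the "skips more data (as much as the new offsetFromChunkBoundary)" symptom.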
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891692#comment-13891692 ] Todd Lipcon commented on HDFS-5399: --- Oops. I just committed and realized I accidentally credited Haohui instead of you, Jing -- been looking at his PB patch all day :) Sorry about that, I'll correct the CHANGES.txt entry right away.
[jira] [Updated] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-5399: -- Resolution: Fixed Fix Version/s: 2.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available)
[jira] [Updated] (HDFS-5881) Fix skip() of the short-circuit local reader in 0.23.
[ https://issues.apache.org/jira/browse/HDFS-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5881: - Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891688#comment-13891688 ] Todd Lipcon commented on HDFS-5399: --- The javadoc warnings are currently showing up on all builds (HADOOP-10325 should address this). I'll commit this to trunk, branch-2, and branch-2.3.
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891685#comment-13891685 ] Hadoop QA commented on HDFS-5399: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627046/HDFS-5399.003.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6033//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6033//console This message is automatically generated. 
> Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, > HDFS-5399.003.patch, hdfs-5399.002.patch > > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5881) Fix skip() of the short-circuit local reader in 0.23.
[ https://issues.apache.org/jira/browse/HDFS-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5881: - Attachment: HDFS-5881.branch-0.23.patch The patch includes a test case that reproduces the incorrect data being returned, which is fixed similarly to branch-2/trunk. It additionally fixes the skip() return value bug. The patch applies only to 0.23. > Fix skip() of the short-circuit local reader in 0.23. > - > > Key: HDFS-5881 > URL: https://issues.apache.org/jira/browse/HDFS-5881 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.10 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5881.branch-0.23.patch > > > It looks like a bug in skip() was introduced by HDFS-2356 and got fixed as a > part of HDFS-2834, which is an API change JIRA. This bug causes skip() to skip > up to offsetFromChunkBoundary extra bytes of data in certain cases. > It is only for branch-0.23. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5873) dfs.http.policy should have higher precedence over dfs.https.enable
[ https://issues.apache.org/jira/browse/HDFS-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891668#comment-13891668 ] Hadoop QA commented on HDFS-5873: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627034/HDFS-5873.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6031//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6031//console This message is automatically generated. > dfs.http.policy should have higher precedence over dfs.https.enable > --- > > Key: HDFS-5873 > URL: https://issues.apache.org/jira/browse/HDFS-5873 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Haohui Mai > Attachments: HDFS-5873.000.patch, HDFS-5873.001.patch, > HDFS-5873.002.patch > > > If dfs.http.policy is defined in hdfs-site.xml, it should have higher > precedence. 
> In hdfs-site.xml, if dfs.https.enable is set to true and dfs.http.policy is > set to HTTP_ONLY, The affecting policy should be 'HTTP_ONLY' instead > 'HTTP_AND_HTTPS'. > Currently with this configuration, it activates HTTP_AND_HTTPS policy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
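The precedence the reporter expects can be sketched as follows. `resolvePolicy` and the enum here are a hypothetical illustration of the desired behavior, not the actual DFSUtil code:

```java
// Sketch of the desired precedence: an explicit dfs.http.policy value wins
// over the legacy dfs.https.enable flag, which is consulted only as a
// fallback when dfs.http.policy is unset.
public class HttpPolicySketch {
    enum Policy { HTTP_ONLY, HTTPS_ONLY, HTTP_AND_HTTPS }

    // httpPolicy is the raw value of dfs.http.policy (null if unset);
    // httpsEnabled is the value of dfs.https.enable.
    static Policy resolvePolicy(String httpPolicy, boolean httpsEnabled) {
        if (httpPolicy != null) {
            return Policy.valueOf(httpPolicy);  // explicit setting wins
        }
        // Legacy fallback only when dfs.http.policy is absent.
        return httpsEnabled ? Policy.HTTP_AND_HTTPS : Policy.HTTP_ONLY;
    }

    public static void main(String[] args) {
        // The misbehaving case from the report: both keys set. The explicit
        // policy should win, yielding HTTP_ONLY rather than HTTP_AND_HTTPS.
        System.out.println(resolvePolicy("HTTP_ONLY", true));
        // Legacy-only configuration still honors dfs.https.enable.
        System.out.println(resolvePolicy(null, true));
    }
}
```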
[jira] [Commented] (HDFS-5882) TestAuditLogs is flaky
[ https://issues.apache.org/jira/browse/HDFS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891662#comment-13891662 ] Hadoop QA commented on HDFS-5882: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627037/hdfs-5882.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6032//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6032//console This message is automatically generated. > TestAuditLogs is flaky > -- > > Key: HDFS-5882 > URL: https://issues.apache.org/jira/browse/HDFS-5882 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Minor > Attachments: hdfs-5882.patch > > > TestAuditLogs fails sometimes: > {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec > <<< FAILURE! 
- in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs > testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs) > Time elapsed: 2.085 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:92) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertNotNull(Assert.java:526) > at org.junit.Assert.assertNotNull(Assert.java:537) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891661#comment-13891661 ] Aaron T. Myers commented on HDFS-5709: -- The latest patch looks good to me. +1 pending Jenkins. > Improve NameNode upgrade with existing reserved paths and path components > - > > Key: HDFS-5709 > URL: https://issues.apache.org/jira/browse/HDFS-5709 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: snapshots, upgrade > Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, > hdfs-5709-4.patch, hdfs-5709-5.patch, hdfs-5709-6.patch, hdfs-5709-7.patch > > > Right now in trunk, upgrade fails messily if the old fsimage or edits refer > to a directory named ".snapshot". We should at least print a better error > message (which I believe was the original intention in HDFS-4666), and [~atm] > proposed automatically renaming these files and directories. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5881) Fix skip() of the short-circuit local reader in 0.23.
[ https://issues.apache.org/jira/browse/HDFS-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891652#comment-13891652 ] Kihwal Lee commented on HDFS-5881: -- This bug is even lovelier than I originally thought. skip() has another bug: it returns the wrong value. In this case, DFSInputStream treats the skip as failed and creates a new BlockReaderLocal for subsequent reads. So the effect of the original skip bug was sometimes hidden, at the cost of unnecessary overhead. This "bug-masking bug" does not kick in when no data remains in the internal 32KB buffer; in that case the return value from skip() is correct and the same BlockReaderLocal instance is reused. So a read that follows a chunk-aligned 32KB read and a skip/seek will hit the original bug and return wrong data. The fix will make random reads faster and return correct data. > Fix skip() of the short-circuit local reader in 0.23. > - > > Key: HDFS-5881 > URL: https://issues.apache.org/jira/browse/HDFS-5881 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.10 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > > It looks like a bug in skip() was introduced by HDFS-2356 and got fixed as a > part of HDFS-2834, which is an API change JIRA. This bug causes skip() to skip > up to offsetFromChunkBoundary extra bytes of data in certain cases. > It is only for branch-0.23. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
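A toy model of how the two bugs interact, per the comment above. The chunk size, positions, and arithmetic here are illustrative only, not the BlockReaderLocal implementation:

```java
// Toy model of the two interacting skip() bugs described above.
public class SkipBugSketch {
    static final int CHUNK = 512;   // checksum chunk size (illustrative)
    long pos;                       // current read position
    long bufferedRemaining;         // data left in the internal buffer

    SkipBugSketch(long pos, long bufferedRemaining) {
        this.pos = pos;
        this.bufferedRemaining = bufferedRemaining;
    }

    // Bug 1: advances the position by offsetFromChunkBoundary extra bytes.
    // Bug 2: when buffered data remains, the return value is wrong, so the
    // caller discards this reader (masking bug 1 at extra cost).
    long skip(long n) {
        long offsetFromChunkBoundary = pos % CHUNK;
        pos += n + offsetFromChunkBoundary;     // bug 1: over-skip
        if (bufferedRemaining > 0) {
            return n - 1;                       // bug 2: wrong return value
        }
        return n;                               // correct return; bug 1 exposed
    }

    public static void main(String[] args) {
        // Empty buffer (e.g. after a chunk-aligned 32KB read): skip() returns
        // the right value, the reader is reused, and the next read starts
        // past where it should, so bug 1 is live.
        SkipBugSketch r = new SkipBugSketch(32 * 1024 + 300, 0);
        System.out.println("returned=" + r.skip(100) + " pos=" + r.pos);
    }
}
```

The point of the toy: the correct-looking return value is exactly the case where wrong data is served, matching Kihwal's analysis.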
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891630#comment-13891630 ] Hadoop QA commented on HDFS-5399: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627004/HDFS-5399.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.io.retry.TestFailoverProxy org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication org.apache.hadoop.hdfs.server.namenode.ha.TestEditLogTailer The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6030//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6030//console This message is automatically generated. 
> Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, > HDFS-5399.003.patch, hdfs-5399.002.patch > > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891593#comment-13891593 ] Todd Lipcon commented on HDFS-5399: --- That seems reasonable to me. +1 on the new logic there, pending jenkins. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, > HDFS-5399.003.patch, hdfs-5399.002.patch > > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891587#comment-13891587 ] Andrew Wang commented on HDFS-5709: --- Thanks ATM and Jing for reviewing this! I updated the docs, added command line arg testing for upgrade, and also changed the LV name behavior. Jing, right now I'm using the presence of the k/vs in the map to indicate that the "-renameReserved" flag was passed at all, which is why I didn't statically initialize the map with default values. I could switch it to use a boolean instead, but (with ATM's suggestion) we now have the same default suffix for all reserved paths, so adding a new default is as easy as putting it into the new static array in HdfsConstants. > Improve NameNode upgrade with existing reserved paths and path components > - > > Key: HDFS-5709 > URL: https://issues.apache.org/jira/browse/HDFS-5709 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: snapshots, upgrade > Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, > hdfs-5709-4.patch, hdfs-5709-5.patch, hdfs-5709-6.patch, hdfs-5709-7.patch > > > Right now in trunk, upgrade fails messily if the old fsimage or edits refer > to a directory named ".snapshot". We should at least print a better error > message (which I believe was the original intention in HDFS-4666), and [~atm] > proposed automatically renaming these files and directories. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
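The design choice Andrew describes, using the presence of entries in the map to double as the "flag was passed" signal, can be sketched as below. All names, the reserved-path list, and the default suffix here are hypothetical, not the actual NameNode argument parsing:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the described design: an empty map means -renameReserved was
// never passed; entries (possibly the shared defaults) mean it was.
public class RenameReservedSketch {
    // Stand-in for the static array of reserved paths in HdfsConstants.
    static final String[] RESERVED = { ".snapshot", ".reserved" };

    static Map<String, String> parse(String[] args) {
        Map<String, String> renames = new HashMap<>();
        for (String a : args) {
            if (a.equals("-renameReserved")) {
                // Flag present: install the shared default suffix for every
                // reserved path, so map emptiness still encodes "flag absent".
                for (String r : RESERVED) {
                    renames.put(r, r + ".UPGRADE_RENAMED");  // illustrative suffix
                }
            }
        }
        return renames;
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[] {}).isEmpty());
        System.out.println(parse(new String[] { "-renameReserved" }).size());
    }
}
```

Adding a new reserved path then only requires extending the array, as the comment notes.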
[jira] [Updated] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5709: -- Summary: Improve NameNode upgrade with existing reserved paths and path components (was: Improve upgrade with existing files and directories named ".snapshot") > Improve NameNode upgrade with existing reserved paths and path components > - > > Key: HDFS-5709 > URL: https://issues.apache.org/jira/browse/HDFS-5709 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: snapshots, upgrade > Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, > hdfs-5709-4.patch, hdfs-5709-5.patch, hdfs-5709-6.patch, hdfs-5709-7.patch > > > Right now in trunk, upgrade fails messily if the old fsimage or edits refer > to a directory named ".snapshot". We should at least print a better error > message (which I believe was the original intention in HDFS-4666), and [~atm] > proposed automatically renaming these files and directories. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5709: -- Attachment: hdfs-5709-7.patch > Improve NameNode upgrade with existing reserved paths and path components > - > > Key: HDFS-5709 > URL: https://issues.apache.org/jira/browse/HDFS-5709 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: snapshots, upgrade > Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, > hdfs-5709-4.patch, hdfs-5709-5.patch, hdfs-5709-6.patch, hdfs-5709-7.patch > > > Right now in trunk, upgrade fails messily if the old fsimage or edits refer > to a directory named ".snapshot". We should at least print a better error > message (which I believe was the original intention in HDFS-4666), and [~atm] > proposed automatically renaming these files and directories. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5885) Add annotation for repeated fields in the protobuf definition
[ https://issues.apache.org/jira/browse/HDFS-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891573#comment-13891573 ] Todd Lipcon commented on HDFS-5885: --- As far as I know, the packed=true attribute only applies for primitive fields (based on my reading of the docs and of the protobuf code). If we wanted to be extra compact, we could "shred" the BlockProtos into three separate packed lists of primitives, eg: {code} // "Shredded" version of BlockProto, used as a more compact encoding for a list // of blocks. message BlockProtoList { repeated uint64 block_ids = 1 [packed = true]; repeated uint64 gen_stamps = 2 [packed = true]; repeated uint64 sizes = 3 [packed = true]; } {code} The gains here are a couple of bytes per block. Think it's worth it? > Add annotation for repeated fields in the protobuf definition > - > > Key: HDFS-5885 > URL: https://issues.apache.org/jira/browse/HDFS-5885 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-5698 (FSImage in protobuf) >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5885.000.patch > > > As suggested by the documentation of Protocol Buffers, the protobuf > specification of the fsimage should specify [packed=true] for all repeated > fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
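As a rough check on the "couple of bytes per block" estimate: each element of a repeated embedded message pays its own tag byte, a length byte, and one tag byte per inner field, while a packed primitive list pays a single tag-plus-length header for the whole list. A back-of-envelope sketch, assuming 1-byte tags and lengths (actual sizes vary with field values):

```java
// Rough estimate of wire overhead saved by "shredding" a repeated embedded
// message into three packed primitive lists. The per-byte figures assume
// small varints; this is an approximation, not a protobuf implementation.
public class PackedEstimateSketch {
    static long overheadSaved(long blocks) {
        // Nested form: per block, 1 element tag + 1 length byte + 3 field tags.
        long perBlockNested = 2 + 3;
        // Packed form: one tag + length prefix per list, a few bytes each.
        long packedHeaders = 3 * 5;
        return blocks * perBlockNested - packedHeaders;
    }

    public static void main(String[] args) {
        // For a million blocks, roughly 5 bytes per block of pure overhead.
        System.out.println(overheadSaved(1_000_000));
    }
}
```

Whether that is worth the loss of the self-describing BlockProto structure is exactly the trade-off Todd raises.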
[jira] [Updated] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5399: Attachment: HDFS-5399.003.patch In the 003 patch I changed "retries >= maxRetries" to "retries - failovers > maxRetries". This can pass TestFailoverProxy in my local test. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, > HDFS-5399.003.patch, hdfs-5399.002.patch > > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
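The accounting change in the 003 patch can be sketched as follows; method and parameter names are illustrative, not the actual RetryInvocationHandler code:

```java
// Sketch: count only non-failover retries against maxRetries, so that
// failovers (which have their own maxFailovers budget, and which also bump
// the shared 'retries' counter) do not prematurely exhaust the retry budget.
public class RetryAccountingSketch {
    static boolean shouldFail(int retries, int failovers,
                              int maxRetries, int maxFailovers) {
        if (failovers > maxFailovers) {
            return true;                      // failover budget exhausted
        }
        // old check: retries >= maxRetries   (failovers inflate 'retries')
        // new check: only the non-failover portion counts
        return retries - failovers > maxRetries;
    }

    public static void main(String[] args) {
        // 3 failovers + 2 plain retries, maxRetries=2, maxFailovers=5: keep going.
        System.out.println(shouldFail(5, 3, 2, 5));
        // 3 failovers + 3 plain retries now exceeds maxRetries=2: fail.
        System.out.println(shouldFail(6, 3, 2, 5));
    }
}
```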
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891558#comment-13891558 ] Jing Zhao commented on HDFS-5399: - I just found another issue: we increase the number of retries for both RETRY and FAILOVER_AND_RETRY in RetryInvocationHandler. In that case, if the max-retry-attempts is less than max-failover-attempts, we will fail before reaching the maximum number of failovers. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, > hdfs-5399.002.patch > > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5880) Fix a typo at the title of HDFS Snapshots document
[ https://issues.apache.org/jira/browse/HDFS-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891552#comment-13891552 ] Akira AJISAKA commented on HDFS-5880: - Thank you, [~andrew.wang]. Closing this issue. > Fix a typo at the title of HDFS Snapshots document > -- > > Key: HDFS-5880 > URL: https://issues.apache.org/jira/browse/HDFS-5880 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation, snapshots >Affects Versions: 2.2.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Minor > Labels: newbie > Attachments: HDFS-5880.patch > > > The title of the HDFS Snapshots document is "HFDS Snapshots". > We should fix it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5880) Fix a typo at the title of HDFS Snapshots document
[ https://issues.apache.org/jira/browse/HDFS-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5880: Resolution: Duplicate Assignee: (was: Akira AJISAKA) Status: Resolved (was: Patch Available) > Fix a typo at the title of HDFS Snapshots document > -- > > Key: HDFS-5880 > URL: https://issues.apache.org/jira/browse/HDFS-5880 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation, snapshots >Affects Versions: 2.2.0 >Reporter: Akira AJISAKA >Priority: Minor > Labels: newbie > Attachments: HDFS-5880.patch > > > The title of the HDFS Snapshots document is "HFDS Snapshots". > We should fix it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891547#comment-13891547 ] Colin Patrick McCabe commented on HDFS-5182: bq. Okay now it looks more clear to me now. Thanks for the explanation. Glad to be helpful. bq. My bad. I mixed Linux with SunOS. You can do it using sendmsg() / recvmsg() as you mentioned in the previous comments. I didn't realize that ioctl was the way to do this under SunOS. Interesting. Sending fds via {{sendmsg}} seems to work on all the modern UNIX variants, so I think that we're good there. On Windows, we'll need to use {{DuplicateHandle}}. > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid > - > > Key: HDFS-5182 > URL: https://issues.apache.org/jira/browse/HDFS-5182 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid. This implies adding a new field to the response to > REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the > client to the DN, so that the DN can inform the client when the mapped region > is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-5399: -- Attachment: hdfs-5399.002.patch The issue was that the convenience constructors for FailoverOnNetworkExceptionRetry didn't maintain the old behavior of retrying multiple times, since they set numRetries to 0. This new patch sets numRetries to Integer.MAX_VALUE for those constructors, and fixes the test. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, > hdfs-5399.002.patch > > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
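The constructor fix Todd describes in the 002 patch can be sketched as below; the class here is a simplified stand-in for FailoverOnNetworkExceptionRetry, not the actual implementation:

```java
// Sketch of the fix: the convenience constructor previously passed 0 for
// maxRetries, breaking the old retry-many-times behavior. Setting it to
// Integer.MAX_VALUE restores effectively unbounded retries while the
// failover budget still bounds total attempts.
public class FailoverPolicySketch {
    final int maxFailovers;
    final int maxRetries;

    FailoverPolicySketch(int maxFailovers, int maxRetries) {
        this.maxFailovers = maxFailovers;
        this.maxRetries = maxRetries;
    }

    // Convenience constructor: retries should be effectively unbounded,
    // not zero, to preserve the pre-patch behavior.
    FailoverPolicySketch(int maxFailovers) {
        this(maxFailovers, Integer.MAX_VALUE);  // was: this(maxFailovers, 0)
    }

    public static void main(String[] args) {
        FailoverPolicySketch p = new FailoverPolicySketch(15);
        System.out.println(p.maxRetries == Integer.MAX_VALUE);
    }
}
```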
[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891541#comment-13891541 ] Hadoop QA commented on HDFS-4239: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626985/hdfs-4239_v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6028//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6028//console This message is automatically generated. > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack >Assignee: Jimmy Xiang > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, > hdfs-4239_v4.patch, hdfs-4239_v5.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the total datanode. 
If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it to stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after it's been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891536#comment-13891536 ] Haohui Mai commented on HDFS-5182: -- Okay, it looks clearer to me now. Thanks for the explanation. bq. By the way, ioctl cannot be used to pass file descriptors in Linux My bad. I mixed up Linux with SunOS. You can do it using {{sendmsg()}} / {{recvmsg()}} as you mentioned in the previous comments. > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid > - > > Key: HDFS-5182 > URL: https://issues.apache.org/jira/browse/HDFS-5182 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid. This implies adding a new field to the response to > REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the > client to the DN, so that the DN can inform the client when the mapped region > is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891535#comment-13891535 ] Todd Lipcon commented on HDFS-5399: --- Hmm, looks like TestFailoverProxy is also failing with the patch. Any ideas? > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch > > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
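The single-strategy proposal in point 3 of the issue description above (always wrap the SafeModeException in a RetriableException so one client-side rule decides whether to retry) can be sketched as follows. This is a hypothetical, self-contained illustration: the class names echo the Hadoop exceptions under discussion, but the signatures are toy stand-ins, not the actual Hadoop APIs.

```java
// Toy stand-ins for the exceptions discussed in HDFS-5399.
class SafeModeException extends Exception {
    final boolean manual; // true when safe mode was entered by an admin via CLI
    SafeModeException(boolean manual) { this.manual = manual; }
}

class RetriableException extends Exception {
    RetriableException(Throwable cause) { super(cause); }
}

public class RetryWrapperSketch {
    // Server side: wrap only automatic safe mode, so clients do not spin
    // forever while an administrator deliberately holds the NN in safe mode.
    static void checkSafeMode(boolean inSafeMode, boolean manual)
            throws RetriableException, SafeModeException {
        if (!inSafeMode) return;
        SafeModeException sme = new SafeModeException(manual);
        if (manual) throw sme;             // client should fail fast
        throw new RetriableException(sme); // client should retry
    }

    // Client side: one generic policy for HA and non-HA alike —
    // retry if and only if the server marked the failure retriable.
    static boolean shouldRetry(Exception e) {
        return e instanceof RetriableException;
    }

    public static void main(String[] args) {
        try {
            checkSafeMode(true, false);
        } catch (Exception e) {
            System.out.println("retry=" + shouldRetry(e)); // retry=true
        }
    }
}
```

The point of the sketch is that the manual/automatic distinction is decided once, server side, instead of being re-derived by each client retry policy.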
[jira] [Created] (HDFS-5888) Cannot chmod / with new Globber.
Andrew Wang created HDFS-5888: - Summary: Cannot chmod / with new Globber. Key: HDFS-5888 URL: https://issues.apache.org/jira/browse/HDFS-5888 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Due to some changes in the new Globber code, we can no longer chmod "/". We should support this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891534#comment-13891534 ] Hadoop QA commented on HDFS-4239: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626985/hdfs-4239_v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6027//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6027//console This message is automatically generated. > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack >Assignee: Jimmy Xiang > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, > hdfs-4239_v4.patch, hdfs-4239_v5.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. 
Decommission the total datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it to stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after it's been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5882) TestAuditLogs is flaky
[ https://issues.apache.org/jira/browse/HDFS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891531#comment-13891531 ] Jimmy Xiang commented on HDFS-5882: --- I was thinking of force-flushing the logger to disk too, but there isn't an easy way. With the current patch, I don't see the problem any more locally. > TestAuditLogs is flaky > -- > > Key: HDFS-5882 > URL: https://issues.apache.org/jira/browse/HDFS-5882 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Minor > Attachments: hdfs-5882.patch > > > TestAuditLogs fails sometimes: > {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs > testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs) > Time elapsed: 2.085 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:92) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertNotNull(Assert.java:526) > at org.junit.Assert.assertNotNull(Assert.java:537) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5882) TestAuditLogs is flaky
[ https://issues.apache.org/jira/browse/HDFS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891513#comment-13891513 ] Jing Zhao commented on HDFS-5882: - Can we force the logger to flush here? Looks like the current patch can only decrease the possibility of failure? > TestAuditLogs is flaky > -- > > Key: HDFS-5882 > URL: https://issues.apache.org/jira/browse/HDFS-5882 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Minor > Attachments: hdfs-5882.patch > > > TestAuditLogs fails sometimes: > {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs > testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs) > Time elapsed: 2.085 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:92) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertNotNull(Assert.java:526) > at org.junit.Assert.assertNotNull(Assert.java:537) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891504#comment-13891504 ] Colin Patrick McCabe commented on HDFS-5746: [~sureshms]: we've been having a bunch of problems with the javadoc warning detection code. I filed HADOOP-10325 to fix this properly. > add ShortCircuitSharedMemorySegment > --- > > Key: HDFS-5746 > URL: https://issues.apache.org/jira/browse/HDFS-5746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0, 2.4.0 > > Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, > HDFS-5746.003.patch, HDFS-5746.004.patch, HDFS-5746.005.patch, > HDFS-5746.006.patch, HDFS-5746.007.patch, HDFS-5746.008.patch > > > Add ShortCircuitSharedMemorySegment, which will be used to communicate > information between the datanode and the client about whether a replica is > mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891500#comment-13891500 ] Colin Patrick McCabe commented on HDFS-5182: bq. I should have said it more concretely. What I'm proposing is that the DN passes the file descriptor to the client (e.g., using ioctl() in Linux). Did you read my first comment? It begins: bq. One way (let's call this choice #1) was using a shared memory segment. This would take the form of a third file descriptor passed from the DataNode to the DFSClient By the way, {{ioctl}} cannot be used to pass file descriptors in Linux. > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid > - > > Key: HDFS-5182 > URL: https://issues.apache.org/jira/browse/HDFS-5182 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid. This implies adding a new field to the response to > REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the > client to the DN, so that the DN can inform the client when the mapped region > is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5882) TestAuditLogs is flaky
[ https://issues.apache.org/jira/browse/HDFS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-5882: -- Attachment: hdfs-5882.patch > TestAuditLogs is flaky > -- > > Key: HDFS-5882 > URL: https://issues.apache.org/jira/browse/HDFS-5882 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Minor > Attachments: hdfs-5882.patch > > > TestAuditLogs fails sometimes: > {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs > testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs) > Time elapsed: 2.085 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:92) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertNotNull(Assert.java:526) > at org.junit.Assert.assertNotNull(Assert.java:537) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5882) TestAuditLogs is flaky
[ https://issues.apache.org/jira/browse/HDFS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-5882: -- Status: Patch Available (was: Open) > TestAuditLogs is flaky > -- > > Key: HDFS-5882 > URL: https://issues.apache.org/jira/browse/HDFS-5882 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Minor > Attachments: hdfs-5882.patch > > > TestAuditLogs fails sometimes: > {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs > testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs) > Time elapsed: 2.085 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:92) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertNotNull(Assert.java:526) > at org.junit.Assert.assertNotNull(Assert.java:537) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5872) Validate configuration of dfs.http.policy
[ https://issues.apache.org/jira/browse/HDFS-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai resolved HDFS-5872. -- Resolution: Duplicate HDFS-5873 includes the fix for this bug, so I'm closing this one as a duplicate. > Validate configuration of dfs.http.policy > - > > Key: HDFS-5872 > URL: https://issues.apache.org/jira/browse/HDFS-5872 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haohui Mai >Assignee: Haohui Mai > > The current implementation does not complain about invalid values of > dfs.http.policy. The implementation should bail out to alert the user that he > / she has misconfigured the system. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5873) dfs.http.policy should have higher precedence over dfs.https.enable
[ https://issues.apache.org/jira/browse/HDFS-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5873: - Attachment: HDFS-5873.002.patch The v2 patch adds a unit test to cover the precedence. > dfs.http.policy should have higher precedence over dfs.https.enable > --- > > Key: HDFS-5873 > URL: https://issues.apache.org/jira/browse/HDFS-5873 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Haohui Mai > Attachments: HDFS-5873.000.patch, HDFS-5873.001.patch, > HDFS-5873.002.patch > > > If dfs.http.policy is defined in hdfs-site.xml, it should have higher > precedence. > In hdfs-site.xml, if dfs.https.enable is set to true and dfs.http.policy is > set to HTTP_ONLY, the effective policy should be 'HTTP_ONLY' instead of > 'HTTP_AND_HTTPS'. > Currently with this configuration, it activates the HTTP_AND_HTTPS policy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5873) dfs.http.policy should have higher precedence over dfs.https.enable
[ https://issues.apache.org/jira/browse/HDFS-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891467#comment-13891467 ] Jing Zhao commented on HDFS-5873: - The new patch looks pretty good to me. It would be better to have a unit test to cover DFSUtil#getHttpPolicy. +1 after addressing this. > dfs.http.policy should have higher precedence over dfs.https.enable > --- > > Key: HDFS-5873 > URL: https://issues.apache.org/jira/browse/HDFS-5873 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Haohui Mai > Attachments: HDFS-5873.000.patch, HDFS-5873.001.patch > > > If dfs.http.policy is defined in hdfs-site.xml, it should have higher > precedence. > In hdfs-site.xml, if dfs.https.enable is set to true and dfs.http.policy is > set to HTTP_ONLY, the effective policy should be 'HTTP_ONLY' instead of > 'HTTP_AND_HTTPS'. > Currently with this configuration, it activates the HTTP_AND_HTTPS policy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5868) Make hsync implementation pluggable
[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891453#comment-13891453 ] Hadoop QA commented on HDFS-5868: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626979/HDFS-5868-branch-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated -14 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6026//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6026//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6026//console This message is automatically generated. 
> Make hsync implementation pluggable > --- > > Key: HDFS-5868 > URL: https://issues.apache.org/jira/browse/HDFS-5868 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0 >Reporter: Buddy > Attachments: HDFS-5868-branch-2.patch > > > The current implementation of hsync in BlockReceiver only works if the output > streams are instances of FileOutputStream. Therefore, there is currently no > way for a FSDatasetSpi plugin to implement hsync if it is not using standard > OS files. > One possible solution is to push the implementation of hsync into the > ReplicaOutputStreams class. This class is constructed by the > ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore > it can be extended. Instead of directly calling sync on the output stream, > BlockReceiver would call ReplicaOutputStream.sync. The default > implementation of sync in ReplicaOutputStream would be the same as the > current implementation in BlockReceiver. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
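The proposal in the HDFS-5868 description above — push sync behavior out of BlockReceiver and into an overridable method on the stream-holder class — can be sketched like this. The class and method names below are illustrative, not the actual Hadoop signatures; the default implementation mirrors the FileOutputStream-only behavior the description criticizes, and a plugin would override it.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical analogue of ReplicaOutputStreams: the FSDatasetSpi plugin
// constructs this object, so it can subclass it and override syncDataOut().
class ReplicaOutputStreamsSketch {
    private final OutputStream dataOut;

    ReplicaOutputStreamsSketch(OutputStream dataOut) {
        this.dataOut = dataOut;
    }

    // Default behavior: same idea as today's BlockReceiver — only effective
    // when the underlying stream is a FileOutputStream on a real OS file.
    public void syncDataOut() throws IOException {
        if (dataOut instanceof FileOutputStream) {
            ((FileOutputStream) dataOut).getChannel().force(true);
        }
        // A plugin not backed by OS files would override this method with
        // its own durability call instead of relying on the instanceof check.
    }
}

public class HsyncSketch {
    public static void main(String[] args) throws IOException {
        java.io.File f = java.io.File.createTempFile("hsync", ".dat");
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            ReplicaOutputStreamsSketch streams = new ReplicaOutputStreamsSketch(out);
            out.write(new byte[]{1, 2, 3});
            streams.syncDataOut(); // data forced to stable storage
        }
        System.out.println("length=" + f.length()); // length=3
    }
}
```

BlockReceiver would then call `streams.syncDataOut()` unconditionally and never need to know what kind of stream the dataset plugin handed it.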
[jira] [Assigned] (HDFS-5874) Should not compare DataNode current layout version with that of NameNode in DataStorage
[ https://issues.apache.org/jira/browse/HDFS-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li reassigned HDFS-5874: Assignee: Brandon Li > Should not compare DataNode current layout version with that of NameNode in > DataStorage > > > Key: HDFS-5874 > URL: https://issues.apache.org/jira/browse/HDFS-5874 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Brandon Li >Assignee: Brandon Li > > As [~vinayrpet] pointed out in HDFS-5754: in DataStorage, > DATANODE_LAYOUT_VERSION should not be compared with the NameNode layout version > anymore. > {noformat} > if (DataNodeLayoutVersion.supports( > LayoutVersion.Feature.FEDERATION, > HdfsConstants.DATANODE_LAYOUT_VERSION) && > HdfsConstants.DATANODE_LAYOUT_VERSION == nsInfo.getLayoutVersion()) > { > readProperties(sd, nsInfo.getLayoutVersion()); > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5884) LoadDelegator should use IOUtils.readFully() to read the magic header
[ https://issues.apache.org/jira/browse/HDFS-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao resolved HDFS-5884. - Resolution: Fixed Fix Version/s: HDFS-5698 (FSImage in protobuf) Hadoop Flags: Reviewed +1. I've committed this. > LoadDelegator should use IOUtils.readFully() to read the magic header > - > > Key: HDFS-5884 > URL: https://issues.apache.org/jira/browse/HDFS-5884 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-5698 (FSImage in protobuf) >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: HDFS-5698 (FSImage in protobuf) > > Attachments: HDFS-5884.000.patch > > > Currently FSImageFormat.LoadDelegator reads the magic header using > {{FileInputStream.read()}}. It does not guarantee that the magic header is > fully read. It should use {{IOUtils.readFully()}} instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
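The bug fixed in HDFS-5884 above is the classic short-read pitfall: `InputStream.read(byte[])` may legitimately return fewer bytes than requested. A minimal, standalone equivalent of what a `readFully` helper does (the real utility lives in Hadoop's `org.apache.hadoop.io.IOUtils`; the magic string below is just an illustrative placeholder) looks like this:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFullySketch {
    // Loop until the requested range is filled: a single read() call is
    // allowed to return fewer bytes than asked for.
    static void readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        while (len > 0) {
            int n = in.read(buf, off, len);
            if (n < 0) {
                throw new EOFException("stream ended before " + len
                        + " more bytes could be read");
            }
            off += n;
            len -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] magic = "IMGMAGIC".getBytes("US-ASCII"); // placeholder header
        // A stream that returns at most 3 bytes per read(), simulating the
        // short read that a bare FileInputStream.read() would not handle.
        InputStream in = new ByteArrayInputStream(magic) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
        byte[] header = new byte[magic.length];
        readFully(in, header, 0, header.length);
        System.out.println(Arrays.equals(header, magic)); // true
    }
}
```

With a bare `in.read(header)` the same stream would have filled only the first 3 bytes, and a magic-header comparison would fail spuriously.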
[jira] [Commented] (HDFS-5885) Add annotation for repeated fields in the protobuf definition
[ https://issues.apache.org/jira/browse/HDFS-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891398#comment-13891398 ] Jing Zhao commented on HDFS-5885: - According to the protobuf documentation: "For historical reasons, repeated fields of basic numeric types aren't encoded as efficiently as they could be. New code should use the special option [packed=true] to get a more efficient encoding." So I guess we only need to add "[packed=true]" for basic numeric types like int64? Do we want to also add it to "repeated BlockProto blocks"? [~tlipcon], could you please comment on this? > Add annotation for repeated fields in the protobuf definition > - > > Key: HDFS-5885 > URL: https://issues.apache.org/jira/browse/HDFS-5885 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-5698 (FSImage in protobuf) >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5885.000.patch > > > As suggested by the documentation of Protocol Buffers, the protobuf > specification of the fsimage should specify [packed=true] for all repeated > fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
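To answer the question in the comment above concretely: in proto2, `[packed=true]` applies only to repeated fields of scalar numeric types, so a message-typed field like "repeated BlockProto blocks" cannot be packed. A sketch (the message and field names here are illustrative, not the actual fsimage schema):

```proto
message INodeFileSketch {
  // Scalar numeric repeated field: packed encoding writes all values as one
  // length-delimited blob instead of repeating the tag for every element.
  repeated uint64 blockIds = 1 [packed = true];

  // Message-typed repeated fields stay length-delimited per element;
  // annotating them with [packed=true] is not allowed:
  // repeated BlockProto blocks = 2;
}
```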
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891378#comment-13891378 ] Haohui Mai commented on HDFS-5182: -- I should have said it more concretely. What I'm proposing is that the DN passes the file descriptor to the client (e.g., using {{ioctl()}} in Linux). It seems to me that with this approach (1) the OS takes care of resource management, and (2) the client has more flexibility. The client can access the file using the {{read()}} and {{write()}} system calls. The client, of course, can call {{mmap()}} on the descriptor to implement zero-copy reads with respect to its process boundary. It can also call {{ioctl()}} and {{madvise()}} to specify the OS buffer cache policy of the file. The additional flexibility can be quite useful for implementing databases on HDFS. > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid > - > > Key: HDFS-5182 > URL: https://issues.apache.org/jira/browse/HDFS-5182 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid. This implies adding a new field to the response to > REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the > client to the DN, so that the DN can inform the client when the mapped region > is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5698) Use protobuf to serialize / deserialize FSImage
[ https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891348#comment-13891348 ] Haohui Mai commented on HDFS-5698: -- Thanks very much for the detailed comments from [~tlipcon]. I've filed HDFS-5884, HDFS-5885 and HDFS-5887 to address the comments. Thanks very much for the suggestions on the performance improvement. I'll dig into it. My plan is to commit HDFS-5884 and HDFS-5885 before the merge, and to continue to improve the code in trunk. Does that make sense to you? bq. would existing ImageVisitor implementation classes continue to work against the PB-ified image? The existing ImageVisitor implementation won't work with the PB FSImage. > Use protobuf to serialize / deserialize FSImage > --- > > Key: HDFS-5698 > URL: https://issues.apache.org/jira/browse/HDFS-5698 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5698-design.pdf, HDFS-5698.000.patch, > HDFS-5698.001.patch, HDFS-5698.002.patch, HDFS-5698.003.patch > > > Currently, the code serializes FSImage using in-house serialization > mechanisms. There are a couple of disadvantages of the current approach: > # Mixing the responsibility of reconstruction and serialization / > deserialization. The current code paths of serialization / deserialization > have spent a lot of effort on maintaining compatibility. What is worse is > that they are mixed with the complex logic of reconstructing the namespace, > making the code difficult to follow. > # Poor documentation of the current FSImage format. The format of the FSImage > is practically defined by the implementation. A bug in the implementation means > a bug in the specification. Furthermore, it also makes writing third-party > tools quite difficult. > # Changing schemas is non-trivial. Adding a field in FSImage requires bumping > the layout version every time. 
Bumping the layout version requires (1) the > users to explicitly upgrade the clusters, and (2) putting new code to > maintain backward compatibility. > This jira proposes to use protobuf to serialize the FSImage. Protobuf has > been used to serialize / deserialize the RPC messages in Hadoop. > Protobuf addresses all the above problems. It clearly separates the > responsibility of serialization from reconstructing the namespace. The > protobuf files document the current format of the FSImage. The developers now > can add optional fields with ease, since the old code can always read the new > FSImage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891350#comment-13891350 ] Colin Patrick McCabe commented on HDFS-5182: [~wheat9]: the shared memory segment, which we obtain via mmap, is a window into the file identified by the file descriptor. "Using a file descriptor" and "using a shared memory segment" are not two different approaches. They are two aspects of the same approach. You can read more about it here: http://en.wikipedia.org/wiki/Memory-mapped_file > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid > - > > Key: HDFS-5182 > URL: https://issues.apache.org/jira/browse/HDFS-5182 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid. This implies adding a new field to the response to > REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the > client to the DN, so that the DN can inform the client when the mapped region > is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
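The point in the comment above — that the mmap'd shared memory segment is just a window into the file behind the passed descriptor, not a competing mechanism — has a direct Java analogue in `FileChannel.map()`. A small, self-contained sketch (the file here is a local temp file; in the DN/client scenario the descriptor would first arrive over a Unix domain socket):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class MmapWindowSketch {
    public static void main(String[] args) throws IOException {
        java.io.File f = java.io.File.createTempFile("segment", ".bin");
        f.deleteOnExit();

        // Write through one descriptor...
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.write("shared".getBytes(StandardCharsets.US_ASCII));
        }

        // ...then map the same file through another: the mapping is a
        // read-only window onto the file's contents, so "using a file
        // descriptor" and "using a shared memory segment" are two aspects
        // of the same approach.
        try (RandomAccessFile raf = new RandomAccessFile(f, "r");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[(int) ch.size()];
            buf.get(out);
            System.out.println(new String(out, StandardCharsets.US_ASCII)); // shared
        }
    }
}
```

Reads from the `MappedByteBuffer` touch the page cache directly, with no copy into a heap buffer — the zero-copy property the discussion is about.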
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891343#comment-13891343 ] Todd Lipcon commented on HDFS-5399: --- +1 pending Jenkins > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch > > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5887) Add suffix to generated protobuf class
Haohui Mai created HDFS-5887: Summary: Add suffix to generated protobuf class Key: HDFS-5887 URL: https://issues.apache.org/jira/browse/HDFS-5887 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor As suggested by [~tlipcon], the code is more readable if we give each class generated by protobuf the suffix "Proto". This jira proposes to rename the classes without introducing any functionality changes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5399: Attachment: HDFS-5399.001.patch Update the patch to fix TestHASafeMode#testClientRetrySafeMode. It also removes a redundant safemode.isOn() check in FSNamesystem#checkNameNodeSafeMode. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch > > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5886) Potential null pointer dereference in RpcProgramNfs3#readlink()
Ted Yu created HDFS-5886: Summary: Potential null pointer dereference in RpcProgramNfs3#readlink() Key: HDFS-5886 URL: https://issues.apache.org/jira/browse/HDFS-5886 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Here is the related code: {code} if (MAX_READ_TRANSFER_SIZE < target.getBytes().length) { return new READLINK3Response(Nfs3Status.NFS3ERR_IO, postOpAttr, null); } {code} The READLINK3Response constructor dereferences the third parameter: {code} this.path = new byte[path.length]; {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
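One way to fix the NPE is to make the constructor tolerate a null path, since the error branch deliberately passes null. A simplified stand-in (this is not the actual READLINK3Response class) sketches the guard:

```java
import java.util.Arrays;

// Simplified stand-in for READLINK3Response: treat a null path as
// empty instead of dereferencing it.
public class Readlink3Response {
    public final int status;
    public final byte[] path;

    public Readlink3Response(int status, byte[] path) {
        this.status = status;
        // Guard: error responses pass null for the path, so default to
        // an empty array rather than hitting path.length on null.
        byte[] src = (path == null) ? new byte[0] : path;
        this.path = Arrays.copyOf(src, src.length);
    }
}
```

The alternative fix, passing `new byte[0]` at the call site, works too but leaves every other caller exposed to the same bug.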
[jira] [Updated] (HDFS-5885) Add annotation for repeated fields in the protobuf definition
[ https://issues.apache.org/jira/browse/HDFS-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5885: - Attachment: HDFS-5885.000.patch > Add annotation for repeated fields in the protobuf definition > - > > Key: HDFS-5885 > URL: https://issues.apache.org/jira/browse/HDFS-5885 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-5698 (FSImage in protobuf) >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5885.000.patch > > > As suggested by the documentation of Protocol Buffers, the protobuf > specification of the fsimage should specify [packed=true] for all repeated > fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5884) LoadDelegator should use IOUtils.readFully() to read the magic header
[ https://issues.apache.org/jira/browse/HDFS-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5884: - Attachment: HDFS-5884.000.patch > LoadDelegator should use IOUtils.readFully() to read the magic header > - > > Key: HDFS-5884 > URL: https://issues.apache.org/jira/browse/HDFS-5884 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-5698 (FSImage in protobuf) >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5884.000.patch > > > Currently FSImageFormat.LoadDelegator reads the magic header using > {{FileInputStream.read()}}. It does not guarantee that the magic header is > fully read. It should use {{IOUtils.readFully()}} instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5884) LoadDelegator should use IOUtils.readFully() to read the magic header
[ https://issues.apache.org/jira/browse/HDFS-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5884: - Description: Currently FSImageFormat.LoadDelegator reads the magic header using {{FileInputStream.read()}}. It does not guarantee that the magic header is fully read. It should use {{IOUtils.readFully()}} instead. (was: Currently FSImageFormat.LoadDelegator reads the magic header using {FileInputStream.read()}. It does not guarantee that the magic header is fully read. It should use IOUtils.readFully() instead.) > LoadDelegator should use IOUtils.readFully() to read the magic header > - > > Key: HDFS-5884 > URL: https://issues.apache.org/jira/browse/HDFS-5884 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-5698 (FSImage in protobuf) >Reporter: Haohui Mai >Assignee: Haohui Mai > > Currently FSImageFormat.LoadDelegator reads the magic header using > {{FileInputStream.read()}}. It does not guarantee that the magic header is > fully read. It should use {{IOUtils.readFully()}} instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
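The short-read problem behind this issue is easy to demonstrate: a single `InputStream.read(buf)` may return fewer bytes than requested, and the readFully idiom loops until the buffer is full. The sketch below simulates short reads with a stream that trickles one byte per call (the class names are illustrative, not HDFS code):

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {
    // A stream that returns at most one byte per read() call, simulating
    // the short reads a single read() does not guard against.
    public static class TrickleStream extends ByteArrayInputStream {
        public TrickleStream(byte[] buf) { super(buf); }
        @Override
        public int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    // The readFully idiom: loop until the buffer is full or EOF.
    public static void readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) throw new EOFException("premature EOF at offset " + off);
            off += n;
        }
    }

    public static byte[] demo(byte[] data) throws IOException {
        byte[] out = new byte[data.length];
        readFully(new TrickleStream(data), out);
        return out;
    }
}
```

A bare `is.read(magic)` against the trickle stream would fill only one byte; the loop is what guarantees the whole magic header is read.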
[jira] [Created] (HDFS-5885) Add annotation for repeated fields in the protobuf definition
Haohui Mai created HDFS-5885: Summary: Add annotation for repeated fields in the protobuf definition Key: HDFS-5885 URL: https://issues.apache.org/jira/browse/HDFS-5885 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Assignee: Haohui Mai As suggested by the documentation of Protocol Buffers, the protobuf specification of the fsimage should specify [packed=true] for all repeated fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
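For context, `[packed=true]` changes the wire encoding of a repeated scalar field from one tag per element to a single length-delimited blob, which is what saves space and decode time. An illustrative fragment (the message and field names here are hypothetical, not the actual fsimage schema):

```proto
// Hypothetical fragment showing the annotation this jira proposes.
message INodeFile {
  // Without [packed=true], each id is preceded by its own tag byte;
  // with it, the whole list is one length-delimited record.
  repeated uint64 blockIds = 1 [packed=true];
}
```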
[jira] [Updated] (HDFS-5884) LoadDelegator should use IOUtils.readFully() to read the magic header
[ https://issues.apache.org/jira/browse/HDFS-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5884: - Target Version/s: HDFS-5698 (FSImage in protobuf) Affects Version/s: HDFS-5698 (FSImage in protobuf) > LoadDelegator should use IOUtils.readFully() to read the magic header > - > > Key: HDFS-5884 > URL: https://issues.apache.org/jira/browse/HDFS-5884 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-5698 (FSImage in protobuf) >Reporter: Haohui Mai >Assignee: Haohui Mai > > Currently FSImageFormat.LoadDelegator reads the magic header using > {FileInputStream.read()}. It does not guarantee that the magic header is > fully read. It should use IOUtils.readFully() instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade
[ https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891316#comment-13891316 ] Brandon Li commented on HDFS-5585: -- [~kihwal], thanks for the patch! It provides two new dfsadmin CLIs. They are a bit different from those in the design doc. For example, it looks like pingDatanode is used here to replace the CLI getDatanodeInfo described in the design doc, but pingDatanode doesn't return much information about upgrade status and so on. Could you elaborate more on the difference? > Provide admin commands for data node upgrade > > > Key: HDFS-5585 > URL: https://issues.apache.org/jira/browse/HDFS-5585 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ha, hdfs-client, namenode >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-5585.patch, HDFS-5585.patch > > > Several new methods to ClientDatanodeProtocol may need to be added to support > querying version, initiating upgrade, etc. The admin CLI needs to be added > as well. The primary use case is rolling upgrade, but this can also be used > to prepare for a graceful restart of a data node for any reason. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5884) LoadDelegator should use IOUtils.readFully() to read the magic header
Haohui Mai created HDFS-5884: Summary: LoadDelegator should use IOUtils.readFully() to read the magic header Key: HDFS-5884 URL: https://issues.apache.org/jira/browse/HDFS-5884 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Currently FSImageFormat.LoadDelegator reads the magic header using {FileInputStream.read()}. It does not guarantee that the magic header is fully read. It should use IOUtils.readFully() instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891315#comment-13891315 ] Todd Lipcon commented on HDFS-5399: --- Patch looks reasonable to me. Don't you need to update TestHASafeMode.testClientRetrySafeMode though, now that it doesn't retry for manual safe mode? > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5399.000.patch > > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5873) dfs.http.policy should have higher precedence over dfs.https.enable
[ https://issues.apache.org/jira/browse/HDFS-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891296#comment-13891296 ] Hadoop QA commented on HDFS-5873: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626947/HDFS-5873.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6025//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6025//console This message is automatically generated. > dfs.http.policy should have higher precedence over dfs.https.enable > --- > > Key: HDFS-5873 > URL: https://issues.apache.org/jira/browse/HDFS-5873 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Haohui Mai > Attachments: HDFS-5873.000.patch, HDFS-5873.001.patch > > > If dfs.http.policy is defined in hdfs-site.xml, it should have higher > precedence.
> In hdfs-site.xml, if dfs.https.enable is set to true and dfs.http.policy is > set to HTTP_ONLY, the effective policy should be 'HTTP_ONLY' instead of > 'HTTP_AND_HTTPS'. > Currently with this configuration, it activates the HTTP_AND_HTTPS policy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
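The precedence rule being requested is small enough to state as code. A sketch (hypothetical class, not the actual Hadoop HttpConfig resolution): an explicitly set dfs.http.policy wins, and the legacy dfs.https.enable flag is only consulted as a fallback.

```java
// Sketch of the requested precedence; not the actual Hadoop code.
public class HttpPolicyResolver {
    public static String resolve(String dfsHttpPolicy, boolean dfsHttpsEnable) {
        if (dfsHttpPolicy != null) {
            return dfsHttpPolicy;  // explicit policy takes precedence
        }
        // Legacy fallback, only when dfs.http.policy is unset.
        return dfsHttpsEnable ? "HTTP_AND_HTTPS" : "HTTP_ONLY";
    }
}
```

Under this rule, the configuration in the bug report (dfs.https.enable=true, dfs.http.policy=HTTP_ONLY) resolves to HTTP_ONLY as expected.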
[jira] [Assigned] (HDFS-5882) TestAuditLogs is flaky
[ https://issues.apache.org/jira/browse/HDFS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HDFS-5882: - Assignee: Jimmy Xiang > TestAuditLogs is flaky > -- > > Key: HDFS-5882 > URL: https://issues.apache.org/jira/browse/HDFS-5882 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Minor > > TestAuditLogs fails sometimes: > {noformat} > Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs > testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs) > Time elapsed: 2.085 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:92) > at org.junit.Assert.assertTrue(Assert.java:43) > at org.junit.Assert.assertNotNull(Assert.java:526) > at org.junit.Assert.assertNotNull(Assert.java:537) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295) > at > org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5698) Use protobuf to serialize / deserialize FSImage
[ https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891284#comment-13891284 ] Todd Lipcon commented on HDFS-5698: --- A few notes on the patch: {code} +is = new FileInputStream(file); +if (is.read(magic) == magic.length {code} Should use IOUtils.readFully here - Can we rename INodeSection and the other nested proto classes to end in "PB" or "Proto"? It's helpful when reading the code to distinguish the generated protobuf classes from the other structures, and given that these inner classes get imported, it's not always obvious. Performance-wise, I think you can really improve things by re-using protobuf objects. In particular, rather than doing something like: {code} + INodeSection.INodeReference ref = INodeSection.INodeReference + .parseDelimitedFrom(in); + return loadINodeReference(ref, dir); {code} you can make a thread-local INodeSection.INodeReference.Builder object (similar to how we use thread-local ops in the editlog loader code). Then use Builder.mergeDelimitedFrom instead of the static parseDelimitedFrom method. You can check isInitialized() after this to ensure that all of the required fields are present, and then use the builder itself to read the fields. This avoids repeated object allocation/deallocation costs without having to resort to manual parsing that you mention in the design doc. The generated code also has a handy "FooProtoOrBuilder" interface that both the generated PB and its builder implement, with all of the appropriate getters. The code that actually handles constructing HDFS objects from PBs could easily take this interface. For many of the repeated int64 fields, you should probably use the {{[packed=true]}} option in the protobuf definition. This saves a good amount of space and probably improves decoding performance as well. One question: would existing ImageVisitor implementation classes continue to work against the PB-ified image? 
My reading of the patch is that they wouldn't, but it would be nice to confirm. I don't think any of the above needs to block the merge, but the format-breaking one (packed=true) should probably be done sooner rather than later. > Use protobuf to serialize / deserialize FSImage > --- > > Key: HDFS-5698 > URL: https://issues.apache.org/jira/browse/HDFS-5698 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5698-design.pdf, HDFS-5698.000.patch, > HDFS-5698.001.patch, HDFS-5698.002.patch, HDFS-5698.003.patch > > > Currently, the code serializes FSImage using in-house serialization > mechanisms. There are a couple of disadvantages to the current approach: > # Mixing the responsibility of reconstruction and serialization / > deserialization. The current code paths of serialization / deserialization > have spent a lot of effort on maintaining compatibility. What is worse is > that they are mixed with the complex logic of reconstructing the namespace, > making the code difficult to follow. > # Poor documentation of the current FSImage format. The format of the FSImage > is practically defined by the implementation. A bug in the implementation means > a bug in the specification. Furthermore, it also makes writing third-party > tools quite difficult. > # Changing schemas is non-trivial. Adding a field in FSImage requires bumping > the layout version every time. Bumping the layout version requires (1) the > users to explicitly upgrade the clusters, and (2) putting new code to > maintain backward compatibility. > This jira proposes to use protobuf to serialize the FSImage. Protobuf has > been used to serialize / deserialize the RPC message in Hadoop. > Protobuf addresses all the above problems. It clearly separates the > responsibility of serialization and reconstructing the namespace. The > protobuf files document the current format of the FSImage. 
The developers now > can add optional fields with ease, since the old code can always read the new > FSImage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
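Todd's thread-local builder suggestion can be sketched without the protobuf library by using a hand-written Builder in place of the generated `INodeSection.INodeReference.Builder` (in the real loader the builder would be cleared and refilled via `mergeDelimitedFrom`; everything below is an illustrative stand-in):

```java
// Sketch of reusing one builder per thread instead of allocating a
// parsed message object per record.
public class BuilderReuse {
    public static class INodeRef {
        public final long id;
        INodeRef(long id) { this.id = id; }
    }

    public static class Builder {
        private long id;
        Builder setId(long id) { this.id = id; return this; }
        Builder clear() { this.id = 0; return this; }
        INodeRef build() { return new INodeRef(id); }
    }

    // One builder per loader thread, cleared and refilled for each
    // record, so the hot loop avoids a fresh allocation per inode
    // reference (same pattern as the thread-local ops in the editlog
    // loader code mentioned above).
    private static final ThreadLocal<Builder> BUILDER =
        ThreadLocal.withInitial(Builder::new);

    public static INodeRef load(long id) {
        return BUILDER.get().clear().setId(id).build();
    }
}
```

With real protobuf builders, `isInitialized()` after the merge replaces the implicit validation that `parseDelimitedFrom` performs.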
[jira] [Created] (HDFS-5883) TestZKPermissionsWatcher.testPermissionsWatcher fails sometimes
Jimmy Xiang created HDFS-5883: - Summary: TestZKPermissionsWatcher.testPermissionsWatcher fails sometimes Key: HDFS-5883 URL: https://issues.apache.org/jira/browse/HDFS-5883 Project: Hadoop HDFS Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial It looks like sleeping 100 ms is not enough for the permission change to propagate to other watchers. Will increase the sleeping time a little. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
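A longer fixed sleep is the fix proposed here; for reference, a common alternative in timing-sensitive tests is to poll the condition with a deadline, which tolerates slow propagation without slowing the fast case. This is only a sketch of that pattern, not the attached patch:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class WaitFor {
    // Poll `condition` every intervalMs until it holds or timeoutMs
    // elapses; returns whether the condition was ever observed true.
    public static boolean waitFor(BooleanSupplier condition,
                                  long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) return true;
            Thread.sleep(intervalMs);
        }
        return condition.getAsBoolean();  // one last check at the deadline
    }
}
```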
[jira] [Updated] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5399: Status: Patch Available (was: Open) > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5399.000.patch > > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5399: Attachment: HDFS-5399.000.patch Initial patch for review. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-5399.000.patch > > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891257#comment-13891257 ] Haohui Mai commented on HDFS-5182: -- I'm curious why a shared memory segment is necessary -- given the ability to pass file descriptors around, the client can read the data using the file descriptor directly. I see a couple of potential issues with using a shared memory segment to implement zero-copy I/O: # No lazy reads. It seems that you're calling mlock() on the datanode side to pin the data to physical memory. The whole block has to be read into memory even if the client is only interested in some parts of the file (e.g. the index of the database) # SIGBUS. The client avoids SIGBUS, but at the cost of (1) the data being pinned to physical memory, and (2) the datanode can hit SIGBUS when there is an I/O error. If the client is using the file descriptor directly, the OS will manage the data using its buffer cache, and there will be no SIGBUS errors on either side. # VM space. Indeed it won't exhaust the 64-bit virtual memory space, but a process running inside a container could have limited vm space (e.g., 1 GB) I'm wondering what would be the downsides of passing the file descriptor directly. Can you comment on this? > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid > - > > Key: HDFS-5182 > URL: https://issues.apache.org/jira/browse/HDFS-5182 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid. This implies adding a new field to the response to > REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the > client to the DN, so that the DN can inform the client when the mapped region > is no longer locked into memory. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-4239: -- Status: Patch Available (was: Open) > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack >Assignee: Jimmy Xiang > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, > hdfs-4239_v4.patch, hdfs-4239_v5.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the whole datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it to stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after it's been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-4239: -- Attachment: hdfs-4239_v5.patch > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack >Assignee: Jimmy Xiang > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, > hdfs-4239_v4.patch, hdfs-4239_v5.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the whole datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it to stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after it's been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5882) TestAuditLogs is flaky
Jimmy Xiang created HDFS-5882: - Summary: TestAuditLogs is flaky Key: HDFS-5882 URL: https://issues.apache.org/jira/browse/HDFS-5882 Project: Hadoop HDFS Issue Type: Test Reporter: Jimmy Xiang Priority: Minor TestAuditLogs fails sometimes: {noformat} Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs) Time elapsed: 2.085 sec <<< FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNotNull(Assert.java:526) at org.junit.Assert.assertNotNull(Assert.java:537) at org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312) at org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295) at org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891238#comment-13891238 ] Colin Patrick McCabe commented on HDFS-5182: [~wheat9]: I think you're mixing up the two choices a little bit. Choice #1 does pass the file descriptor, and uses the shared memory segment for communication. Choice #2 passes everything over UNIX domain sockets. SIGBUS is not an issue since the shared memory segment is in memory (SIGBUS should only happen on disk error). Virtual memory space is not an issue on 64-bit machines. Portability is not an issue since Windows supports shared memory as well, as you note. > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid > - > > Key: HDFS-5182 > URL: https://issues.apache.org/jira/browse/HDFS-5182 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid. This implies adding a new field to the response to > REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the > client to the DN, so that the DN can inform the client when the mapped region > is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891230#comment-13891230 ] Colin Patrick McCabe commented on HDFS-5746: This patch increased OK_JAVADOC_WARNINGS, which should have covered the 2 additional warnings. {code} - OK_JAVADOC_WARNINGS=14; + OK_JAVADOC_WARNINGS=16; {code} If we're getting javadoc warnings on clean builds, let's file a JIRA about increasing OK_JAVADOC_WARNINGS further and/or fixing javadoc warnings. The ones introduced in this patch were not fixable because they related to sun APIs. > add ShortCircuitSharedMemorySegment > --- > > Key: HDFS-5746 > URL: https://issues.apache.org/jira/browse/HDFS-5746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0, 2.4.0 > > Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, > HDFS-5746.003.patch, HDFS-5746.004.patch, HDFS-5746.005.patch, > HDFS-5746.006.patch, HDFS-5746.007.patch, HDFS-5746.008.patch > > > Add ShortCircuitSharedMemorySegment, which will be used to communicate > information between the datanode and the client about whether a replica is > mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5868) Make hsync implementation pluggable
[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Buddy updated HDFS-5868: Target Version/s: 2.4.0 Affects Version/s: (was: 2.4.0) 2.2.0 Status: Patch Available (was: Open) > Make hsync implementation pluggable > --- > > Key: HDFS-5868 > URL: https://issues.apache.org/jira/browse/HDFS-5868 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0 >Reporter: Buddy > Attachments: HDFS-5868-branch-2.patch > > > The current implementation of hsync in BlockReceiver only works if the output > streams are instances of FileOutputStream. Therefore, there is currently no > way for a FSDatasetSpi plugin to implement hsync if it is not using standard > OS files. > One possible solution is to push the implementation of hsync into the > ReplicaOutputStreams class. This class is constructed by the > ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore > it can be extended. Instead of directly calling sync on the output stream, > BlockReceiver would call ReplicaOutputStream.sync. The default > implementation of sync in ReplicaOutputStream would be the same as the > current implementation in BlockReceiver. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
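The proposal in the description can be sketched as follows. This is a hypothetical simplification, not the actual Hadoop classes: the sync call moves out of BlockReceiver into a stream-wrapper class that an FSDatasetSpi plugin can subclass when its replicas are not backed by ordinary OS files.

```java
// Hypothetical sketch of a pluggable sync: the default mirrors today's
// BlockReceiver behavior (only works for FileOutputStream), while a
// plugin subclass supplies its own durability guarantee.
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ReplicaOutputStreamsSketch {
    private final OutputStream dataOut;

    public ReplicaOutputStreamsSketch(OutputStream dataOut) {
        this.dataOut = dataOut;
    }

    /** Default: force file data to disk only for file-backed streams. */
    public void syncDataOut() throws IOException {
        if (dataOut instanceof FileOutputStream) {
            ((FileOutputStream) dataOut).getChannel().force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        // Non-file-backed stream: default sync is a harmless no-op ...
        new ReplicaOutputStreamsSketch(new ByteArrayOutputStream()).syncDataOut();

        // ... while a plugin subclass hooks in its own implementation.
        final boolean[] synced = {false};
        ReplicaOutputStreamsSketch plugin =
            new ReplicaOutputStreamsSketch(new ByteArrayOutputStream()) {
                @Override public void syncDataOut() { synced[0] = true; }
            };
        plugin.syncDataOut();
        if (!synced[0]) throw new AssertionError("override not invoked");
        System.out.println("ok");
    }
}
```

With this shape, BlockReceiver calls syncDataOut() without ever inspecting the concrete stream type, which is what makes the implementation pluggable.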
[jira] [Updated] (HDFS-5868) Make hsync implementation pluggable
[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Buddy updated HDFS-5868: Attachment: HDFS-5868-branch-2.patch > Make hsync implementation pluggable > --- > > Key: HDFS-5868 > URL: https://issues.apache.org/jira/browse/HDFS-5868 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0 >Reporter: Buddy > Attachments: HDFS-5868-branch-2.patch > > > The current implementation of hsync in BlockReceiver only works if the output > streams are instances of FileOutputStream. Therefore, there is currently no > way for a FSDatasetSpi plugin to implement hsync if it is not using standard > OS files. > One possible solution is to push the implementation of hsync into the > ReplicaOutputStreams class. This class is constructed by the > ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore > it can be extended. Instead of directly calling sync on the output stream, > BlockReceiver would call ReplicaOutputStream.sync. The default > implementation of sync in ReplicaOutputStream would be the same as the > current implementation in BlockReceiver. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5868) Make hsync implementation pluggable
[ https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Buddy updated HDFS-5868: Attachment: (was: HDFS-5868.patch) > Make hsync implementation pluggable > --- > > Key: HDFS-5868 > URL: https://issues.apache.org/jira/browse/HDFS-5868 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.4.0 >Reporter: Buddy > > The current implementation of hsync in BlockReceiver only works if the output > streams are instances of FileOutputStream. Therefore, there is currently no > way for a FSDatasetSpi plugin to implement hsync if it is not using standard > OS files. > One possible solution is to push the implementation of hsync into the > ReplicaOutputStreams class. This class is constructed by the > ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore > it can be extended. Instead of directly calling sync on the output stream, > BlockReceiver would call ReplicaOutputStream.sync. The default > implementation of sync in ReplicaOutputStream would be the same as the > current implementation in BlockReceiver. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891207#comment-13891207 ] Todd Lipcon commented on HDFS-5399: --- OK, thanks. Feel free to ping me via gchat (todd at cloudera dot com) if you want a quick review or if I can help out in any way. (sometimes I'm slower to notice JIRA comments) > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891204#comment-13891204 ] Jing Zhao commented on HDFS-5399: - I will post a patch today. And this jira already proposes to distinguish the manual safemode, so I will include both changes in the same patch and post it in this jira. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891196#comment-13891196 ] Todd Lipcon commented on HDFS-5399: --- bq. > we should limit the number of retries as Jing proposed above bq. I will create a jira and upload a patch for this. Thanks, Jing! Do you plan to get to this today? We have some internal testing blocked by this issue, so if you're busy I can try to take a whack at it instead. What do you think about the suggestion of making it only throw RetriableException if it's in the "extension" or "startup" safemode, and not "manual" safemode? > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. 
A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations
[ https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891167#comment-13891167 ] Arun C Murthy commented on HDFS-4564: - Ok, thanks [~daryn]! > Webhdfs returns incorrect http response codes for denied operations > --- > > Key: HDFS-4564 > URL: https://issues.apache.org/jira/browse/HDFS-4564 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Blocker > Attachments: HDFS-4564.branch-23.patch, HDFS-4564.branch-23.patch, > HDFS-4564.patch > > > Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's > denying operations. Examples including rejecting invalid proxy user attempts > and renew/cancel with an invalid user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
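The distinction at the heart of this issue can be illustrated with a tiny mapping, hedged as a sketch only (the class and exception names below are stand-ins, not WebHDFS's actual exception handler): 401 Unauthorized means the caller failed to authenticate at all, while a denied operation from a known caller is 403 Forbidden.

```java
// Illustrative status mapping: authentication failure vs. authorization
// failure map to different HTTP codes, which is the fix HDFS-4564 wants.
public class WebHdfsStatusSketch {
    static class AuthenticationException extends Exception {}
    static class AccessControlException extends Exception {}

    static int httpStatusFor(Exception e) {
        if (e instanceof AuthenticationException) return 401; // not authenticated
        if (e instanceof AccessControlException) return 403;  // authenticated, but denied
        return 500;                                           // unexpected server error
    }

    public static void main(String[] args) {
        // Rejected proxy-user attempts and invalid renew/cancel users are
        // authorization failures, so they should surface as 403, not 401.
        if (httpStatusFor(new AccessControlException()) != 403) throw new AssertionError();
        if (httpStatusFor(new AuthenticationException()) != 401) throw new AssertionError();
        System.out.println("ok");
    }
}
```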
[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations
[ https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891154#comment-13891154 ] Daryn Sharp commented on HDFS-4564: --- [~acmurthy] Yes, but it needs the HADOOP-10301 patch committed. I think the pre-commit for this patch will fail w/o it. > Webhdfs returns incorrect http response codes for denied operations > --- > > Key: HDFS-4564 > URL: https://issues.apache.org/jira/browse/HDFS-4564 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Blocker > Attachments: HDFS-4564.branch-23.patch, HDFS-4564.branch-23.patch, > HDFS-4564.patch > > > Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's > denying operations. Examples including rejecting invalid proxy user attempts > and renew/cancel with an invalid user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5880) Fix a typo at the title of HDFS Snapshots document
[ https://issues.apache.org/jira/browse/HDFS-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891134#comment-13891134 ] Andrew Wang commented on HDFS-5880: --- I already have a fix for this in HDFS-5709, which is in the last stages of review. Do you mind if we close this as dupe? > Fix a typo at the title of HDFS Snapshots document > -- > > Key: HDFS-5880 > URL: https://issues.apache.org/jira/browse/HDFS-5880 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation, snapshots >Affects Versions: 2.2.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Minor > Labels: newbie > Attachments: HDFS-5880.patch > > > The title of the HDFS Snapshots document is "HFDS Snapshots". > We should fix it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891115#comment-13891115 ] Suresh Srinivas commented on HDFS-5746: --- [~cmccabe], Jenkins has been flagging two javadoc errors on recent runs. Is it related to this? > add ShortCircuitSharedMemorySegment > --- > > Key: HDFS-5746 > URL: https://issues.apache.org/jira/browse/HDFS-5746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0, 2.4.0 > > Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, > HDFS-5746.003.patch, HDFS-5746.004.patch, HDFS-5746.005.patch, > HDFS-5746.006.patch, HDFS-5746.007.patch, HDFS-5746.008.patch > > > Add ShortCircuitSharedMemorySegment, which will be used to communicate > information between the datanode and the client about whether a replica is > mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891102#comment-13891102 ] Jing Zhao commented on HDFS-5399: - Thanks for the comment, Todd! bq. Maybe we should consider changing the extension so that, if we don't have a significant number of under-replicated blocks, we don't go through the extension? +1 for this. [~kihwal] has a similar proposal in HDFS-5145. Since DNs keep sending block report to SBN, and the NN will process all the pending DN msgs while starting the active services, maybe we can just simply skip the safemode extension even without checking the number of under-replicated blocks? bq. we should limit the number of retries as Jing proposed above I will create a jira and upload a patch for this. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. 
> # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
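The extension-skipping idea Jing and Todd discuss can be sketched in a few lines. This is purely illustrative (the method and threshold below are hypothetical, not FSNamesystem's real logic): skip the safemode extension when the fraction of under-replicated blocks is negligible, since an SBN has already been receiving block reports while in standby.

```java
// Hypothetical decision helper: go through the safemode extension only
// when enough blocks are under-replicated to risk a replication storm.
public class SafeModeExtensionSketch {
    static boolean needExtension(long underReplicatedBlocks, long totalBlocks,
                                 double threshold) {
        if (totalBlocks == 0) return false;
        return (double) underReplicatedBlocks / totalBlocks > threshold;
    }

    public static void main(String[] args) {
        // SBN becoming active with nearly complete block reports: skip it.
        if (needExtension(3, 1_000_000, 0.001)) throw new AssertionError();
        // Cold start with many missing replicas: keep the extension.
        if (!needExtension(50_000, 1_000_000, 0.001)) throw new AssertionError();
        System.out.println("ok");
    }
}
```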
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891096#comment-13891096 ] Todd Lipcon commented on HDFS-5399: --- Perhaps a compromise that might work for now would be to make the NN only throw the RetriableException-wrapped SafeModeException in the case that it's in SafeModeExtension? ie the NN knows that it's on its way out of safemode, and the client just needs to "hang on for a little while". This is distinct from the other cases where an admin explicitly put the NN in safemode, or it got stuck in safemode at startup because there are missing blocks. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. 
A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
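Todd's compromise can be expressed as a small dispatch. The enum and method names here are illustrative, not the real NameNode API: wrap SafeModeException in RetriableException only when the NN is on its way out of safemode (startup or extension), so clients keep retrying; a manually entered safemode fails fast instead of retrying forever.

```java
// Sketch of reason-dependent wrapping: retry policies on the client
// recognize RetriableException, so only transient safemode states get it.
import java.io.IOException;

public class SafeModeRetrySketch {
    enum SafeModeReason { STARTUP, EXTENSION, MANUAL }

    static class SafeModeException extends IOException {}
    static class RetriableException extends IOException {
        RetriableException(Throwable cause) { super(cause); }
    }

    static IOException toClientError(SafeModeReason reason) {
        SafeModeException sme = new SafeModeException();
        switch (reason) {
            case STARTUP:
            case EXTENSION:
                return new RetriableException(sme); // transient: client should retry
            case MANUAL:
            default:
                return sme; // admin action: do not retry
        }
    }

    public static void main(String[] args) {
        if (!(toClientError(SafeModeReason.EXTENSION) instanceof RetriableException))
            throw new AssertionError();
        if (toClientError(SafeModeReason.MANUAL) instanceof RetriableException)
            throw new AssertionError();
        System.out.println("ok");
    }
}
```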
[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations
[ https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891086#comment-13891086 ] Arun C Murthy commented on HDFS-4564: - [~daryn] - Is this close? Thanks. > Webhdfs returns incorrect http response codes for denied operations > --- > > Key: HDFS-4564 > URL: https://issues.apache.org/jira/browse/HDFS-4564 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Blocker > Attachments: HDFS-4564.branch-23.patch, HDFS-4564.branch-23.patch, > HDFS-4564.patch > > > Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's > denying operations. Examples including rejecting invalid proxy user attempts > and renew/cancel with an invalid user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations
[ https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-4564: Target Version/s: 2.3.0 (was: 3.0.0, 2.3.0) > Webhdfs returns incorrect http response codes for denied operations > --- > > Key: HDFS-4564 > URL: https://issues.apache.org/jira/browse/HDFS-4564 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Blocker > Attachments: HDFS-4564.branch-23.patch, HDFS-4564.branch-23.patch, > HDFS-4564.patch > > > Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's > denying operations. Examples including rejecting invalid proxy user attempts > and renew/cancel with an invalid user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891074#comment-13891074 ] Arpit Gupta commented on HDFS-5399: --- [~atm] bq. Am I correct in assuming that the test you were running did not manually cause the NN to enter or leave safemode? Yes that is correct. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5873) dfs.http.policy should have higher precedence over dfs.https.enable
[ https://issues.apache.org/jira/browse/HDFS-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5873: - Attachment: HDFS-5873.001.patch The v1 patch updated SecureMode.apt.vm. > dfs.http.policy should have higher precedence over dfs.https.enable > --- > > Key: HDFS-5873 > URL: https://issues.apache.org/jira/browse/HDFS-5873 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Haohui Mai > Attachments: HDFS-5873.000.patch, HDFS-5873.001.patch > > > If dfs.http.policy is defined in hdfs-site.xml, it should have higher > precedence. > In hdfs-site.xml, if dfs.https.enable is set to true and dfs.http.policy is > set to HTTP_ONLY, the effective policy should be 'HTTP_ONLY' instead of > 'HTTP_AND_HTTPS'. > Currently with this configuration, it activates the HTTP_AND_HTTPS policy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
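The reported scenario corresponds to an hdfs-site.xml like the fragment below (illustrative, using the two configuration keys named in the issue). With both keys set this way, the issue argues the effective policy should be HTTP_ONLY, because dfs.http.policy is the newer key and should win over dfs.https.enable.

```xml
<!-- hdfs-site.xml fragment reproducing the conflict described above -->
<configuration>
  <property>
    <name>dfs.https.enable</name>
    <value>true</value>
  </property>
  <property>
    <!-- Should take precedence over dfs.https.enable, making the
         effective policy HTTP_ONLY rather than HTTP_AND_HTTPS. -->
    <name>dfs.http.policy</name>
    <value>HTTP_ONLY</value>
  </property>
</configuration>
```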
[jira] [Commented] (HDFS-5731) Refactoring to define interfaces between BM and NN and simplify the flow between them
[ https://issues.apache.org/jira/browse/HDFS-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891067#comment-13891067 ] Daryn Sharp commented on HDFS-5731: --- I'm trying to review this patch but it's a bit too much. Broad API and functional changes across the NN is difficult to verify for correctness. To expedite the review, this patch should be broken into multiple jiras where possible. Each jira should encompass one specific area of change. Examples include jiras for the changes to FSNamesystem's verifyReplication, getBlockLocations, commitBlockSynchronization, jmx changes, etc. Generally, api changes should accompany the patch for which they are required. The intent/reason for each change will be clear to the reviewer. > Refactoring to define interfaces between BM and NN and simplify the flow > between them > - > > Key: HDFS-5731 > URL: https://issues.apache.org/jira/browse/HDFS-5731 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Amir Langer > Attachments: > 0001-Separation-of-BM-from-NN-Step1-introduce-APIs-as-int.patch > > > Start the separation of BlockManager (BM) from NameNode (NN) by simplifying > the flow between the two components and defining API interfaces between them. > This is done to enable future transformation into a clean RPC protocol. > Logic to calls from Datanodes should be in the BM. > NN should interact with BM using few calls and BM should use the return types > as much as possible to pass information to the NN. > The emphasis is on restructuring the request execution flows between the NN > and BM in a way that will minimize the latency increase when the BM > implementation becomes remote. Namely, the API flows are restructured in a > way that BM is called at most once per request. > The two components (NN and BM) still exist in the same VM and share the same > memory space. 
> NN and BM share the same lifecycle – it is assumed that they can't > crash/restart separately. > There is still a 1:1 relationship between them. > APIs between NN and BM will be improved to not use the same object instances > and turned into a real protocol. > This task should maintain backward compatibility -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891066#comment-13891066 ] Todd Lipcon commented on HDFS-5399: --- Catching up on this issue now, so apologies if I've missed some context in my reading of the discussion. - One of the issues is that the SBN may be in Safe Mode while it's tailing. When it becomes active, it has the latest edits and can come out of safemode, but still goes through the extension. The original reason for the extension was to prevent a replication storm in the case that the NN has only one replica of all the blocks, but several DNs haven't yet reported. In the SBN case, since we've already been running in standby mode for a while, it seems unlikely that the extension is necessary. Maybe we should consider changing the extension so that, if we don't have a significant number of under-replicated blocks, we don't go through the extension? - Regardless, we should limit the number of retries as Jing proposed above. Retrying indefinitely should never be our default. How about we introduce a configuration here and default to ~30sec of retries? Those who want to retry forever could reconfigure to a longer time period. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. 
Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)