[jira] [Created] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-04 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created HDFS-5889:


 Summary: When rolling upgrade is in progress, standby NN should 
create checkpoint for downgrade.
 Key: HDFS-5889
 URL: https://issues.apache.org/jira/browse/HDFS-5889
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE


After a rolling upgrade is started and checkpointing is disabled, the edit log may 
grow to a huge size.  This is not a problem if the rolling upgrade is finalized 
normally, since the NN keeps the current state in memory and writes a new 
checkpoint during finalize.  However, it is a problem if the admin decides to 
downgrade: it could take a long time to apply the edit log.  Rollback does not 
have such a problem.
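
For illustration only, a minimal sketch of the idea, using hypothetical names 
(DowngradeAwareCheckpointerSketch, CheckpointTarget and maybeCheckpoint are 
stand-ins, not the actual StandbyCheckpointer code): the standby keeps producing 
checkpoints while a rolling upgrade is in progress, so a downgrade has a recent 
fsimage to start from.

{code}
// Sketch only: hypothetical types and names, not the actual HDFS classes.
class DowngradeAwareCheckpointerSketch {
  interface CheckpointTarget {          // stand-in for the namesystem/image
    void saveCheckpoint();
  }

  private final CheckpointTarget target;
  private final long txnThreshold;

  DowngradeAwareCheckpointerSketch(CheckpointTarget target, long txnThreshold) {
    this.target = target;
    this.txnThreshold = txnThreshold;
  }

  void maybeCheckpoint(long uncheckpointedTxns) {
    // Deliberately no "skip if rolling upgrade is in progress" check here:
    // skipping checkpoints during rolling upgrade is exactly what lets the
    // edit log grow and makes a later downgrade slow.
    if (uncheckpointedTxns >= txnThreshold) {
      target.saveCheckpoint();
    }
  }
}
{code}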





[jira] [Updated] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components

2014-02-04 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-5709:
-

   Resolution: Fixed
Fix Version/s: 2.4.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Andrew. Thanks also to Jing and Suresh for 
the reviews and discussion.

> Improve NameNode upgrade with existing reserved paths and path components
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Fix For: 2.4.0
>
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, 
> hdfs-5709-4.patch, hdfs-5709-5.patch, hdfs-5709-6.patch, hdfs-5709-7.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.
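
As a rough illustration of the renaming approach (a hypothetical helper, not the 
code in the patch; the suffix shown is only an example), the upgrade would rewrite 
a colliding path component by appending a rename suffix:

{code}
// Sketch only: hypothetical helper class, not the code in the patch.
class ReservedPathRenameSketch {
  static String renameReservedComponent(String component, String reservedName,
                                        String renameSuffix) {
    if (component.equals(reservedName)) {
      // e.g. ".snapshot" becomes ".snapshot" + renameSuffix
      return component + renameSuffix;
    }
    return component;
  }
}
{code}

A user-supplied mapping (the {{-renameReserved}} key-value pairs discussed later in 
this thread) would take the place of the suffix-based default.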





[jira] [Commented] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components

2014-02-04 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891872#comment-13891872
 ] 

Aaron T. Myers commented on HDFS-5709:
--

The javadoc issue is unrelated and is tracked by HADOOP-10325. The 
TestAuditLogs failure is spurious and is tracked by HDFS-5882. The 
TestDFSUpgradeFromImage failure is because we need to include the new binary 
file in order for that to pass.

Given that, I'm going to commit this momentarily.

> Improve NameNode upgrade with existing reserved paths and path components
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, 
> hdfs-5709-4.patch, hdfs-5709-5.patch, hdfs-5709-6.patch, hdfs-5709-7.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.





[jira] [Commented] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components

2014-02-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891867#comment-13891867
 ] 

Hudson commented on HDFS-5709:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5109 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5109/])
HDFS-5709. Improve NameNode upgrade with existing reserved paths and path 
components. Contributed by Andrew Wang. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1564645)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/xdoc/HdfsSnapshots.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgradeFromImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeOptionParsing.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-2-reserved.tgz


> Improve NameNode upgrade with existing reserved paths and path components
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, 
> hdfs-5709-4.patch, hdfs-5709-5.patch, hdfs-5709-6.patch, hdfs-5709-7.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.





[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891854#comment-13891854
 ] 

Vinay commented on HDFS-5869:
-

startCheckpoint() is only used by the BackupNode.

As Jing pointed out, we should disable checkpointing from the 
StandbyCheckpointer in the standby NN while RollingUpgrade is in progress.

> When rolling upgrade is in progress, NN should only create checkpoint right 
> before the upgrade marker
> -
>
> Key: HDFS-5869
> URL: https://issues.apache.org/jira/browse/HDFS-5869
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5869_20140204.patch, h5869_20140204b.patch, 
> h5869_20140205.patch
>
>
> - When starting rolling upgrade, NN should create a checkpoint before it 
> writes the upgrade marker edit log transaction.
> - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
> calls. Further, if NN restarts, it should create a checkpoint only right 
> before the upgrade marker.





[jira] [Updated] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-04 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5869:
-

Attachment: h5869_20140205.patch

h5869_20140205.patch: updates the test and removes checkRollingUpgrade in 
startCheckpoint.

On second thought, we should only disallow saveNamespace but allow 
checkpointing.  Otherwise, the edit log may become huge.
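
A minimal sketch of that distinction, with hypothetical names (SaveNamespaceGuard 
and RollingUpgradeStatus are stand-ins, not the actual FSNamesystem code): the 
saveNamespace RPC is rejected while a rolling upgrade is in progress, while 
checkpointing itself is left enabled.

{code}
import java.io.IOException;

// Sketch only: hypothetical class and method names, not the actual FSNamesystem.
class SaveNamespaceGuard {
  interface RollingUpgradeStatus { boolean inProgress(); }

  private final RollingUpgradeStatus status;

  SaveNamespaceGuard(RollingUpgradeStatus status) { this.status = status; }

  void saveNamespace() throws IOException {
    if (status.inProgress()) {
      // The saveNamespace RPC is rejected during rolling upgrade...
      throw new IOException("saveNamespace is not allowed during rolling upgrade");
    }
    // ...but checkpointing (e.g. on the standby) stays enabled, so the edit log
    // does not grow without bound.
    doSave();
  }

  private void doSave() { /* image saving elided */ }
}
{code}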

> When rolling upgrade is in progress, NN should only create checkpoint right 
> before the upgrade marker
> -
>
> Key: HDFS-5869
> URL: https://issues.apache.org/jira/browse/HDFS-5869
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5869_20140204.patch, h5869_20140204b.patch, 
> h5869_20140205.patch
>
>
> - When starting rolling upgrade, NN should create a checkpoint before it 
> writes the upgrade marker edit log transaction.
> - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
> calls. Further, if NN restarts, it should create a checkpoint only right 
> before the upgrade marker.





[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-04 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891839#comment-13891839
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5869:
--

When totalEdits > 1, the first two transactions must be OP_START_LOG_SEGMENT 
and OP_UPGRADE_MARKER, so it won't save the namespace.  These two transactions 
won't be lost.  Let me check the test to verify this.

> When rolling upgrade is in progress, NN should only create checkpoint right 
> before the upgrade marker
> -
>
> Key: HDFS-5869
> URL: https://issues.apache.org/jira/browse/HDFS-5869
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5869_20140204.patch, h5869_20140204b.patch
>
>
> - When starting rolling upgrade, NN should create a checkpoint before it 
> writes the upgrade marker edit log transaction.
> - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
> calls. Further, if NN restarts, it should create a checkpoint only right 
> before the upgrade marker.





[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891837#comment-13891837
 ] 

Vinay commented on HDFS-5869:
-

Got it.
saveNamespace() while loading OP_UPGRADE_MARKER will not include the current 
edit log segment, because {{lastAppliedTxId}} of {{FSImage}} will still be 
pointing to the previous segment/checkpoint's last txn. Also, {{totalEdits}} will 
be 1, so it won't try to save again.

> When rolling upgrade is in progress, NN should only create checkpoint right 
> before the upgrade marker
> -
>
> Key: HDFS-5869
> URL: https://issues.apache.org/jira/browse/HDFS-5869
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5869_20140204.patch, h5869_20140204b.patch
>
>
> - When starting rolling upgrade, NN should create a checkpoint before it 
> writes the upgrade marker edit log transaction.
> - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
> calls. Further, if NN restarts, it should create a checkpoint only right 
> before the upgrade marker.





[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891824#comment-13891824
 ] 

Vinay commented on HDFS-5869:
-

One more thing..
{code}
} else if (rollingUpgradeOpt == RollingUpgradeStartupOption.STARTED) {
  if (totalEdits > 1) {
    // save namespace if this is not the second edit transaction
    // (the first must be OP_START_LOG_SEGMENT)
    fsNamesys.getFSImage().saveNamespace(fsNamesys);
  }
{code}
When the standby NN is restarted twice with the RollingUpgradeStartupOption.STARTED 
option, we will lose the OP_UPGRADE_MARKER, and hence rollingUpgradeInfo will 
also be lost.
Am I missing something here?

> When rolling upgrade is in progress, NN should only create checkpoint right 
> before the upgrade marker
> -
>
> Key: HDFS-5869
> URL: https://issues.apache.org/jira/browse/HDFS-5869
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5869_20140204.patch, h5869_20140204b.patch
>
>
> - When starting rolling upgrade, NN should create a checkpoint before it 
> writes the upgrade marker edit log transaction.
> - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
> calls. Further, if NN restarts, it should create a checkpoint only right 
> before the upgrade marker.





[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891821#comment-13891821
 ] 

Vinay commented on HDFS-5869:
-

Oops. I had forgotten that. Thanks for the update.

> When rolling upgrade is in progress, NN should only create checkpoint right 
> before the upgrade marker
> -
>
> Key: HDFS-5869
> URL: https://issues.apache.org/jira/browse/HDFS-5869
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5869_20140204.patch, h5869_20140204b.patch
>
>
> - When starting rolling upgrade, NN should create a checkpoint before it 
> writes the upgrade marker edit log transaction.
> - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
> calls. Further, if NN restarts, it should create a checkpoint only right 
> before the upgrade marker.





[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-04 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891816#comment-13891816
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5869:
--

Hi Vinay,

Thanks for looking at the patch.  FSImage.saveNamespace(..) already performs 
endCurrentLogSegment, startLogSegmentAndWriteHeaderTxn and 
writeTransactionIdFileToStorage, which are the same steps as rolling the edit 
log.
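
For readers following along, a schematic of that sequence with simplified, 
hypothetical signatures (EditLogLike and StorageLike are stand-ins; only the 
method names come from the comment above):

{code}
// Schematic only: not the actual FSImage code; signatures are simplified.
class SaveNamespaceSequenceSketch {
  interface EditLogLike {                        // stand-in for FSEditLog
    void endCurrentLogSegment();
    void startLogSegmentAndWriteHeaderTxn();
  }
  interface StorageLike {                        // stand-in for NNStorage
    void writeTransactionIdFileToStorage();
  }

  void saveNamespace(EditLogLike editLog, StorageLike storage) {
    editLog.endCurrentLogSegment();              // finalize the in-progress segment
    saveImage();                                 // write the new fsimage (elided)
    editLog.startLogSegmentAndWriteHeaderTxn();  // open a fresh segment
    storage.writeTransactionIdFileToStorage();   // record the new last txid
  }

  private void saveImage() { /* elided */ }
}
{code}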

> When rolling upgrade is in progress, NN should only create checkpoint right 
> before the upgrade marker
> -
>
> Key: HDFS-5869
> URL: https://issues.apache.org/jira/browse/HDFS-5869
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5869_20140204.patch, h5869_20140204b.patch
>
>
> - When starting rolling upgrade, NN should create a checkpoint before it 
> writes the upgrade marker edit log transaction.
> - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
> calls. Further, if NN restarts, it should create a checkpoint only right 
> before the upgrade marker.





[jira] [Commented] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components

2014-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891812#comment-13891812
 ] 

Hadoop QA commented on HDFS-5709:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627049/hdfs-5709-7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6034//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6034//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6034//console

This message is automatically generated.

> Improve NameNode upgrade with existing reserved paths and path components
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, 
> hdfs-5709-4.patch, hdfs-5709-5.patch, hdfs-5709-6.patch, hdfs-5709-7.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.





[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891808#comment-13891808
 ] 

Vinay commented on HDFS-5869:
-

Patch looks good, Nicholas.
I think it would be better to roll edits after saveNamespace() during 
{{startRollingUpgrade()}}. That would also give a clear separation of edits.

> When rolling upgrade is in progress, NN should only create checkpoint right 
> before the upgrade marker
> -
>
> Key: HDFS-5869
> URL: https://issues.apache.org/jira/browse/HDFS-5869
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5869_20140204.patch, h5869_20140204b.patch
>
>
> - When starting rolling upgrade, NN should create a checkpoint before it 
> writes the upgrade marker edit log transaction.
> - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
> calls. Further, if NN restarts, it should create a checkpoint only right 
> before the upgrade marker.





[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891700#comment-13891700
 ] 

Hudson commented on HDFS-5399:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5106 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5106/])
Correct CHANGES.txt entry for HDFS-5399 (contributed by Jing, not Haohui) 
(todd: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1564632)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
HDFS-5399. Revisit SafeModeException and corresponding retry policies. 
Contributed by Haohui Mai. (todd: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1564629)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryPolicies.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/NameNodeProxies.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHASafeMode.java


> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.3.0
>
> Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, 
> HDFS-5399.003.patch, hdfs-5399.002.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.
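
A minimal sketch of the wrapping proposed in point 3, with stand-in exception 
classes (SafeModeRetrySketch and its nested classes are hypothetical; only the 
idea of wrapping SafeModeException in RetriableException comes from the 
description above):

{code}
import java.io.IOException;

// Sketch only: hypothetical server-side shape of the proposal, not actual code.
class SafeModeRetrySketch {
  static class SafeModeException extends IOException {      // stand-in
    final boolean manual;                                    // entered via CLI?
    SafeModeException(String msg, boolean manual) {
      super(msg);
      this.manual = manual;
    }
  }
  static class RetriableException extends IOException {     // stand-in
    RetriableException(IOException cause) { super(cause); }
  }

  static IOException wrapForClient(SafeModeException sme) {
    if (sme.manual) {
      return sme;                        // manual safe mode: clients should not retry
    }
    return new RetriableException(sme);  // automatic safe mode: clients should retry
  }
}
{code}

With this shape, a single retry policy that recognizes RetriableException covers 
both the HA and non-HA setups.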





[jira] [Commented] (HDFS-5881) Fix skip() of the short-circuit local reader in 0.23.

2014-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891695#comment-13891695
 ] 

Hadoop QA commented on HDFS-5881:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12627066/HDFS-5881.branch-0.23.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6035//console

This message is automatically generated.

> Fix skip() of the short-circuit local reader in 0.23.
> -
>
> Key: HDFS-5881
> URL: https://issues.apache.org/jira/browse/HDFS-5881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.10
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5881.branch-0.23.patch
>
>
> It looks like a bug in skip() was introduced by HDFS-2356 and got fixed as a 
> part of HDFS-2834, which is an API change JIRA.  This bug causes skip() to skip 
> more data (as much as the new offsetFromChunkBoundary) in certain cases.
> It applies only to branch-0.23.





[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891692#comment-13891692
 ] 

Todd Lipcon commented on HDFS-5399:
---

Oops. I just committed and realized I accidentally credited Haohui instead of 
you, Jing -- been looking at his PB patch all day :) Sorry about that, I'll 
correct the CHANGES.txt entry right away.

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.3.0
>
> Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, 
> HDFS-5399.003.patch, hdfs-5399.002.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.





[jira] [Updated] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-5399:
--

   Resolution: Fixed
Fix Version/s: 2.3.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.3.0
>
> Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, 
> HDFS-5399.003.patch, hdfs-5399.002.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.





[jira] [Updated] (HDFS-5881) Fix skip() of the short-circuit local reader in 0.23.

2014-02-04 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5881:
-

Status: Patch Available  (was: Open)

> Fix skip() of the short-circuit local reader in 0.23.
> -
>
> Key: HDFS-5881
> URL: https://issues.apache.org/jira/browse/HDFS-5881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.10
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5881.branch-0.23.patch
>
>
> It looks like a bug in skip() was introduced by HDFS-2356 and got fixed as a 
> part of HDFS-2834, which is an API change JIRA.  This bug causes skip() to skip 
> more data (as much as the new offsetFromChunkBoundary) in certain cases.
> It applies only to branch-0.23.





[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891688#comment-13891688
 ] 

Todd Lipcon commented on HDFS-5399:
---

The javadoc warnings are currently showing up on all builds (HADOOP-10325 
should address this).

I'll commit this to trunk, branch-2, and branch-2.3

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, 
> HDFS-5399.003.patch, hdfs-5399.002.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.





[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891685#comment-13891685
 ] 

Hadoop QA commented on HDFS-5399:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627046/HDFS-5399.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6033//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6033//console

This message is automatically generated.

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, 
> HDFS-5399.003.patch, hdfs-5399.002.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.





[jira] [Updated] (HDFS-5881) Fix skip() of the short-circuit local reader in 0.23.

2014-02-04 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5881:
-

Attachment: HDFS-5881.branch-0.23.patch

The patch includes a test case that reproduces the incorrect data being returned, 
which is fixed similarly to branch-2/trunk.  It additionally fixes the skip() 
return-value bug.

The patch only applies to 0.23.

> Fix skip() of the short-circuit local reader in 0.23.
> -
>
> Key: HDFS-5881
> URL: https://issues.apache.org/jira/browse/HDFS-5881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.10
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5881.branch-0.23.patch
>
>
> It looks like a bug in skip() was introduced by HDFS-2356 and got fixed as a 
> part of HDFS-2834, which is an API change JIRA.  This bug causes skip() to skip 
> more data (as much as the new offsetFromChunkBoundary) in certain cases.
> It applies only to branch-0.23.





[jira] [Commented] (HDFS-5873) dfs.http.policy should have higher precedence over dfs.https.enable

2014-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891668#comment-13891668
 ] 

Hadoop QA commented on HDFS-5873:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627034/HDFS-5873.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6031//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6031//console

This message is automatically generated.

> dfs.http.policy should have higher precedence over dfs.https.enable
> ---
>
> Key: HDFS-5873
> URL: https://issues.apache.org/jira/browse/HDFS-5873
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Haohui Mai
> Attachments: HDFS-5873.000.patch, HDFS-5873.001.patch, 
> HDFS-5873.002.patch
>
>
> If dfs.http.policy is defined in hdfs-site.xml, it should have higher 
> precedence.
> In hdfs-site.xml, if dfs.https.enable is set to true and dfs.http.policy is 
> set to HTTP_ONLY, the effective policy should be 'HTTP_ONLY' instead of 
> 'HTTP_AND_HTTPS'.
> Currently, with this configuration, the HTTP_AND_HTTPS policy is activated.
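
A minimal sketch of the intended precedence, with hypothetical names 
(HttpPolicySketch and resolve are stand-ins, not the actual configuration code):

{code}
// Sketch only: hypothetical resolution logic, not the actual code.
class HttpPolicySketch {
  enum Policy { HTTP_ONLY, HTTPS_ONLY, HTTP_AND_HTTPS }

  static Policy resolve(String dfsHttpPolicy, boolean dfsHttpsEnable) {
    if (dfsHttpPolicy != null && !dfsHttpPolicy.isEmpty()) {
      return Policy.valueOf(dfsHttpPolicy);      // explicit dfs.http.policy wins
    }
    // Legacy fallback, consulted only when dfs.http.policy is unset.
    return dfsHttpsEnable ? Policy.HTTP_AND_HTTPS : Policy.HTTP_ONLY;
  }
}
{code}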





[jira] [Commented] (HDFS-5882) TestAuditLogs is flaky

2014-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891662#comment-13891662
 ] 

Hadoop QA commented on HDFS-5882:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627037/hdfs-5882.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6032//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6032//console

This message is automatically generated.

> TestAuditLogs is flaky
> --
>
> Key: HDFS-5882
> URL: https://issues.apache.org/jira/browse/HDFS-5882
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Attachments: hdfs-5882.patch
>
>
> TestAuditLogs fails sometimes:
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
> testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs) 
>  Time elapsed: 2.085 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:92)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertNotNull(Assert.java:526)
>   at org.junit.Assert.assertNotNull(Assert.java:537)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163)
> {noformat}





[jira] [Commented] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components

2014-02-04 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891661#comment-13891661
 ] 

Aaron T. Myers commented on HDFS-5709:
--

The latest patch looks good to me. +1 pending Jenkins.

> Improve NameNode upgrade with existing reserved paths and path components
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, 
> hdfs-5709-4.patch, hdfs-5709-5.patch, hdfs-5709-6.patch, hdfs-5709-7.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.





[jira] [Commented] (HDFS-5881) Fix skip() of the short-circuit local reader in 0.23.

2014-02-04 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891652#comment-13891652
 ] 

Kihwal Lee commented on HDFS-5881:
--

This bug is even lovelier than I originally thought.  skip() has another bug of 
returning the wrong value. In this case, DFSInputStream regards the skip as 
failed and creates a new BlockReaderLocal for subsequent reads, so the effect of 
the original skip bug was sometimes hidden at the cost of unnecessary overhead.

This "bug-masking bug" does not kick in when there is no data remaining in the 
internal 32KB buffer; i.e. the return value from skip() is correct and the same 
BlockReaderLocal instance is reused. So a read that follows a chunk-aligned 32KB 
read and a skip/seek will hit the original bug and return wrong data.

The fix will make random reads faster and return correct data.
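
To make the two invariants concrete, here is a generic buffered-reader skip() 
sketch (a hypothetical class, not the BlockReaderLocal code): consume only the 
requested amount from the internal buffer, and report the number of bytes 
actually skipped so the caller does not discard the reader.

{code}
// Sketch only: a generic buffered skip(), not the 0.23 BlockReaderLocal code.
class BufferedSkipSketch {
  private final byte[] buf = new byte[32 * 1024];  // internal 32KB buffer
  private int bufPos = 0;                          // next unread byte in buf
  private int bufLimit = 0;                        // number of valid bytes in buf
  private long underlyingPos = 0;                  // position in the underlying block

  long skip(long n) {
    if (n <= 0) {
      return 0;
    }
    long skipped = 0;
    int buffered = bufLimit - bufPos;
    if (buffered > 0) {
      int fromBuffer = (int) Math.min(buffered, n);
      bufPos += fromBuffer;              // skip only what was asked for, no more
      skipped += fromBuffer;
    }
    long remaining = n - skipped;
    if (remaining > 0) {
      underlyingPos += remaining;        // advance past the buffered region
      skipped += remaining;
    }
    return skipped;                      // report what was actually skipped
  }
}
{code}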

> Fix skip() of the short-circuit local reader in 0.23.
> -
>
> Key: HDFS-5881
> URL: https://issues.apache.org/jira/browse/HDFS-5881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.10
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
>
> It looks like a bug in skip() was introduced by HDFS-2356 and got fixed as a 
> part of HDFS-2834, which is an API change JIRA.  This bug causes skip() to skip 
> more data (as much as the new offsetFromChunkBoundary) in certain cases.
> It applies only to branch-0.23.





[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891630#comment-13891630
 ] 

Hadoop QA commented on HDFS-5399:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627004/HDFS-5399.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.io.retry.TestFailoverProxy
  
org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
  org.apache.hadoop.hdfs.server.namenode.ha.TestEditLogTailer

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6030//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6030//console

This message is automatically generated.

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, 
> HDFS-5399.003.patch, hdfs-5399.002.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.





[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891593#comment-13891593
 ] 

Todd Lipcon commented on HDFS-5399:
---

That seems reasonable to me. +1 on the new logic there, pending jenkins.

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, 
> HDFS-5399.003.patch, hdfs-5399.002.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.





[jira] [Commented] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components

2014-02-04 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891587#comment-13891587
 ] 

Andrew Wang commented on HDFS-5709:
---

Thanks ATM and Jing for reviewing this! I updated the docs, added command line 
arg testing for upgrade, and also changed the LV name behavior.

Jing, right now I'm using the presence of the k/vs in the map to indicate that 
the "-renameReserved" flag was passed at all, which is why I didn't statically 
initialize the map with default values. I could switch it to use a boolean 
instead, but (with ATM's suggestion) we now have the same default suffix for 
all reserved paths, so adding a new default is as easy as putting it into the 
new static array in HdfsConstants.
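
For illustration, a sketch of how such parsing might look, with hypothetical names 
(RenameReservedArgSketch and parse are stand-ins, not the option-parsing code in 
the patch); the key point is that a non-empty map is what signals that 
{{-renameReserved}} was passed:

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch only: hypothetical argument parsing, not the code in the patch.
class RenameReservedArgSketch {
  static Map<String, String> parse(String arg, String[] reservedPaths,
                                   String defaultSuffix) {
    Map<String, String> renames = new HashMap<String, String>();
    if (arg == null || arg.isEmpty()) {
      // No key-value pairs given: apply the same default suffix to every
      // reserved path (e.g. entries from a static array of reserved names).
      for (String reserved : reservedPaths) {
        renames.put(reserved, reserved + defaultSuffix);
      }
      return renames;
    }
    for (String pair : arg.split(",")) {
      String[] kv = pair.split("=", 2);
      if (kv.length == 2) {
        renames.put(kv[0], kv[1]);       // e.g. ".snapshot=.user-snapshot"
      }
    }
    return renames;
  }
}
{code}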

> Improve NameNode upgrade with existing reserved paths and path components
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, 
> hdfs-5709-4.patch, hdfs-5709-5.patch, hdfs-5709-6.patch, hdfs-5709-7.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.





[jira] [Updated] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components

2014-02-04 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5709:
--

Summary: Improve NameNode upgrade with existing reserved paths and path 
components  (was: Improve upgrade with existing files and directories named 
".snapshot")

> Improve NameNode upgrade with existing reserved paths and path components
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, 
> hdfs-5709-4.patch, hdfs-5709-5.patch, hdfs-5709-6.patch, hdfs-5709-7.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.





[jira] [Updated] (HDFS-5709) Improve NameNode upgrade with existing reserved paths and path components

2014-02-04 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5709:
--

Attachment: hdfs-5709-7.patch

> Improve NameNode upgrade with existing reserved paths and path components
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, 
> hdfs-5709-4.patch, hdfs-5709-5.patch, hdfs-5709-6.patch, hdfs-5709-7.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.





[jira] [Commented] (HDFS-5885) Add annotation for repeated fields in the protobuf definition

2014-02-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891573#comment-13891573
 ] 

Todd Lipcon commented on HDFS-5885:
---

As far as I know, the packed=true attribute only applies to primitive fields 
(based on my reading of the docs and of the protobuf code).

If we wanted to be extra compact, we could "shred" the BlockProtos into three 
separate packed lists of primitives, e.g.:

{code}
// "Shredded" version of BlockProto, used as a more compact encoding for a list
// of blocks.
message BlockProtoList {
  repeated uint64 block_ids = 1 [packed = true];
  repeated uint64 gen_stamps = 2 [packed = true];
  repeated uint64 sizes = 3 [packed = true];
}
{code}

The gains here are a couple of bytes per block. Think it's worth it?

> Add annotation for repeated fields in the protobuf definition
> -
>
> Key: HDFS-5885
> URL: https://issues.apache.org/jira/browse/HDFS-5885
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-5698 (FSImage in protobuf)
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5885.000.patch
>
>
> As suggested by the documentation of Protocol Buffers, the protobuf 
> specification of the fsimage should specify [packed=true] for all repeated 
> fields.





[jira] [Updated] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5399:


Attachment: HDFS-5399.003.patch

In the 003 patch I changed "retries >= maxRetries" to "retries - failovers > 
maxRetries". This passes TestFailoverProxy in my local test.

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, 
> HDFS-5399.003.patch, hdfs-5399.002.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for a certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException on the 
> server side. The client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891558#comment-13891558
 ] 

Jing Zhao commented on HDFS-5399:
-

I just found another issue: we increase the number of retries for both RETRY 
and FAILOVER_AND_RETRY in RetryInvocationHandler. In that case, if 
max-retry-attempts is less than max-failover-attempts, we will fail before 
reaching the maximum number of failovers.
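
As a rough numeric illustration (the numbers are made up): with max-retry-attempts = 2 
and max-failover-attempts = 15, a single shared counter gives up after only two attempts.

{code}
// Toy illustration with assumed numbers; not Hadoop code.
public class SharedCounterDemo {
  public static void main(String[] args) {
    final int maxRetries = 2;     // assumed max-retry-attempts
    final int maxFailovers = 15;  // assumed max-failover-attempts
    int counter = 0;              // one counter bumped for RETRY and FAILOVER_AND_RETRY alike
    int failovers = 0;

    while (failovers < maxFailovers && counter < maxRetries) {
      failovers++;  // suppose every attempt ends in a failover
      counter++;    // ...yet it still consumes the retry budget
    }
    System.out.println("gave up after " + failovers + " failovers");  // prints 2
  }
}
{code}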

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, 
> hdfs-5399.002.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for a certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException on the 
> server side. The client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5880) Fix a typo at the title of HDFS Snapshots document

2014-02-04 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891552#comment-13891552
 ] 

Akira AJISAKA commented on HDFS-5880:
-

Thank you, [~andrew.wang]. Closing this issue.

> Fix a typo at the title of HDFS Snapshots document
> --
>
> Key: HDFS-5880
> URL: https://issues.apache.org/jira/browse/HDFS-5880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation, snapshots
>Affects Versions: 2.2.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-5880.patch
>
>
> The title of the HDFS Snapshots document is "HFDS Snapshots".
> We should fix it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5880) Fix a typo at the title of HDFS Snapshots document

2014-02-04 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-5880:


Resolution: Duplicate
  Assignee: (was: Akira AJISAKA)
Status: Resolved  (was: Patch Available)

> Fix a typo at the title of HDFS Snapshots document
> --
>
> Key: HDFS-5880
> URL: https://issues.apache.org/jira/browse/HDFS-5880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation, snapshots
>Affects Versions: 2.2.0
>Reporter: Akira AJISAKA
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-5880.patch
>
>
> The title of the HDFS Snapshots document is "HFDS Snapshots".
> We should fix it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid

2014-02-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891547#comment-13891547
 ] 

Colin Patrick McCabe commented on HDFS-5182:


bq. Okay, it looks clearer to me now. Thanks for the explanation.

Glad to be helpful.

bq. My bad. I mixed up Linux and SunOS. You can do it using sendmsg() / recvmsg() 
as you mentioned in the previous comments.

I didn't realize that ioctl was the way to do this under SunOS.  Interesting.  
Sending fds via {{sendmsg}} seems to work on all the modern UNIX variants, so I 
think that we're good there.  On Windows, we'll need to use {{DuplicateHandle}}.

> BlockReaderLocal must allow zero-copy  reads only when the DN believes it's 
> valid
> -
>
> Key: HDFS-5182
> URL: https://issues.apache.org/jira/browse/HDFS-5182
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's 
> valid.  This implies adding a new field to the response to 
> REQUEST_SHORT_CIRCUIT_FDS.  We also need some kind of heartbeat from the 
> client to the DN, so that the DN can inform the client when the mapped region 
> is no longer locked into memory.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-5399:
--

Attachment: hdfs-5399.002.patch

The issue was that the convenience constructors for 
FailoverOnNetworkExceptionRetry didn't maintain the old behavior of retrying 
multiple times, since they set numRetries to 0. This new patch sets numRetries 
to Integer.MAX_VALUE for those constructors, and fixes the test.
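
A compact sketch of the change described above (simplified names and signatures, not 
the actual RetryPolicies code): the convenience constructor now delegates with an 
effectively unlimited retry count, preserving the old behavior.

{code}
// Sketch with assumed names; the real class is RetryPolicies.FailoverOnNetworkExceptionRetry.
public class FailoverRetryPolicySketch {
  private final int maxFailovers;
  private final int maxRetries;

  // Full constructor: callers can bound both failovers and plain retries.
  public FailoverRetryPolicySketch(int maxFailovers, int maxRetries) {
    this.maxFailovers = maxFailovers;
    this.maxRetries = maxRetries;
  }

  // Convenience constructor: previously it passed 0 for maxRetries, which
  // silently disabled multiple retries; delegating with Integer.MAX_VALUE
  // restores the old behavior.
  public FailoverRetryPolicySketch(int maxFailovers) {
    this(maxFailovers, Integer.MAX_VALUE);
  }

  public int getMaxFailovers() { return maxFailovers; }
  public int getMaxRetries()   { return maxRetries; }
}
{code}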

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch, 
> hdfs-5399.002.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for a certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException on the 
> server side. The client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891541#comment-13891541
 ] 

Hadoop QA commented on HDFS-4239:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12626985/hdfs-4239_v5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6028//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6028//console

This message is automatically generated.

> Means of telling the datanode to stop using a sick disk
> ---
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Jimmy Xiang
> Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, 
> hdfs-4239_v4.patch, hdfs-4239_v5.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the entire datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (you 
> can't unmount the disk while it is in use).  The latter is better in that 
> only the bad disk's data is rereplicated, not all of the datanode's data.
> Is it possible to do better, say, send the datanode a signal to tell it to stop 
> using a disk an operator has designated 'bad'?  This would be like option #2 
> above minus the need to stop and restart the datanode.  Ideally the disk 
> would become unmountable after a while.
> A nice-to-have would be being able to tell the datanode to start using a disk 
> again after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid

2014-02-04 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891536#comment-13891536
 ] 

Haohui Mai commented on HDFS-5182:
--

Okay, it looks clearer to me now. Thanks for the explanation.

bq. By the way, ioctl cannot be used to pass file descriptors in Linux

My bad. I mixed up Linux and SunOS. You can do it using {{sendmsg()}} / 
{{recvmsg()}} as you mentioned in the previous comments.

> BlockReaderLocal must allow zero-copy  reads only when the DN believes it's 
> valid
> -
>
> Key: HDFS-5182
> URL: https://issues.apache.org/jira/browse/HDFS-5182
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's 
> valid.  This implies adding a new field to the response to 
> REQUEST_SHORT_CIRCUIT_FDS.  We also need some kind of heartbeat from the 
> client to the DN, so that the DN can inform the client when the mapped region 
> is no longer locked into memory.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891535#comment-13891535
 ] 

Todd Lipcon commented on HDFS-5399:
---

Hmm, looks like TestFailoverProxy is also failing with the patch. Any ideas?

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for a certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException on the 
> server side. The client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5888) Cannot chmod / with new Globber.

2014-02-04 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-5888:
-

 Summary: Cannot chmod / with new Globber.
 Key: HDFS-5888
 URL: https://issues.apache.org/jira/browse/HDFS-5888
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang


Due to some changes in the new Globber code, we can no longer chmod "/". We 
should support this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891534#comment-13891534
 ] 

Hadoop QA commented on HDFS-4239:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12626985/hdfs-4239_v5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6027//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6027//console

This message is automatically generated.

> Means of telling the datanode to stop using a sick disk
> ---
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Jimmy Xiang
> Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, 
> hdfs-4239_v4.patch, hdfs-4239_v5.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the entire datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (you 
> can't unmount the disk while it is in use).  The latter is better in that 
> only the bad disk's data is rereplicated, not all of the datanode's data.
> Is it possible to do better, say, send the datanode a signal to tell it to stop 
> using a disk an operator has designated 'bad'?  This would be like option #2 
> above minus the need to stop and restart the datanode.  Ideally the disk 
> would become unmountable after a while.
> A nice-to-have would be being able to tell the datanode to start using a disk 
> again after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5882) TestAuditLogs is flaky

2014-02-04 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891531#comment-13891531
 ] 

Jimmy Xiang commented on HDFS-5882:
---

I was thinking of force-flushing the logger to disk too, but there isn't an easy 
way. With the current patch, I no longer see the problem locally.
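
For readers wondering what a force-flush could look like, here is a sketch that assumes 
the audit log is backed by a log4j 1.x WriterAppender (an assumption about the test 
setup; this helper is not part of the attached patch):

{code}
import java.util.Enumeration;

import org.apache.log4j.Appender;
import org.apache.log4j.Logger;
import org.apache.log4j.WriterAppender;

// Hypothetical test helper; not part of the attached patch.
public final class AuditLogFlushHelper {
  private AuditLogFlushHelper() {}

  public static void enableImmediateFlush(String loggerName) {
    Logger logger = Logger.getLogger(loggerName);
    Enumeration<?> appenders = logger.getAllAppenders();
    while (appenders.hasMoreElements()) {
      Appender appender = (Appender) appenders.nextElement();
      if (appender instanceof WriterAppender) {
        // A WriterAppender writes through after every event once immediateFlush is on.
        ((WriterAppender) appender).setImmediateFlush(true);
      }
    }
  }
}
{code}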

> TestAuditLogs is flaky
> --
>
> Key: HDFS-5882
> URL: https://issues.apache.org/jira/browse/HDFS-5882
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Attachments: hdfs-5882.patch
>
>
> TestAuditLogs fails sometimes:
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
> testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs) 
>  Time elapsed: 2.085 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:92)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertNotNull(Assert.java:526)
>   at org.junit.Assert.assertNotNull(Assert.java:537)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5882) TestAuditLogs is flaky

2014-02-04 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891513#comment-13891513
 ] 

Jing Zhao commented on HDFS-5882:
-

Can we force the logger to flush here? It looks like the current patch only 
reduces the likelihood of failure.

> TestAuditLogs is flaky
> --
>
> Key: HDFS-5882
> URL: https://issues.apache.org/jira/browse/HDFS-5882
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Attachments: hdfs-5882.patch
>
>
> TestAuditLogs fails sometimes:
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
> testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs) 
>  Time elapsed: 2.085 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:92)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertNotNull(Assert.java:526)
>   at org.junit.Assert.assertNotNull(Assert.java:537)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment

2014-02-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891504#comment-13891504
 ] 

Colin Patrick McCabe commented on HDFS-5746:


[~sureshms]: we've been having a bunch of problems with the javadoc warning 
detection code.  I filed HADOOP-10325 to fix this properly.

> add ShortCircuitSharedMemorySegment
> ---
>
> Key: HDFS-5746
> URL: https://issues.apache.org/jira/browse/HDFS-5746
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, 
> HDFS-5746.003.patch, HDFS-5746.004.patch, HDFS-5746.005.patch, 
> HDFS-5746.006.patch, HDFS-5746.007.patch, HDFS-5746.008.patch
>
>
> Add ShortCircuitSharedMemorySegment, which will be used to communicate 
> information between the datanode and the client about whether a replica is 
> mlocked.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid

2014-02-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891500#comment-13891500
 ] 

Colin Patrick McCabe commented on HDFS-5182:


bq. I should have said it more concretely. What I'm proposing is that the DN 
passes the file descriptor to the client (e.g., using ioctl() in Linux).

Did you read my first comment?  It begins:

bq. One way (let's call this choice #1) was using a shared memory segment. This 
would take the form of a third file descriptor passed from the DataNode to the 
DFSClient

By the way, {{ioctl}} cannot be used to pass file descriptors in Linux.

> BlockReaderLocal must allow zero-copy  reads only when the DN believes it's 
> valid
> -
>
> Key: HDFS-5182
> URL: https://issues.apache.org/jira/browse/HDFS-5182
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's 
> valid.  This implies adding a new field to the response to 
> REQUEST_SHORT_CIRCUIT_FDS.  We also need some kind of heartbeat from the 
> client to the DN, so that the DN can inform the client when the mapped region 
> is no longer locked into memory.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5882) TestAuditLogs is flaky

2014-02-04 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HDFS-5882:
--

Attachment: hdfs-5882.patch

> TestAuditLogs is flaky
> --
>
> Key: HDFS-5882
> URL: https://issues.apache.org/jira/browse/HDFS-5882
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Attachments: hdfs-5882.patch
>
>
> TestAuditLogs fails sometimes:
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
> testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs) 
>  Time elapsed: 2.085 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:92)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertNotNull(Assert.java:526)
>   at org.junit.Assert.assertNotNull(Assert.java:537)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5882) TestAuditLogs is flaky

2014-02-04 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HDFS-5882:
--

Status: Patch Available  (was: Open)

> TestAuditLogs is flaky
> --
>
> Key: HDFS-5882
> URL: https://issues.apache.org/jira/browse/HDFS-5882
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Attachments: hdfs-5882.patch
>
>
> TestAuditLogs fails sometimes:
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
> testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs) 
>  Time elapsed: 2.085 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:92)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertNotNull(Assert.java:526)
>   at org.junit.Assert.assertNotNull(Assert.java:537)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5872) Validate configuration of dfs.http.policy

2014-02-04 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HDFS-5872.
--

Resolution: Duplicate

HDFS-5873 includes the fix for this bug, so I'm closing this one as a duplicate.

> Validate configuration of dfs.http.policy
> -
>
> Key: HDFS-5872
> URL: https://issues.apache.org/jira/browse/HDFS-5872
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> The current implementation does not complain about invalid values of 
> dfs.http.policy. It should bail out to alert the user that they have 
> misconfigured the system.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5873) dfs.http.policy should have higher precedence over dfs.https.enable

2014-02-04 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5873:
-

Attachment: HDFS-5873.002.patch

The v2 patch adds a unit test to cover the precedence.
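
For illustration, the intended precedence can be sketched as follows. This is a 
standalone example, not the committed DFSUtil#getHttpPolicy; it assumes the 
HttpConfig.Policy enum with HTTP_ONLY and HTTP_AND_HTTPS values and uses the 
configuration keys named in the description.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.http.HttpConfig;

// Standalone sketch of the precedence rule; not the actual patch.
public final class HttpPolicyResolverSketch {
  private HttpPolicyResolverSketch() {}

  public static HttpConfig.Policy resolve(Configuration conf) {
    String explicit = conf.get("dfs.http.policy");
    if (explicit != null) {
      // An explicit dfs.http.policy always wins.
      return HttpConfig.Policy.valueOf(explicit.trim().toUpperCase());
    }
    // Fall back to the legacy boolean only when no explicit policy is set.
    return conf.getBoolean("dfs.https.enable", false)
        ? HttpConfig.Policy.HTTP_AND_HTTPS
        : HttpConfig.Policy.HTTP_ONLY;
  }
}
{code}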

> dfs.http.policy should have higher precedence over dfs.https.enable
> ---
>
> Key: HDFS-5873
> URL: https://issues.apache.org/jira/browse/HDFS-5873
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Haohui Mai
> Attachments: HDFS-5873.000.patch, HDFS-5873.001.patch, 
> HDFS-5873.002.patch
>
>
> If dfs.http.policy is defined in hdfs-site.xml, it should have higher 
> precedence.
> In hdfs-site.xml, if dfs.https.enable is set to true and dfs.http.policy is 
> set to HTTP_ONLY, the effective policy should be 'HTTP_ONLY' instead of 
> 'HTTP_AND_HTTPS'.
> Currently, with this configuration, the HTTP_AND_HTTPS policy is activated.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5873) dfs.http.policy should have higher precedence over dfs.https.enable

2014-02-04 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891467#comment-13891467
 ] 

Jing Zhao commented on HDFS-5873:
-

The new patch looks pretty good to me. It would be better to have a unit test 
covering DFSUtil#getHttpPolicy. +1 after addressing this.

> dfs.http.policy should have higher precedence over dfs.https.enable
> ---
>
> Key: HDFS-5873
> URL: https://issues.apache.org/jira/browse/HDFS-5873
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Haohui Mai
> Attachments: HDFS-5873.000.patch, HDFS-5873.001.patch
>
>
> If dfs.http.policy is defined in hdfs-site.xml, it should have higher 
> precedence.
> In hdfs-site.xml, if dfs.https.enable is set to true and dfs.http.policy is 
> set to HTTP_ONLY, the effective policy should be 'HTTP_ONLY' instead of 
> 'HTTP_AND_HTTPS'.
> Currently, with this configuration, the HTTP_AND_HTTPS policy is activated.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5868) Make hsync implementation pluggable

2014-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891453#comment-13891453
 ] 

Hadoop QA commented on HDFS-5868:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12626979/HDFS-5868-branch-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
-14 warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6026//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6026//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6026//console

This message is automatically generated.

> Make hsync implementation pluggable
> ---
>
> Key: HDFS-5868
> URL: https://issues.apache.org/jira/browse/HDFS-5868
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0
>Reporter: Buddy
> Attachments: HDFS-5868-branch-2.patch
>
>
> The current implementation of hsync in BlockReceiver only works if the output 
> streams are instances of FileOutputStream. Therefore, there is currently no 
> way for an FSDatasetSpi plugin to implement hsync if it is not using standard 
> OS files.
> One possible solution is to push the implementation of hsync into the 
> ReplicaOutputStreams class. This class is constructed by the 
> ReplicaInPipeline, which is constructed by the FSDatasetSpi plugin, and can 
> therefore be extended. Instead of directly calling sync on the output stream, 
> BlockReceiver would call ReplicaOutputStreams.sync.  The default 
> implementation of sync in ReplicaOutputStreams would be the same as the 
> current implementation in BlockReceiver. 
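
A rough sketch of the shape described above (illustrative only, not the Hadoop classes 
themselves): the dataset plugin's stream wrapper owns the sync, and the default mirrors 
today's FileOutputStream-based behavior.

{code}
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative sketch; the real classes are ReplicaOutputStreams and BlockReceiver.
class ReplicaOutputStreamsSketch {
  private final OutputStream dataOut;

  ReplicaOutputStreamsSketch(OutputStream dataOut) {
    this.dataOut = dataOut;
  }

  /**
   * Default behavior: mirror the current BlockReceiver logic for OS files.
   * A non-file FSDatasetSpi plugin would override this with its own hsync.
   */
  void syncDataOut() throws IOException {
    if (dataOut instanceof FileOutputStream) {
      FileDescriptor fd = ((FileOutputStream) dataOut).getFD();
      fd.sync();  // push buffered bytes through to the storage device
    }
  }
}
{code}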



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-5874) Should not compare DataNode current layout version with that of NameNode in DataStorage

2014-02-04 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li reassigned HDFS-5874:


Assignee: Brandon Li

> Should not compare DataNode current layout version with that of NameNode in 
> DataStorage
> 
>
> Key: HDFS-5874
> URL: https://issues.apache.org/jira/browse/HDFS-5874
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Brandon Li
>Assignee: Brandon Li
>
> As [~vinayrpet] pointed out in HDFS-5754: in DataStorage, 
> DATANODE_LAYOUT_VERSION should no longer be compared with the NameNode layout 
> version. 
> {noformat}
>   if (DataNodeLayoutVersion.supports(
>   LayoutVersion.Feature.FEDERATION,
>   HdfsConstants.DATANODE_LAYOUT_VERSION) && 
>   HdfsConstants.DATANODE_LAYOUT_VERSION == nsInfo.getLayoutVersion()) 
> {
> readProperties(sd, nsInfo.getLayoutVersion());
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5884) LoadDelegator should use IOUtils.readFully() to read the magic header

2014-02-04 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao resolved HDFS-5884.
-

   Resolution: Fixed
Fix Version/s: HDFS-5698 (FSImage in protobuf)
 Hadoop Flags: Reviewed

+1. I've committed this.

> LoadDelegator should use IOUtils.readFully() to read the magic header
> -
>
> Key: HDFS-5884
> URL: https://issues.apache.org/jira/browse/HDFS-5884
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-5698 (FSImage in protobuf)
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: HDFS-5698 (FSImage in protobuf)
>
> Attachments: HDFS-5884.000.patch
>
>
> Currently FSImageFormat.LoadDelegator reads the magic header using 
> {{FileInputStream.read()}}. It does not guarantee that the magic header is 
> fully read. It should use {{IOUtils.readFully()}} instead.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5885) Add annotation for repeated fields in the protobuf definition

2014-02-04 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891398#comment-13891398
 ] 

Jing Zhao commented on HDFS-5885:
-

According to the protobuf documentation: "For historical reasons, repeated fields of 
basic numeric types aren't encoded as efficiently as they could be. New code 
should use the special option [packed=true] to get a more efficient encoding."

So I guess we only need to add "[packed=true]" for basic numeric types 
like int64? Do we also want to add it to "repeated BlockProto blocks"? 
[~tlipcon], could you please comment on this?

> Add annotation for repeated fields in the protobuf definition
> -
>
> Key: HDFS-5885
> URL: https://issues.apache.org/jira/browse/HDFS-5885
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-5698 (FSImage in protobuf)
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5885.000.patch
>
>
> As suggested by the documentation of Protocol Buffers, the protobuf 
> specification of the fsimage should specify [packed=true] for all repeated 
> fields.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid

2014-02-04 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891378#comment-13891378
 ] 

Haohui Mai commented on HDFS-5182:
--

I should have said it more concretely. What I'm proposing is that the DN passes 
the file descriptor to the client (e.g., using {{ioctl()}} in Linux).

It seems to me that with this approach (1) the OS takes care of resource 
management, and (2) the client has more flexibility. The client can 
access the file using the {{read()}} and {{write()}} system calls. The client, 
of course, can call {{mmap()}} on the descriptor to implement zero-copy reads 
with respect to its process boundary. It can also call {{ioctl()}} and 
{{madvise()}} to specify the OS buffer cache policy for the file. The additional 
flexibility can be quite useful for implementing databases on HDFS.

> BlockReaderLocal must allow zero-copy  reads only when the DN believes it's 
> valid
> -
>
> Key: HDFS-5182
> URL: https://issues.apache.org/jira/browse/HDFS-5182
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's 
> valid.  This implies adding a new field to the response to 
> REQUEST_SHORT_CIRCUIT_FDS.  We also need some kind of heartbeat from the 
> client to the DN, so that the DN can inform the client when the mapped region 
> is no longer locked into memory.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5698) Use protobuf to serialize / deserialize FSImage

2014-02-04 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891348#comment-13891348
 ] 

Haohui Mai commented on HDFS-5698:
--

Thanks very much to [~tlipcon] for the detailed comments.

I've filed HDFS-5884, HDFS-5885 and HDFS-5887 to address the comments.

Thanks very much for the suggestions on the performance improvements; I'll dig 
into them.

My plan is to commit HDFS-5884 and HDFS-5885 before the merge, and to continue 
improving the code in trunk. Does that make sense to you?

bq.  would existing ImageVisitor implementation classes continue to work 
against the PB-ified image? 
The existing ImageVisitor implementation won't work with the PB FSImage.

> Use protobuf to serialize / deserialize FSImage
> ---
>
> Key: HDFS-5698
> URL: https://issues.apache.org/jira/browse/HDFS-5698
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5698-design.pdf, HDFS-5698.000.patch, 
> HDFS-5698.001.patch, HDFS-5698.002.patch, HDFS-5698.003.patch
>
>
> Currently, the code serializes FSImage using in-house serialization 
> mechanisms. There are a couple of disadvantages to the current approach:
> # Mixing the responsibility of reconstruction and serialization / 
> deserialization. The current code paths of serialization / deserialization 
> have spent a lot of effort on maintaining compatibility. What is worse is 
> that they are mixed with the complex logic of reconstructing the namespace, 
> making the code difficult to follow.
> # Poor documentation of the current FSImage format. The format of the FSImage 
> is practically defined by the implementation. A bug in the implementation means 
> a bug in the specification. Furthermore, it also makes writing third-party 
> tools quite difficult.
> # Changing schemas is non-trivial. Adding a field in FSImage requires bumping 
> the layout version every time. Bumping the layout version requires (1) the 
> users to explicitly upgrade the clusters, and (2) putting new code to 
> maintain backward compatibility.
> This jira proposes to use protobuf to serialize the FSImage. Protobuf has 
> been used to serialize / deserialize RPC messages in Hadoop.
> Protobuf addresses all the above problems. It clearly separates the 
> responsibility of serialization and reconstructing the namespace. The 
> protobuf files document the current format of the FSImage. The developers now 
> can add optional fields with ease, since the old code can always read the new 
> FSImage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid

2014-02-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891350#comment-13891350
 ] 

Colin Patrick McCabe commented on HDFS-5182:


[~wheat9]: the shared memory segment, which we obtain via mmap, is a window 
into the file identified by the file descriptor.  "Using a file descriptor" and 
"using a shared memory segment" are not two different approaches.  They are two 
aspects of the same approach.

You can read more about it here: http://en.wikipedia.org/wiki/Memory-mapped_file
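
In plain JDK terms (this is just an illustration, not the DFSClient code path), the 
relationship looks like this: the channel wraps the descriptor, and the mapped buffer 
is a window onto the same file.

{code}
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Plain-JDK illustration: the descriptor/channel and the mmap'ed window are
// two views of the same underlying file.
public class MmapWindowDemo {
  public static void main(String[] args) throws Exception {
    try (FileChannel channel = FileChannel.open(
        Paths.get(args[0]), StandardOpenOption.READ)) {
      long length = Math.min(channel.size(), 4096);  // map at most the first 4 KB
      MappedByteBuffer window =
          channel.map(FileChannel.MapMode.READ_ONLY, 0, length);
      System.out.println("mapped " + window.remaining() + " bytes of " + args[0]);
    }
  }
}
{code}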

> BlockReaderLocal must allow zero-copy  reads only when the DN believes it's 
> valid
> -
>
> Key: HDFS-5182
> URL: https://issues.apache.org/jira/browse/HDFS-5182
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's 
> valid.  This implies adding a new field to the response to 
> REQUEST_SHORT_CIRCUIT_FDS.  We also need some kind of heartbeat from the 
> client to the DN, so that the DN can inform the client when the mapped region 
> is no longer locked into memory.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891343#comment-13891343
 ] 

Todd Lipcon commented on HDFS-5399:
---

+1 pending Jenkins

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for a certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException on the 
> server side. The client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5887) Add suffix to generated protobuf class

2014-02-04 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-5887:


 Summary: Add suffix to generated protobuf class
 Key: HDFS-5887
 URL: https://issues.apache.org/jira/browse/HDFS-5887
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-5698 (FSImage in protobuf)
Reporter: Haohui Mai
Assignee: Haohui Mai
Priority: Minor


As suggested by [~tlipcon], the code is more readable if we give each class 
generated by protobuf the suffix "Proto".

This jira proposes to rename the classes; it introduces no functional changes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5399:


Attachment: HDFS-5399.001.patch

Updated the patch to fix TestHASafeMode#testClientRetrySafeMode. It also removes 
some redundant safemode.isOn() checks in FSNamesystem#checkNameNodeSafeMode.

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5399.000.patch, HDFS-5399.001.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for a certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException on the 
> server side. The client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5886) Potential null pointer dereference in RpcProgramNfs3#readlink()

2014-02-04 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5886:


 Summary: Potential null pointer dereference in 
RpcProgramNfs3#readlink()
 Key: HDFS-5886
 URL: https://issues.apache.org/jira/browse/HDFS-5886
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu


Here is the related code:
{code}
  if (MAX_READ_TRANSFER_SIZE < target.getBytes().length) {
return new READLINK3Response(Nfs3Status.NFS3ERR_IO, postOpAttr, null);
  }
{code}
READLINK3Response constructor would dereference the third parameter:
{code}
this.path = new byte[path.length];
{code}
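
A minimal standalone sketch of the defensive direction (class and method names are 
made up for illustration; the real class is READLINK3Response): copy the path only 
when it is non-null.

{code}
// Standalone illustration, not the Hadoop NFS3 code.
public class NullSafePathHolder {
  private final byte[] path;

  public NullSafePathHolder(byte[] source) {
    if (source == null) {
      this.path = new byte[0];  // avoids the NPE on source.length
    } else {
      this.path = new byte[source.length];
      System.arraycopy(source, 0, this.path, 0, source.length);
    }
  }

  public int pathLength() {
    return path.length;
  }

  public static void main(String[] args) {
    System.out.println(new NullSafePathHolder(null).pathLength());  // prints 0
  }
}
{code}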



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5885) Add annotation for repeated fields in the protobuf definition

2014-02-04 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5885:
-

Attachment: HDFS-5885.000.patch

> Add annotation for repeated fields in the protobuf definition
> -
>
> Key: HDFS-5885
> URL: https://issues.apache.org/jira/browse/HDFS-5885
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-5698 (FSImage in protobuf)
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5885.000.patch
>
>
> As suggested by the documentation of Protocol Buffers, the protobuf 
> specification of the fsimage should specify [packed=true] for all repeated 
> fields.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5884) LoadDelegator should use IOUtils.readFully() to read the magic header

2014-02-04 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5884:
-

Attachment: HDFS-5884.000.patch

> LoadDelegator should use IOUtils.readFully() to read the magic header
> -
>
> Key: HDFS-5884
> URL: https://issues.apache.org/jira/browse/HDFS-5884
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-5698 (FSImage in protobuf)
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5884.000.patch
>
>
> Currently FSImageFormat.LoadDelegator reads the magic header using 
> {{FileInputStream.read()}}. It does not guarantee that the magic header is 
> fully read. It should use {{IOUtils.readFully()}} instead.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5884) LoadDelegator should use IOUtils.readFully() to read the magic header

2014-02-04 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5884:
-

Description: Currently FSImageFormat.LoadDelegator reads the magic header 
using {{FileInputStream.read()}}. It does not guarantee that the magic header 
is fully read. It should use {{IOUtils.readFully()}} instead.  (was: Currently 
FSImageFormat.LoadDelegator reads the magic header using 
{FileInputStream.read()}. It does not guarantee that the magic header is fully 
read. It should use IOUtils.readFully() instead.)

> LoadDelegator should use IOUtils.readFully() to read the magic header
> -
>
> Key: HDFS-5884
> URL: https://issues.apache.org/jira/browse/HDFS-5884
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-5698 (FSImage in protobuf)
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> Currently FSImageFormat.LoadDelegator reads the magic header using 
> {{FileInputStream.read()}}. It does not guarantee that the magic header is 
> fully read. It should use {{IOUtils.readFully()}} instead.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5885) Add annotation for repeated fields in the protobuf definition

2014-02-04 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-5885:


 Summary: Add annotation for repeated fields in the protobuf 
definition
 Key: HDFS-5885
 URL: https://issues.apache.org/jira/browse/HDFS-5885
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-5698 (FSImage in protobuf)
Reporter: Haohui Mai
Assignee: Haohui Mai


As suggested by the documentation of Protocol Buffers, the protobuf 
specification of the fsimage should specify [packed=true] for all repeated 
fields.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5884) LoadDelegator should use IOUtils.readFully() to read the magic header

2014-02-04 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5884:
-

 Target Version/s: HDFS-5698 (FSImage in protobuf)
Affects Version/s: HDFS-5698 (FSImage in protobuf)

> LoadDelegator should use IOUtils.readFully() to read the magic header
> -
>
> Key: HDFS-5884
> URL: https://issues.apache.org/jira/browse/HDFS-5884
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-5698 (FSImage in protobuf)
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> Currently FSImageFormat.LoadDelegator reads the magic header using 
> {FileInputStream.read()}. It does not guarantee that the magic header is 
> fully read. It should use IOUtils.readFully() instead.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade

2014-02-04 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891316#comment-13891316
 ] 

Brandon Li commented on HDFS-5585:
--

[~kihwal], thanks for the patch!
It provides two new dfsadmin CLIs. They are a bit different from those in the 
design doc. For example, it looks like pingDatanode is used here to replace the 
getDatanodeInfo CLI described in the design doc, but pingDatanode doesn't 
return much information about the upgrade status and so on. Could you elaborate more 
on the difference? 

> Provide admin commands for data node upgrade
> 
>
> Key: HDFS-5585
> URL: https://issues.apache.org/jira/browse/HDFS-5585
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ha, hdfs-client, namenode
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-5585.patch, HDFS-5585.patch
>
>
> Several new methods to ClientDatanodeProtocol may need to be added to support 
> querying version, initiating upgrade, etc.  The admin CLI needs to be added 
> as well. The primary use case is rolling upgrade, but this can also be used 
> to prepare for a graceful restart of a data node for any reason.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5884) LoadDelegator should use IOUtils.readFully() to read the magic header

2014-02-04 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-5884:


 Summary: LoadDelegator should use IOUtils.readFully() to read the 
magic header
 Key: HDFS-5884
 URL: https://issues.apache.org/jira/browse/HDFS-5884
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai


Currently FSImageFormat.LoadDelegator reads the magic header using 
{FileInputStream.read()}. It does not guarantee that the magic header is fully 
read. It should use IOUtils.readFully() instead.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891315#comment-13891315
 ] 

Todd Lipcon commented on HDFS-5399:
---

Patch looks reasonable to me. Don't you need to update 
TestHASafeMode.testClientRetrySafeMode though, now that it doesn't retry for 
manual safe mode?

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5399.000.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5873) dfs.http.policy should have higher precedence over dfs.https.enable

2014-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891296#comment-13891296
 ] 

Hadoop QA commented on HDFS-5873:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12626947/HDFS-5873.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6025//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6025//console

This message is automatically generated.

> dfs.http.policy should have higher precedence over dfs.https.enable
> ---
>
> Key: HDFS-5873
> URL: https://issues.apache.org/jira/browse/HDFS-5873
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Haohui Mai
> Attachments: HDFS-5873.000.patch, HDFS-5873.001.patch
>
>
> If dfs.http.policy is defined in hdfs-site.xml, it should have higher 
> precedence.
> In hdfs-site.xml, if dfs.https.enable is set to true and dfs.http.policy is 
> set to HTTP_ONLY, the effective policy should be 'HTTP_ONLY' instead of 
> 'HTTP_AND_HTTPS'.
> Currently with this configuration, it activates the HTTP_AND_HTTPS policy.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-5882) TestAuditLogs is flaky

2014-02-04 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HDFS-5882:
-

Assignee: Jimmy Xiang

> TestAuditLogs is flaky
> --
>
> Key: HDFS-5882
> URL: https://issues.apache.org/jira/browse/HDFS-5882
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
>
> TestAuditLogs fails sometimes:
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
> testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs) 
>  Time elapsed: 2.085 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:92)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertNotNull(Assert.java:526)
>   at org.junit.Assert.assertNotNull(Assert.java:537)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5698) Use protobuf to serialize / deserialize FSImage

2014-02-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891284#comment-13891284
 ] 

Todd Lipcon commented on HDFS-5698:
---

A few notes on the patch:

{code}
+is = new FileInputStream(file);
+if (is.read(magic) == magic.length
{code}

Should use IOUtils.readFully here
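
For illustration, a minimal sketch of a readFully-based header check. The 
MAGIC_HEADER constant and the surrounding helper are assumptions made for the 
example, not code from the patch:

{code}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.IOUtils;

// Sketch only: MAGIC_HEADER and the surrounding helper are illustrative names.
class MagicHeaderCheck {
  // Placeholder value; the real loader defines its own magic bytes.
  static final byte[] MAGIC_HEADER = "HDFSIMG1".getBytes();

  static boolean hasMagicHeader(File file) throws IOException {
    byte[] magic = new byte[MAGIC_HEADER.length];
    FileInputStream is = new FileInputStream(file);
    try {
      // Unlike a single read(), readFully keeps reading until the buffer is
      // full and throws an IOException on a premature EOF.
      IOUtils.readFully(is, magic, 0, magic.length);
      return Arrays.equals(magic, MAGIC_HEADER);
    } finally {
      IOUtils.cleanup(null, is);
    }
  }
}
{code}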

-

Can we rename INodeSection and the other nested proto classes to end in "PB" or 
"Proto"? It's helpful when reading the code to distinguish the generated 
protobuf classes from the other structures, and given that these inner classes 
get imported, it's not always obvious.



Performance-wise, I think you can really improve things by re-using protobuf 
objects. In particular, rather than doing something like:

{code}
+  INodeSection.INodeReference ref = INodeSection.INodeReference
+  .parseDelimitedFrom(in);
+  return loadINodeReference(ref, dir);
{code}

you can make a thread-local INodeSection.INodeReference.Builder object (similar 
to how we use thread-local ops in the editlog loader code). Then use 
Builder.mergeDelimitedFrom instead of the static parseDelimitedFrom method. You 
can check isInitialized() after this to ensure that all of the required fields 
are present, and then use the builder itself to read the fields. This avoids 
repeated object allocation/deallocation costs without having to resort to the 
manual parsing that you mention in the design doc.

The generated code also has a handy "FooProtoOrBuilder" interface that both the 
generated PB and its builder implement, with all of the appropriate getters. 
The code that actually handles constructing HDFS objects from PBs could easily 
take this interface.
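
To make the suggestion concrete, here is a fragment-level sketch of that reuse 
pattern. INodeSection.INodeReference is the generated message referenced above; 
the thread-local holder and the loadINodeReferenceFields helper are illustrative 
names, not code from the patch:

{code}
// Sketch only; imports elided, and the helper names are illustrative.
private static final ThreadLocal<INodeSection.INodeReference.Builder> REF_BUILDER =
    new ThreadLocal<INodeSection.INodeReference.Builder>() {
      @Override
      protected INodeSection.INodeReference.Builder initialValue() {
        return INodeSection.INodeReference.newBuilder();
      }
    };

private INodeReference loadINodeReference(InputStream in) throws IOException {
  INodeSection.INodeReference.Builder b = REF_BUILDER.get();
  b.clear();                          // drop fields left over from the last record
  if (!b.mergeDelimitedFrom(in)) {
    throw new EOFException("unexpected end of INodeReference section");
  }
  if (!b.isInitialized()) {           // verify all required fields were present
    throw new IOException("malformed INodeReference record");
  }
  // The builder implements the generated INodeReferenceOrBuilder interface, so a
  // loader overload typed against it can read fields without building a message.
  return loadINodeReferenceFields(b);
}
{code}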



For many of the repeated int64 fields, you should probably use the 
{{[packed=true]}} option in the protobuf definition. This saves a good amount 
of space and probably improves decoding performance as well.



One question: would existing ImageVisitor implementation classes continue to 
work against the PB-ified image? My reading of the patch is that they wouldn't, 
but it would be nice to confirm.



I don't think any of the above needs to block the merge, but the 
format-breaking one (packed=true) should probably be done sooner rather than 
later.

> Use protobuf to serialize / deserialize FSImage
> ---
>
> Key: HDFS-5698
> URL: https://issues.apache.org/jira/browse/HDFS-5698
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5698-design.pdf, HDFS-5698.000.patch, 
> HDFS-5698.001.patch, HDFS-5698.002.patch, HDFS-5698.003.patch
>
>
> Currently, the code serializes FSImage using in-house serialization 
> mechanisms. There are a couple of disadvantages to the current approach:
> # Mixing the responsibility of reconstruction and serialization / 
> deserialization. The current code paths of serialization / deserialization 
> have spent a lot of effort on maintaining compatibility. What is worse is 
> that they are mixed with the complex logic of reconstructing the namespace, 
> making the code difficult to follow.
> # Poor documentation of the current FSImage format. The format of the FSImage 
> is practically defined by the implementation. A bug in the implementation means 
> a bug in the specification. Furthermore, it also makes writing third-party 
> tools quite difficult.
> # Changing schemas is non-trivial. Adding a field to the FSImage requires bumping 
> the layout version every time. Bumping the layout version requires (1) the 
> users to explicitly upgrade their clusters, and (2) adding new code to 
> maintain backward compatibility.
> This jira proposes to use protobuf to serialize the FSImage. Protobuf has 
> been used to serialize / deserialize the RPC message in Hadoop.
> Protobuf addresses all the above problems. It clearly separates the 
> responsibility of serialization and reconstructing the namespace. The 
> protobuf files document the current format of the FSImage. The developers now 
> can add optional fields with ease, since the old code can always read the new 
> FSImage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5883) TestZKPermissionsWatcher.testPermissionsWatcher fails sometimes

2014-02-04 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HDFS-5883:
-

 Summary: TestZKPermissionsWatcher.testPermissionsWatcher fails 
sometimes
 Key: HDFS-5883
 URL: https://issues.apache.org/jira/browse/HDFS-5883
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial


It looks like sleeping 100 ms is not enough for the permission change to 
propagate to other watchers. Will increase the sleeping time a little.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5399:


Status: Patch Available  (was: Open)

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5399.000.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5399:


Attachment: HDFS-5399.000.patch

Initial patch for review.

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5399.000.patch
>
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid

2014-02-04 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891257#comment-13891257
 ] 

Haohui Mai commented on HDFS-5182:
--

I'm curious why a shared memory segment is necessary -- given the ability to pass 
file descriptors around, the client can read the data using the file descriptor 
directly.

I see a couple of potential issues with using a shared memory segment to implement 
zero-copy I/O:

# No lazy reads. It seems that you're calling mlock() on the datanode side to 
pin the data to physical memory. The whole block has to be read into memory 
even if the client is only interested in some parts of the file (e.g., the 
index of a database).
# SIGBUS. The client avoids SIGBUS at the cost of (1) the data being pinned 
to physical memory, and (2) the datanode possibly hitting SIGBUS when there is an 
I/O error. If the client is using the file descriptor directly, the OS will 
manage the data through its buffer cache, and there will be no SIGBUS errors on 
either side.
# VM space. Indeed it won't exhaust the 64-bit virtual memory space, but a 
process running inside a container could have limited vm space (e.g., 1 GB).

I'm wondering what the downsides of passing the file descriptor directly would 
be. Can you comment on this?

> BlockReaderLocal must allow zero-copy  reads only when the DN believes it's 
> valid
> -
>
> Key: HDFS-5182
> URL: https://issues.apache.org/jira/browse/HDFS-5182
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's 
> valid.  This implies adding a new field to the response to 
> REQUEST_SHORT_CIRCUIT_FDS.  We also need some kind of heartbeat from the 
> client to the DN, so that the DN can inform the client when the mapped region 
> is no longer locked into memory.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-02-04 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HDFS-4239:
--

Status: Patch Available  (was: Open)

> Means of telling the datanode to stop using a sick disk
> ---
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Jimmy Xiang
> Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, 
> hdfs-4239_v4.patch, hdfs-4239_v5.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the total datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (You 
> can't unmount the disk while it is in use).  This latter is better in that 
> only the bad disk's data is rereplicated, not all datanode data.
> Is it possible to do better, say, send the datanode a signal to tell it to stop 
> using a disk an operator has designated 'bad'?  This would be like option #2 
> above minus the need to stop and restart the datanode.  Ideally the disk 
> would become unmountable after a while.
> Nice to have would be being able to tell the datanode to restart using a disk 
> after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-02-04 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HDFS-4239:
--

Attachment: hdfs-4239_v5.patch

> Means of telling the datanode to stop using a sick disk
> ---
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Jimmy Xiang
> Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, 
> hdfs-4239_v4.patch, hdfs-4239_v5.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the total datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (You 
> can't unmount the disk while it is in use).  This latter is better in that 
> only the bad disk's data is rereplicated, not all datanode data.
> Is it possible to do better, say, send the datanode a signal to tell it to stop 
> using a disk an operator has designated 'bad'?  This would be like option #2 
> above minus the need to stop and restart the datanode.  Ideally the disk 
> would become unmountable after a while.
> Nice to have would be being able to tell the datanode to restart using a disk 
> after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5882) TestAuditLogs is flaky

2014-02-04 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HDFS-5882:
-

 Summary: TestAuditLogs is flaky
 Key: HDFS-5882
 URL: https://issues.apache.org/jira/browse/HDFS-5882
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Jimmy Xiang
Priority: Minor


TestAuditLogs fails sometimes:

{noformat}
Running org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
testAuditAllowedStat[1](org.apache.hadoop.hdfs.server.namenode.TestAuditLogs)  
Time elapsed: 2.085 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at org.junit.Assert.assertNotNull(Assert.java:537)
at 
org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogsRepeat(TestAuditLogs.java:312)
at 
org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.verifyAuditLogs(TestAuditLogs.java:295)
at 
org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditAllowedStat(TestAuditLogs.java:163)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid

2014-02-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891238#comment-13891238
 ] 

Colin Patrick McCabe commented on HDFS-5182:


[~wheat9]: I think you're mixing up the two choices a little bit.  Choice #1 
does pass the file descriptor, and uses the shared memory segment for 
communication.  Choice #2 passes everything over UNIX domain sockets.  SIGBUS 
is not an issue since the shared memory segment is in memory (SIGBUS should 
only happen on a disk error).  Virtual memory space is not an issue on 64-bit 
machines.  Portability is not an issue since Windows supports shared memory as 
well, as you note.

> BlockReaderLocal must allow zero-copy  reads only when the DN believes it's 
> valid
> -
>
> Key: HDFS-5182
> URL: https://issues.apache.org/jira/browse/HDFS-5182
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's 
> valid.  This implies adding a new field to the response to 
> REQUEST_SHORT_CIRCUIT_FDS.  We also need some kind of heartbeat from the 
> client to the DN, so that the DN can inform the client when the mapped region 
> is no longer locked into memory.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment

2014-02-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891230#comment-13891230
 ] 

Colin Patrick McCabe commented on HDFS-5746:


This patch increased OK_JAVADOC_WARNINGS, which should have covered the 2 
additional warnings.

{code}
-  OK_JAVADOC_WARNINGS=14;
+  OK_JAVADOC_WARNINGS=16;
{code}

If we're getting javadoc warnings on clean builds, let's file a JIRA about 
increasing OK_JAVADOC_WARNINGS further and/or fixing javadoc warnings.  The 
ones introduced in this patch were not fixable because they related to sun APIs.

> add ShortCircuitSharedMemorySegment
> ---
>
> Key: HDFS-5746
> URL: https://issues.apache.org/jira/browse/HDFS-5746
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, 
> HDFS-5746.003.patch, HDFS-5746.004.patch, HDFS-5746.005.patch, 
> HDFS-5746.006.patch, HDFS-5746.007.patch, HDFS-5746.008.patch
>
>
> Add ShortCircuitSharedMemorySegment, which will be used to communicate 
> information between the datanode and the client about whether a replica is 
> mlocked.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5868) Make hsync implementation pluggable

2014-02-04 Thread Buddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Buddy updated HDFS-5868:


 Target Version/s: 2.4.0
Affects Version/s: (was: 2.4.0)
   2.2.0
   Status: Patch Available  (was: Open)

> Make hsync implementation pluggable
> ---
>
> Key: HDFS-5868
> URL: https://issues.apache.org/jira/browse/HDFS-5868
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0
>Reporter: Buddy
> Attachments: HDFS-5868-branch-2.patch
>
>
> The current implementation of hsync in BlockReceiver only works if the output 
> streams are instances of FileOutputStream. Therefore, there is currently no 
> way for a FSDatasetSpi plugin to implement hsync if it is not using standard 
> OS files.
> One possible solution is to push the implementation of hsync into the 
> ReplicaOutputStreams class. This class is constructed by the 
> ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore 
> it can be extended. Instead of directly calling sync on the output stream, 
> BlockReceiver would call ReplicaOutputStream.sync.  The default 
> implementation of sync in ReplicaOutputStream would be the same as the 
> current implementation in BlockReceiver. 
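
As a rough sketch of that proposal (the class shape and the syncDataOut() method 
below are illustrative, not the attached patch):

{code}
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch only. ReplicaOutputStreams exists in HDFS, but the constructor and
// syncDataOut() method shown here are illustrative, not the attached patch.
class ReplicaOutputStreamsSketch {
  private final OutputStream dataOut;

  ReplicaOutputStreamsSketch(OutputStream dataOut) {
    this.dataOut = dataOut;
  }

  /**
   * Default behaviour, roughly what BlockReceiver does for hsync today:
   * force the bytes to disk when the stream is a plain OS file. A non-file
   * FSDatasetSpi plugin would override this method instead of being silently
   * skipped by an instanceof check inside BlockReceiver.
   */
  void syncDataOut() throws IOException {
    if (dataOut instanceof FileOutputStream) {
      ((FileOutputStream) dataOut).getChannel().force(true);
    }
  }
}
{code}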



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5868) Make hsync implementation pluggable

2014-02-04 Thread Buddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Buddy updated HDFS-5868:


Attachment: HDFS-5868-branch-2.patch

> Make hsync implementation pluggable
> ---
>
> Key: HDFS-5868
> URL: https://issues.apache.org/jira/browse/HDFS-5868
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0
>Reporter: Buddy
> Attachments: HDFS-5868-branch-2.patch
>
>
> The current implementation of hsync in BlockReceiver only works if the output 
> streams are instances of FileOutputStream. Therefore, there is currently no 
> way for a FSDatasetSpi plugin to implement hsync if it is not using standard 
> OS files.
> One possible solution is to push the implementation of hsync into the 
> ReplicaOutputStreams class. This class is constructed by the 
> ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore 
> it can be extended. Instead of directly calling sync on the output stream, 
> BlockReceiver would call ReplicaOutputStream.sync.  The default 
> implementation of sync in ReplicaOutputStream would be the same as the 
> current implementation in BlockReceiver. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5868) Make hsync implementation pluggable

2014-02-04 Thread Buddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Buddy updated HDFS-5868:


Attachment: (was: HDFS-5868.patch)

> Make hsync implementation pluggable
> ---
>
> Key: HDFS-5868
> URL: https://issues.apache.org/jira/browse/HDFS-5868
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.0
>Reporter: Buddy
>
> The current implementation of hsync in BlockReceiver only works if the output 
> streams are instances of FileOutputStream. Therefore, there is currently no 
> way for a FSDatasetSpi plugin to implement hsync if it is not using standard 
> OS files.
> One possible solution is to push the implementation of hsync into the 
> ReplicaOutputStreams class. This class is constructed by the 
> ReplicaInPipeline which is constructed by the FSDatasetSpi plugin, therefore 
> it can be extended. Instead of directly calling sync on the output stream, 
> BlockReceiver would call ReplicaOutputStream.sync.  The default 
> implementation of sync in ReplicaOutputStream would be the same as the 
> current implementation in BlockReceiver. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891207#comment-13891207
 ] 

Todd Lipcon commented on HDFS-5399:
---

OK, thanks. Feel free to ping me via gchat (todd at cloudera dot com) if you 
want a quick review or if I can help out in any way. (sometimes I'm slower to 
notice JIRA comments)

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891204#comment-13891204
 ] 

Jing Zhao commented on HDFS-5399:
-

I will post a patch today. And this jira already proposes to distinguish the 
manual safemode, so I will include both changes in the same patch and post it 
in this jira.

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891196#comment-13891196
 ] 

Todd Lipcon commented on HDFS-5399:
---

bq. > we should limit the number of retries as Jing proposed above
bq. I will create a jira and upload a patch for this.

Thanks, Jing! Do you plan to get to this today? We have some internal testing 
blocked by this issue, so if you're busy I can try to take a whack at it 
instead.

What do you think about the suggestion of making it only throw 
RetriableException if it's in the "extension" or "startup" safemode, and not 
"manual" safemode?

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations

2014-02-04 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891167#comment-13891167
 ] 

Arun C Murthy commented on HDFS-4564:
-

Ok, thanks [~daryn]!

> Webhdfs returns incorrect http response codes for denied operations
> ---
>
> Key: HDFS-4564
> URL: https://issues.apache.org/jira/browse/HDFS-4564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: HDFS-4564.branch-23.patch, HDFS-4564.branch-23.patch, 
> HDFS-4564.patch
>
>
> Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's 
> denying operations.  Examples include rejecting invalid proxy user attempts 
> and renew/cancel with an invalid user.
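
For context, a hedged sketch of the kind of mapping the fix needs -- illustrative 
JAX-RS code, not the actual webhdfs ExceptionHandler:

{code}
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

import org.apache.hadoop.security.AccessControlException;
import org.apache.hadoop.security.authorize.AuthorizationException;

// Illustrative only: the real webhdfs ExceptionHandler differs in detail.
class ForbiddenMappingSketch {
  static Response toResponse(Exception e) {
    // Authorization failures mean the caller is known but not allowed, so the
    // correct status is 403 Forbidden rather than 401 Unauthorized.
    Response.Status status =
        (e instanceof AccessControlException || e instanceof AuthorizationException)
            ? Response.Status.FORBIDDEN
            : Response.Status.INTERNAL_SERVER_ERROR;
    return Response.status(status)
        .type(MediaType.APPLICATION_JSON)
        .entity("{\"RemoteException\":{\"message\":\"" + e.getMessage() + "\"}}")
        .build();
  }
}
{code}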



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations

2014-02-04 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891154#comment-13891154
 ] 

Daryn Sharp commented on HDFS-4564:
---

[~acmurthy] Yes, but it needs the HADOOP-10301 patch committed.  I think the 
pre-commit for this patch will fail w/o it.

> Webhdfs returns incorrect http response codes for denied operations
> ---
>
> Key: HDFS-4564
> URL: https://issues.apache.org/jira/browse/HDFS-4564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: HDFS-4564.branch-23.patch, HDFS-4564.branch-23.patch, 
> HDFS-4564.patch
>
>
> Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's 
> denying operations.  Examples include rejecting invalid proxy user attempts 
> and renew/cancel with an invalid user.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5880) Fix a typo at the title of HDFS Snapshots document

2014-02-04 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891134#comment-13891134
 ] 

Andrew Wang commented on HDFS-5880:
---

I already have a fix for this in HDFS-5709, which is in the last stages of 
review. Do you mind if we close this as dupe?

> Fix a typo at the title of HDFS Snapshots document
> --
>
> Key: HDFS-5880
> URL: https://issues.apache.org/jira/browse/HDFS-5880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation, snapshots
>Affects Versions: 2.2.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-5880.patch
>
>
> The title of the HDFS Snapshots document is "HFDS Snapshots".
> We should fix it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment

2014-02-04 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891115#comment-13891115
 ] 

Suresh Srinivas commented on HDFS-5746:
---

[~cmccabe], Jenkins has been flagging two javadoc errors on recent runs. Is it 
related to this?

> add ShortCircuitSharedMemorySegment
> ---
>
> Key: HDFS-5746
> URL: https://issues.apache.org/jira/browse/HDFS-5746
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, 
> HDFS-5746.003.patch, HDFS-5746.004.patch, HDFS-5746.005.patch, 
> HDFS-5746.006.patch, HDFS-5746.007.patch, HDFS-5746.008.patch
>
>
> Add ShortCircuitSharedMemorySegment, which will be used to communicate 
> information between the datanode and the client about whether a replica is 
> mlocked.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891102#comment-13891102
 ] 

Jing Zhao commented on HDFS-5399:
-

Thanks for the comment, Todd!

bq. Maybe we should consider changing the extension so that, if we don't have a 
significant number of under-replicated blocks, we don't go through the 
extension?
+1 for this. [~kihwal] has a similar proposal in HDFS-5145. Since DNs keep 
sending block reports to the SBN, and the NN will process all the pending DN msgs 
while starting the active services, maybe we can simply skip the safemode 
extension even without checking the number of under-replicated blocks?

bq. we should limit the number of retries as Jing proposed above
I will create a jira and upload a patch for this.

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891096#comment-13891096
 ] 

Todd Lipcon commented on HDFS-5399:
---

Perhaps a compromise that might work for now would be to make the NN throw the 
RetriableException-wrapped SafeModeException only in the case that it's in the 
safemode extension, i.e. the NN knows that it's on its way out of safemode, and 
the client just needs to "hang on for a little while". This is distinct from 
the other cases where an admin explicitly put the NN in safemode, or it got 
stuck in safemode at startup because there are missing blocks.
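
To make the idea concrete, a minimal sketch, assuming illustrative method names 
rather than existing FSNamesystem code:

{code}
// Sketch only; the helpers below are illustrative, not existing FSNamesystem methods.
private void checkNotInSafeMode(String op) throws IOException {
  if (!isInSafeMode()) {
    return;
  }
  SafeModeException sme = new SafeModeException("Cannot " + op, safeMode);
  if (isInSafeModeExtension() || isInStartupSafeMode()) {
    // The NN is on its way out of safemode, so ask the client to retry.
    throw new RetriableException(sme);
  }
  // Manual safemode (or stuck at startup with missing blocks): fail immediately.
  throw sme;
}
{code}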

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations

2014-02-04 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891086#comment-13891086
 ] 

Arun C Murthy commented on HDFS-4564:
-

[~daryn] - Is this close? Thanks.

> Webhdfs returns incorrect http response codes for denied operations
> ---
>
> Key: HDFS-4564
> URL: https://issues.apache.org/jira/browse/HDFS-4564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: HDFS-4564.branch-23.patch, HDFS-4564.branch-23.patch, 
> HDFS-4564.patch
>
>
> Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's 
> denying operations.  Examples include rejecting invalid proxy user attempts 
> and renew/cancel with an invalid user.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations

2014-02-04 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-4564:


Target Version/s: 2.3.0  (was: 3.0.0, 2.3.0)

> Webhdfs returns incorrect http response codes for denied operations
> ---
>
> Key: HDFS-4564
> URL: https://issues.apache.org/jira/browse/HDFS-4564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: HDFS-4564.branch-23.patch, HDFS-4564.branch-23.patch, 
> HDFS-4564.patch
>
>
> Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's 
> denying operations.  Examples include rejecting invalid proxy user attempts 
> and renew/cancel with an invalid user.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Arpit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891074#comment-13891074
 ] 

Arpit Gupta commented on HDFS-5399:
---

[~atm]

bq.  Am I correct in assuming that the test you were running did not manually 
cause the NN to enter or leave safemode?

Yes that is correct.

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5873) dfs.http.policy should have higher precedence over dfs.https.enable

2014-02-04 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5873:
-

Attachment: HDFS-5873.001.patch

The v1 patch updated SecureMode.apt.vm.

> dfs.http.policy should have higher precedence over dfs.https.enable
> ---
>
> Key: HDFS-5873
> URL: https://issues.apache.org/jira/browse/HDFS-5873
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Haohui Mai
> Attachments: HDFS-5873.000.patch, HDFS-5873.001.patch
>
>
> If dfs.http.policy is defined in hdfs-site.xml, it should have higher 
> precedence.
> In hdfs-site.xml, if dfs.https.enable is set to true and dfs.http.policy is 
> set to HTTP_ONLY, the effective policy should be 'HTTP_ONLY' instead of 
> 'HTTP_AND_HTTPS'.
> Currently with this configuration, it activates the HTTP_AND_HTTPS policy.
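
For readers following the precedence question, a hedged sketch of the intended 
rule (an illustrative helper, not the actual DFSUtil or patch code):

{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative helper, not the actual DFSUtil/patch code.
class HttpPolicySketch {
  enum Policy { HTTP_ONLY, HTTPS_ONLY, HTTP_AND_HTTPS }

  static Policy resolve(Configuration conf) {
    String explicit = conf.get("dfs.http.policy");
    if (explicit != null) {
      // dfs.http.policy, when set, wins regardless of dfs.https.enable.
      return Policy.valueOf(explicit.trim().toUpperCase());
    }
    // Otherwise fall back to the legacy boolean flag.
    return conf.getBoolean("dfs.https.enable", false)
        ? Policy.HTTP_AND_HTTPS
        : Policy.HTTP_ONLY;
  }
}
{code}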



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5731) Refactoring to define interfaces between BM and NN and simplify the flow between them

2014-02-04 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891067#comment-13891067
 ] 

Daryn Sharp commented on HDFS-5731:
---

I'm trying to review this patch but it's a bit too much.  Broad API and 
functional changes across the NN are difficult to verify for correctness.

To expedite the review, this patch should be broken into multiple jiras where 
possible.  Each jira should encompass one specific area of change.  Examples 
include jiras for the changes to FSNamesystem's verifyReplication, 
getBlockLocations, commitBlockSynchronization, jmx changes, etc.

Generally, API changes should accompany the patch for which they are required.  
That way the intent/reason for each change will be clear to the reviewer.

> Refactoring to define interfaces between BM and NN and simplify the flow 
> between them
> -
>
> Key: HDFS-5731
> URL: https://issues.apache.org/jira/browse/HDFS-5731
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Amir Langer
> Attachments: 
> 0001-Separation-of-BM-from-NN-Step1-introduce-APIs-as-int.patch
>
>
> Start the separation of BlockManager (BM) from NameNode (NN) by simplifying 
> the flow between the two components and defining API interfaces between them. 
> This is done to enable future transformation into a clean RPC protocol.  
> Logic for handling calls from Datanodes should be in the BM.
> NN should interact with BM using few calls and BM should use the return types 
> as much as possible to pass information to the NN.
> The emphasis is on restructuring the request execution flows between the NN 
> and BM in a way that will minimize the latency increase when the BM 
> implementation becomes remote. Namely, the API flows are restructured in a 
> way that BM is called at most once per request. 
> The two components (NN and BM) still exist in the same VM and share the same 
> memory space.
> NN and BM share the same lifecycle – it is assumed that they can't 
> crash/restart separately. 
> There is still a 1:1 relationship  between them. 
> APIs between NN and BM will be improved to not use the same object instances 
> and turned into a real protocol.
> This task should maintain backward compatibility



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-02-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891066#comment-13891066
 ] 

Todd Lipcon commented on HDFS-5399:
---

Catching up on this issue now, so apologies if I've missed some context in my 
reading of the discussion.

- One of the issues is that the SBN may be in Safe Mode while it's tailing. 
When it becomes active, it has the latest edits and can come out of safemode, 
but still goes through the extension. The original reason for the extension was to 
prevent a replication storm in the case that the NN has only one replica of all 
the blocks, but several DNs haven't yet reported. In the SBN case, since we've 
already been running in standby mode for a while, it seems unlikely that the 
extension is necessary. Maybe we should consider changing the extension so 
that, if we don't have a significant number of under-replicated blocks, we 
don't go through the extension?

- Regardless, we should limit the number of retries as Jing proposed above. 
Retrying indefinitely should never be our default. How about we introduce a 
configuration here and default to ~30sec of retries? Those who want to retry 
forever could reconfigure to a longer time period.
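
As a sketch of what a bounded retry policy could look like (the config key below 
is hypothetical, not an existing HDFS property):

{code}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

// Sketch only; "dfs.client.retry.max.safemode.time.ms" is a hypothetical key.
class SafeModeRetrySketch {
  static RetryPolicy create(Configuration conf) {
    long maxTimeMs = conf.getLong("dfs.client.retry.max.safemode.time.ms", 30000L);
    // Retry roughly every second until the configured time budget runs out,
    // then give up instead of retrying forever.
    return RetryPolicies.retryUpToMaximumTimeWithFixedSleep(
        maxTimeMs, 1000L, TimeUnit.MILLISECONDS);
  }
}
{code}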

> Revisit SafeModeException and corresponding retry policies
> --
>
> Key: HDFS-5399
> URL: https://issues.apache.org/jira/browse/HDFS-5399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if 
> the NN is in SafeMode. Specifically, the client side's RPC adopts 
> MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
> is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. 
> Specifically, the SafeModeException is wrapped as a RetriableException in the 
> server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
> which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator 
> through CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between 
> SafeMode and retry policy for both HA and non-HA setup. A possible 
> straightforward solution is to always wrap the SafeModeException in the 
> RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

