[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-16 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-5776:


Attachment: HDFS-5776-v2.txt

v2 should address the javadoc warning and the failed test case.

> Support 'hedged' reads in DFSClient
> ---
>
> Key: HDFS-5776
> URL: https://issues.apache.org/jira/browse/HDFS-5776
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-5776-v2.txt, HDFS-5776.txt
>
>
> This is a placeholder for backporting the HDFS-side work from 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be especially helpful for optimizing read outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" & 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read 
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
> can export the metric values of interest into the client system (e.g. HBase's 
> regionserver metrics).
> The core logic is in the pread code path: based on the above config items, we 
> decide whether to go to the original fetchBlockByteRange or the newly introduced 
> fetchBlockByteRangeSpeculative.
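
As a rough illustration of how a client such as HBase could turn this on, here is a 
minimal sketch. The two property names are the ones quoted in the description above; 
the class name, file path, and values are placeholders of mine, not part of the patch.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HedgedReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // A thread pool size > 0 turns hedged reads on; 0 leaves them disabled.
    conf.setInt("dfs.dfsclient.quorum.read.threadpool.size", 10);
    // How long to wait on the first datanode before hedging to another replica.
    conf.setLong("dfs.dfsclient.quorum.read.threshold.millis", 150);

    FileSystem fs = FileSystem.get(conf);
    try (FSDataInputStream in = fs.open(new Path("/tmp/example"))) { // placeholder path
      byte[] buf = new byte[4096];
      // Positional reads (pread) are where the client picks between
      // fetchBlockByteRange and fetchBlockByteRangeSpeculative.
      int n = in.read(0L, buf, 0, buf.length);
      System.out.println("read " + n + " bytes");
    }
  }
}
{code}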



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2

2014-01-16 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5794:


Attachment: HDFS-5794.001.patch

Thanks for the review, Nicholas and Colin! Rebased the patch and put CACHING 
last.

> Fix the inconsistency of layout version number of 
> ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
> ---
>
> Key: HDFS-5794
> URL: https://issues.apache.org/jira/browse/HDFS-5794
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5794.000.patch, HDFS-5794.001.patch
>
>
> Currently in trunk, we have the layout version:
> {code}
> EDITLOG_ADD_BLOCK(-48, ...),
> CACHING(-49, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...);
> {code}
> And in branch-2, we have:
> {code}
> EDITLOG_SUPPORT_RETRYCACHE(-47, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...);
> {code}
> We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus 
> EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change 
> ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written 
> by trunk and branch-2 have the same layout -50 but branch-2 cannot read the 
> -50 fsimage if it is written by trunk.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5748) Too much information shown in the dfs health page.

2014-01-16 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5748:
-

Status: Patch Available  (was: Open)

> Too much information shown in the dfs health page.
> --
>
> Key: HDFS-5748
> URL: https://issues.apache.org/jira/browse/HDFS-5748
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Haohui Mai
> Attachments: HDFS-5748.000.patch, hdfs-5748.png
>
>
> I've noticed that the node lists are shown in the default name node web page. 
>  This may be fine for small clusters, but for clusters with 1000s of nodes, 
> this is not ideal. The following should be shown on demand. (Some of them 
> have been there even before the recent rework.)
> - Detailed data node information
> - Startup progress
> - Snapshot information



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5748) Too much information shown in the dfs health page.

2014-01-16 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5748:
-

Issue Type: Improvement  (was: Sub-task)
Parent: (was: HDFS-5333)

> Too much information shown in the dfs health page.
> --
>
> Key: HDFS-5748
> URL: https://issues.apache.org/jira/browse/HDFS-5748
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: Haohui Mai
> Attachments: HDFS-5748.000.patch, hdfs-5748.png
>
>
> I've noticed that the node lists are shown in the default name node web page. 
>  This may be fine for small clusters, but for clusters with 1000s of nodes, 
> this is not ideal. The following should be shown on demand. (Some of them 
> have been there even before the recent rework.)
> - Detailed data node information
> - Startup progress
> - Snapshot information



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5748) Too much information shown in the dfs health page.

2014-01-16 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5748:
-

Attachment: HDFS-5748.000.patch

> Too much information shown in the dfs health page.
> --
>
> Key: HDFS-5748
> URL: https://issues.apache.org/jira/browse/HDFS-5748
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Haohui Mai
> Attachments: HDFS-5748.000.patch, hdfs-5748.png
>
>
> I've noticed that the node lists are shown in the default name node web page. 
>  This may be fine for small clusters, but for clusters with 1000s of nodes, 
> this is not ideal. The following should be shown on demand. (Some of them 
> have been there even before the recent rework.)
> - Detailed data node information
> - Startup progress
> - Snapshot information



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5748) Too much information shown in the dfs health page.

2014-01-16 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5748:
-

Attachment: hdfs-5748.png

Convert information into tabs.

> Too much information shown in the dfs health page.
> --
>
> Key: HDFS-5748
> URL: https://issues.apache.org/jira/browse/HDFS-5748
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Haohui Mai
> Attachments: HDFS-5748.000.patch, hdfs-5748.png
>
>
> I've noticed that the node lists are shown in the default name node web page. 
>  This may be fine for small clusters, but for clusters with 1000s of nodes, 
> this is not ideal. The following should be shown on demand. (Some of them 
> have been there even before the recent rework.)
> - Detailed data node information
> - Startup progress
> - Snapshot information



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA

2014-01-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874469#comment-13874469
 ] 

Hadoop QA commented on HDFS-5138:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12623568/HDFS-5138.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:

  org.apache.hadoop.hdfs.server.namenode.TestBackupNode

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5907//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5907//console

This message is automatically generated.

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on the NN for the layout upgrade and HA is 
> turned back on without involving DNs, things will work, but finalizeUpgrade 
> won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' 
> upgrade snapshots won't get removed.
> We will need a different way of doing layout upgrades and upgrade snapshots.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase the maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5138) Support HDFS upgrade in HA

2014-01-16 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-5138:
-

Attachment: HDFS-5138.patch

Here's an updated patch which implements Suresh's suggestion of doing away with 
the shared log lock and instead requiring that both NNs be shut down during the 
upgrade and the second NN re-bootstrapped after the first NN has performed the 
upgrade. I think this makes the procedure a little more complex for operators, 
but it certainly does simplify the code.

[~sureshms] - I don't feel super strongly about it either way, so please let me 
know if you find this more palatable. I'd appreciate a prompt review as this 
JIRA has been going on for quite some time and I'd like to get it done.

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
> to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on the NN for the layout upgrade and HA is 
> turned back on without involving DNs, things will work, but finalizeUpgrade 
> won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' 
> upgrade snapshots won't get removed.
> We will need a different way of doing layout upgrades and upgrade snapshots.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase the maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5153) Datanode should send block reports for each storage in a separate message

2014-01-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874362#comment-13874362
 ] 

Hadoop QA commented on HDFS-5153:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12623541/HDFS-5153.04.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5906//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5906//console

This message is automatically generated.

> Datanode should send block reports for each storage in a separate message
> -
>
> Key: HDFS-5153
> URL: https://issues.apache.org/jira/browse/HDFS-5153
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Arpit Agarwal
> Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, 
> HDFS-5153.03b.patch, HDFS-5153.04.patch
>
>
> When the number of blocks on the DataNode grows large we start running into a 
> few issues:
> # Block reports take a long time to process on the NameNode. In testing we 
> have seen that a block report with 6 Million blocks takes close to one second 
> to process on the NameNode. The NameSystem write lock is held during this 
> time.
> # We start hitting the default protobuf message limit of 64MB somewhere 
> around 10 Million blocks. While we can increase the message size limit it 
> already takes over 7 seconds to serialize/unserialize a block report of this 
> size.
> HDFS-2832 has introduced the concept of a DataNode as a collection of 
> storages i.e. the NameNode is aware of all the volumes (storage directories) 
> attached to a given DataNode. This makes it easy to split block reports from 
> the DN by sending one report per storage directory to mitigate the above 
> problems.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2

2014-01-16 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5794:
-

Component/s: namenode
   Priority: Minor  (was: Major)

+1 patch looks good.

> Fix the inconsistency of layout version number of 
> ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
> ---
>
> Key: HDFS-5794
> URL: https://issues.apache.org/jira/browse/HDFS-5794
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5794.000.patch
>
>
> Currently in trunk, we have the layout version:
> {code}
> EDITLOG_ADD_BLOCK(-48, ...),
> CACHING(-49, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...);
> {code}
> And in branch-2, we have:
> {code}
> EDITLOG_SUPPORT_RETRYCACHE(-47, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...);
> {code}
> We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus 
> EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change 
> ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written 
> by trunk and branch-2 have the same layout -50 but branch-2 cannot read the 
> -50 fsimage if it is written by trunk.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section

2014-01-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874341#comment-13874341
 ] 

Hudson commented on HDFS-5784:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5015 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5015/])
HDFS-5784. Reserve space in edit log header and fsimage header for feature flag 
section (cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558974)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutFlags.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileOutputStream.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/server/TestJournalNode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml


> reserve space in edit log header and fsimage header for feature flag section
> 
>
> Key: HDFS-5784
> URL: https://issues.apache.org/jira/browse/HDFS-5784
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 3.0.0
>
> Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, 
> HDFS-5784.003.patch
>
>
> We should reserve space in the edit log header and fsimage header so that we 
> can add layout feature flags later in a compatible manner.
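
A minimal sketch of what "reserving space" can mean here; this is my illustration of 
the idea, not necessarily the committed LayoutFlags code. The writer emits an empty 
feature-flag section now, so readers learn to skip it and later versions can fill it in.

{code}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FeatureFlagSectionSketch {
  /** Write an empty feature-flag section: just a count of zero for now. */
  public static void write(DataOutputStream out) throws IOException {
    out.writeInt(0); // number of feature flags
  }

  /** Read the section; future layouts may read real entries after the count. */
  public static void read(DataInputStream in) throws IOException {
    int numFlags = in.readInt();
    if (numFlags < 0) {
      throw new IOException("Unexpected feature flag count: " + numFlags);
    }
    // Nothing else to read yet.
  }
}
{code}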



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2

2014-01-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874328#comment-13874328
 ] 

Hadoop QA commented on HDFS-5794:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12623524/HDFS-5794.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5905//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5905//console

This message is automatically generated.

> Fix the inconsistency of layout version number of 
> ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
> ---
>
> Key: HDFS-5794
> URL: https://issues.apache.org/jira/browse/HDFS-5794
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5794.000.patch
>
>
> Currently in trunk, we have the layout version:
> {code}
> EDITLOG_ADD_BLOCK(-48, ...),
> CACHING(-49, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...);
> {code}
> And in branch-2, we have:
> {code}
> EDITLOG_SUPPORT_RETRYCACHE(-47, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...);
> {code}
> We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus 
> EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change 
> ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written 
> by trunk and branch-2 have the same layout -50 but branch-2 cannot read the 
> -50 fsimage if it is written by trunk.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section

2014-01-16 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5784:
---

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

> reserve space in edit log header and fsimage header for feature flag section
> 
>
> Key: HDFS-5784
> URL: https://issues.apache.org/jira/browse/HDFS-5784
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 3.0.0
>
> Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, 
> HDFS-5784.003.patch
>
>
> We should reserve space in the edit log header and fsimage header so that we 
> can add layout feature flags later in a compatible manner.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section

2014-01-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874307#comment-13874307
 ] 

Colin Patrick McCabe commented on HDFS-5784:


backport to branch-2 is pending the layout version fix in HDFS-5794

> reserve space in edit log header and fsimage header for feature flag section
> 
>
> Key: HDFS-5784
> URL: https://issues.apache.org/jira/browse/HDFS-5784
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, 
> HDFS-5784.003.patch
>
>
> We should reserve space in the edit log header and fsimage header so that we 
> can add layout feature flags later in a compatible manner.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-5748) Too much information shown in the dfs health page.

2014-01-16 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai reassigned HDFS-5748:


Assignee: Haohui Mai

> Too much information shown in the dfs health page.
> --
>
> Key: HDFS-5748
> URL: https://issues.apache.org/jira/browse/HDFS-5748
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Haohui Mai
>
> I've noticed that the node lists are shown in the default name node web page. 
>  This may be fine for small clusters, but for clusters with 1000s of nodes, 
> this is not ideal. The following should be shown on demand. (Some of them 
> have been there even before the recent rework.)
> - Detailed data node information
> - Startup progress
> - Snapshot information



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2

2014-01-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874306#comment-13874306
 ] 

Colin Patrick McCabe commented on HDFS-5794:


+1 for this approach of putting CACHING last, once you've rebased and re-run 
jenkins.  HDFS-5784 added another layout version which CACHING should come 
after.

> Fix the inconsistency of layout version number of 
> ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
> ---
>
> Key: HDFS-5794
> URL: https://issues.apache.org/jira/browse/HDFS-5794
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5794.000.patch
>
>
> Currently in trunk, we have the layout version:
> {code}
> EDITLOG_ADD_BLOCK(-48, ...),
> CACHING(-49, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...);
> {code}
> And in branch-2, we have:
> {code}
> EDITLOG_SUPPORT_RETRYCACHE(-47, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...);
> {code}
> We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus 
> EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change 
> ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written 
> by trunk and branch-2 have the same layout -50 but branch-2 cannot read the 
> -50 fsimage if it is written by trunk.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5795) RemoteBlockReader2#checkSuccess() should print error status

2014-01-16 Thread Brandon Li (JIRA)
Brandon Li created HDFS-5795:


 Summary: RemoteBlockReader2#checkSuccess() should print error 
status 
 Key: HDFS-5795
 URL: https://issues.apache.org/jira/browse/HDFS-5795
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Brandon Li
Priority: Trivial


RemoteBlockReader2#checkSuccess() doesn't print the error status, which makes 
debugging harder when the client can't read from the DataNode.
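
For illustration only, the kind of change being asked for is sketched below; the 
method shape and names are placeholders of mine, not the actual RemoteBlockReader2 
code.

{code}
import java.io.IOException;

public class CheckSuccessSketch {
  /**
   * Placeholder sketch: fail with a message that carries the status the
   * DataNode returned, instead of a generic "read failed" error.
   */
  static void checkSuccess(String statusFromDataNode, String blockInfo)
      throws IOException {
    if (!"SUCCESS".equals(statusFromDataNode)) {
      throw new IOException("Got error reading " + blockInfo
          + ", status from DataNode: " + statusFromDataNode);
    }
  }
}
{code}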



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"

2014-01-16 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874247#comment-13874247
 ] 

Jing Zhao commented on HDFS-5709:
-

The patch looks good to me. Some comments:
# In FSEditLogLoader, we also need to handle "OP_ADD_BLOCK" after HDFS-5704 got 
committed recently.
# It will be better to let renameReservedPathsOnUpgrade and 
renameReservedComponentOnUpgrade take a string as the third parameter instead 
of an instance of FSNamesystem. We may also want to pass in the to-be-replaced 
string as a parameter, and in that case, we can make these two methods act as 
more generic utility methods so that they can be used for other reserved names 
(e.g., "/.reserved/.inodes", which we may want to handle in a separate jira). 
# Then we can have another two methods in FSImageFormat/FSEditLogLoader to call 
the util methods, where we can check the layout version and the new names.
# When checking the new name, I think we should follow the rules in 
DFSUtil#isValidName?
# In renameReservedComponentOnUpgrade, maybe we do not need to convert the 
byte[] to a string for the comparison? We have DOT_SNAPSHOT_DIR_BYTES defined in 
HdfsConstants already (see the sketch after this list).
# There is an unused import in FSImageFormat.
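
A byte[]-level check along the lines of comment 5 might look like the sketch below 
(assuming, as the comment says, that HdfsConstants.DOT_SNAPSHOT_DIR_BYTES holds the 
raw bytes of the ".snapshot" component):

{code}
import java.util.Arrays;

import org.apache.hadoop.hdfs.protocol.HdfsConstants;

public class ReservedNameCheckSketch {
  /** Sketch: compare the raw path component, no byte[]-to-String conversion. */
  static boolean isDotSnapshotComponent(byte[] component) {
    return Arrays.equals(component, HdfsConstants.DOT_SNAPSHOT_DIR_BYTES);
  }
}
{code}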

> Improve upgrade with existing files and directories named ".snapshot"
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery

2014-01-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874219#comment-13874219
 ] 

Todd Lipcon commented on HDFS-5790:
---

As a quick check of the above, I did the following patch:

{code}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
index 8b5fb81..62e60da 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
@@ -40,6 +40,7 @@
 import org.apache.hadoop.util.Daemon;
 
 import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Objects;
 import com.google.common.base.Preconditions;
 
 /**
@@ -256,6 +257,16 @@ public boolean expiredSoftLimit() {
  * @return the path associated with the pendingFile and null if not found.
  */
 private String findPath(INodeFile pendingFile) {
+  String retOrig = findPathOrig(pendingFile);
+  String retNew = pendingFile.getFullPathName();
+  if (!Objects.equal(retOrig, retNew)) {
+throw new AssertionError("orig implementation found: " + retOrig +
+ " new implementation found: " + retNew);
+  }
+  return retNew;
+}
+
+private String findPathOrig(INodeFile pendingFile) {
   try {
 for (String src : paths) {
   INode node = fsnamesystem.dir.getINode(src);
{code}

That is to say, I tried the suggested optimization along with the original 
implementation and verified that they return the same results. I ran all the 
HDFS tests and they all passed, indicating that it's likely this optimization 
wouldn't break anything. And it should be much faster, since it's O(directory 
depth) instead of O(number of leases held by the client * those leases' 
directory depths).

Anyone have opinions on this? [~kihwal] or [~daryn] maybe? (seem to recall both 
of you working in this area a few months back)

> LeaseManager.findPath is very slow when many leases need recovery
> -
>
> Key: HDFS-5790
> URL: https://issues.apache.org/jira/browse/HDFS-5790
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, performance
>Affects Versions: 2.4.0
>Reporter: Todd Lipcon
>
> We recently saw an issue where the NN restarted while tens of thousands of 
> files were open. The NN then ended up spending multiple seconds for each 
> commitBlockSynchronization() call, spending most of its time inside 
> LeaseManager.findPath(). findPath currently works by looping over all files 
> held for a given writer, and traversing the filesystem for each one. This 
> takes way too long when tens of thousands of files are open by a single 
> writer.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"

2014-01-16 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874217#comment-13874217
 ] 

Jing Zhao commented on HDFS-5709:
-

Thanks for the work Andrew! I'm now reviewing the patch and will post my 
comments soon.

> Improve upgrade with existing files and directories named ".snapshot"
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5434) Write resiliency for replica count 1

2014-01-16 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874212#comment-13874212
 ] 

Arpit Agarwal commented on HDFS-5434:
-

I think you are right, good catch. 

> Write resiliency for replica count 1
> 
>
> Key: HDFS-5434
> URL: https://issues.apache.org/jira/browse/HDFS-5434
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0
>Reporter: Buddy
>Priority: Minor
> Attachments: BlockPlacementPolicyMinPipelineSize.java, 
> BlockPlacementPolicyMinPipelineSizeWithNodeGroup.java, 
> HDFS-5434-branch-2.patch, HDFS_5434.patch
>
>
> If a file has a replica count of one, the HDFS client is exposed to write 
> failures if the data node fails during a write. With a pipeline of size of 
> one, no recovery is possible if the sole data node dies.
> A simple fix is to force a minimum pipeline size of 2, while leaving the 
> replication count as 1. The implementation for this is fairly non-invasive.
> Although the replica count is one, the block will be written to two data 
> nodes instead of one. If one of the data nodes fails during the write, normal 
> pipeline recovery will ensure that the write succeeds to the surviving data 
> node.
> The existing code in the name node will prune the extra replica when it 
> receives the block received reports for the finalized block from both data 
> nodes. This results in the intended replica count of one for the block.
> This behavior should be controlled by a configuration option such as 
> {{dfs.namenode.minPipelineSize}}.
> This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by 
> ensuring that the pipeline size passed to 
> {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:
> {code}
> max(replication, ${dfs.namenode.minPipelineSize})
> {code}
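
A minimal sketch of the computation proposed above; the configuration key is the 
proposed (not yet existing) one, and the surrounding getAdditionalBlock()/chooseTarget() 
plumbing is omitted:

{code}
import org.apache.hadoop.conf.Configuration;

public class PipelineSizeSketch {
  /**
   * Sketch only: widen the pipeline without touching the stored replication.
   * The returned value is what chooseTarget() would receive as its replication
   * argument; the NameNode later prunes the extra replica back down to one.
   */
  static short effectivePipelineSize(Configuration conf, short replication) {
    int minPipelineSize = conf.getInt("dfs.namenode.minPipelineSize", 1);
    return (short) Math.max(replication, minPipelineSize);
  }
}
{code}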



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"

2014-01-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874210#comment-13874210
 ] 

Hadoop QA commented on HDFS-5709:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12623493/hdfs-5709-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5904//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5904//console

This message is automatically generated.

> Improve upgrade with existing files and directories named ".snapshot"
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5153) Datanode should send block reports for each storage in a separate message

2014-01-16 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5153:


Attachment: HDFS-5153.04.patch

> Datanode should send block reports for each storage in a separate message
> -
>
> Key: HDFS-5153
> URL: https://issues.apache.org/jira/browse/HDFS-5153
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Arpit Agarwal
> Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, 
> HDFS-5153.03b.patch, HDFS-5153.04.patch
>
>
> When the number of blocks on the DataNode grows large we start running into a 
> few issues:
> # Block reports take a long time to process on the NameNode. In testing we 
> have seen that a block report with 6 Million blocks takes close to one second 
> to process on the NameNode. The NameSystem write lock is held during this 
> time.
> # We start hitting the default protobuf message limit of 64MB somewhere 
> around 10 Million blocks. While we can increase the message size limit it 
> already takes over 7 seconds to serialize/unserialize a block report of this 
> size.
> HDFS-2832 has introduced the concept of a DataNode as a collection of 
> storages i.e. the NameNode is aware of all the volumes (storage directories) 
> attached to a given DataNode. This makes it easy to split block reports from 
> the DN by sending one report per storage directory to mitigate the above 
> problems.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5153) Datanode should send block reports for each storage in a separate message

2014-01-16 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5153:


Attachment: (was: HDFS-5153.04.patch)

> Datanode should send block reports for each storage in a separate message
> -
>
> Key: HDFS-5153
> URL: https://issues.apache.org/jira/browse/HDFS-5153
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Arpit Agarwal
> Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, 
> HDFS-5153.03b.patch
>
>
> When the number of blocks on the DataNode grows large we start running into a 
> few issues:
> # Block reports take a long time to process on the NameNode. In testing we 
> have seen that a block report with 6 Million blocks takes close to one second 
> to process on the NameNode. The NameSystem write lock is held during this 
> time.
> # We start hitting the default protobuf message limit of 64MB somewhere 
> around 10 Million blocks. While we can increase the message size limit it 
> already takes over 7 seconds to serialize/unserialize a block report of this 
> size.
> HDFS-2832 has introduced the concept of a DataNode as a collection of 
> storages i.e. the NameNode is aware of all the volumes (storage directories) 
> attached to a given DataNode. This makes it easy to split block reports from 
> the DN by sending one report per storage directory to mitigate the above 
> problems.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5153) Datanode should send block reports for each storage in a separate message

2014-01-16 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5153:


Attachment: HDFS-5153.04.patch

Updated patch with the following changes:

# Introduces new conf key {{DFS_BLOCKREPORT_SPLIT_THRESHOLD_KEY}}. If the 
number of blocks on the DN is below this threshold then block reports for all 
storages are sent in a single message; otherwise the block reports are split 
across messages (see the sketch after this list). The default value is 
currently 1 million blocks.
# When splitting, reports for all storages are sent in quick succession. In the 
future we can consider spacing them apart.
# The 'staleness' of a DN is determined by whether all storages have reported 
since the last restart/failover.
# Added new test class {{TestDnRespectsBlockReportSplitThreshold}}.
# Refactored existing test {{TestBlockReport}} into base class 
{{BlockReportTestBase}} and derived classes for readability.
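
The split decision described in item 1 can be pictured with the sketch below; the 
method and parameter names are my own shorthand, not necessarily the patch's code.

{code}
public class BlockReportSplitSketch {
  /**
   * @param blocksPerStorage number of blocks in each storage directory
   * @param splitThreshold   value of DFS_BLOCKREPORT_SPLIT_THRESHOLD_KEY
   *                         (default is roughly 1,000,000 blocks)
   * @return true if each storage should be reported in its own RPC
   */
  static boolean shouldSplitBlockReports(long[] blocksPerStorage, long splitThreshold) {
    long totalBlocks = 0;
    for (long count : blocksPerStorage) {
      totalBlocks += count;
    }
    // At or below the threshold: one RPC carries the reports for all storages.
    // Above it: one RPC per storage, sent in quick succession.
    return totalBlocks > splitThreshold;
  }
}
{code}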

> Datanode should send block reports for each storage in a separate message
> -
>
> Key: HDFS-5153
> URL: https://issues.apache.org/jira/browse/HDFS-5153
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Arpit Agarwal
> Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, 
> HDFS-5153.03b.patch, HDFS-5153.04.patch
>
>
> When the number of blocks on the DataNode grows large we start running into a 
> few issues:
> # Block reports take a long time to process on the NameNode. In testing we 
> have seen that a block report with 6 Million blocks takes close to one second 
> to process on the NameNode. The NameSystem write lock is held during this 
> time.
> # We start hitting the default protobuf message limit of 64MB somewhere 
> around 10 Million blocks. While we can increase the message size limit it 
> already takes over 7 seconds to serialize/unserialize a block report of this 
> size.
> HDFS-2832 has introduced the concept of a DataNode as a collection of 
> storages i.e. the NameNode is aware of all the volumes (storage directories) 
> attached to a given DataNode. This makes it easy to split block reports from 
> the DN by sending one report per storage directory to mitigate the above 
> problems.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section

2014-01-16 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874190#comment-13874190
 ] 

Andrew Wang commented on HDFS-5784:
---

LGTM, +1

> reserve space in edit log header and fsimage header for feature flag section
> 
>
> Key: HDFS-5784
> URL: https://issues.apache.org/jira/browse/HDFS-5784
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, 
> HDFS-5784.003.patch
>
>
> We should reserve space in the edit log header and fsimage header so that we 
> can add layout feature flags later in a compatible manner.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section

2014-01-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874184#comment-13874184
 ] 

Colin Patrick McCabe commented on HDFS-5784:


The failure in TestOfflineEditsViewer is because jenkins can't update the 
binary editsStored file (it can't yet do binary diffs), and is expected.

> reserve space in edit log header and fsimage header for feature flag section
> 
>
> Key: HDFS-5784
> URL: https://issues.apache.org/jira/browse/HDFS-5784
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, 
> HDFS-5784.003.patch
>
>
> We should reserve space in the edit log header and fsimage header so that we 
> can add layout feature flags later in a compatible manner.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.

2014-01-16 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874185#comment-13874185
 ] 

Chris Nauroth commented on HDFS-5758:
-

bq. I'm referring to distinguishing the ACLs that are generated and the ones 
are specified by setfacl / chmod.

There are exactly 3 ACL entries that are inferred from permission bits: the 
owner, group and other entries.  These are never really "generated", because 
they always originated from someone's setfacl or chmod call.  If you feel 
strongly about this, let me know, and I'll change the patch to filter those 3 
entries from the output of {{getAclStatus}} and then add them back in the 
getfacl CLI.  I don't think the distinction provides an end user with any 
valuable information though.

bq. I see it as an optimization. Can you keep it in a separate patch?

No, reducing to a minimal ACL is a matter of correctness rather than 
optimization, so I don't think it can be separated to a different patch.  Those 
4 scenarios all eliminate the extended ACL, and correctness requires that we 
turn off the ACL bit.  (See example below.)  I suppose dropping the 
{{AclFeature}} could be thought of as an optimization, but it's going to be a 
tiny patch if I separate just that part.

bq. Based on your description, it seems to me that in removeAcl, the task can 
be done via looking up the group entries and set the permission back.

Yes, we can do that.  I'll put together a new patch for that.

{code}
[cnauroth@ubuntu:pts/0] acltest 

> touch file1

[cnauroth@ubuntu:pts/0] acltest 

> setfacl -m user:bruce:rwx file1

[cnauroth@ubuntu:pts/0] acltest 

> ls -lrt file1
-rw-rwxr--+ 1 cnauroth 0 Jan 16 15:58 file1*

[cnauroth@ubuntu:pts/0] acltest 

> getfacl file1
# file: file1
# owner: cnauroth
# group: cnauroth
user::rw-
user:bruce:rwx
group::rw-
mask::rwx
other::r--

[cnauroth@ubuntu:pts/0] acltest 

> setfacl -x user:bruce,mask file1

[cnauroth@ubuntu:pts/0] acltest 

> ls -lrt file1
-rw-rw-r-- 1 cnauroth 0 Jan 16 15:58 file1

[cnauroth@ubuntu:pts/0] acltest 

> getfacl file1
# file: file1
# owner: cnauroth
# group: cnauroth
user::rw-
group::rw-
other::r--
{code}


> NameNode: complete implementation of inode modifications for ACLs.
> --
>
> Key: HDFS-5758
> URL: https://issues.apache.org/jira/browse/HDFS-5758
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch
>
>
> This patch will complete the remaining logic for the ACL get and set APIs, 
> including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in 
> the inodes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section

2014-01-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874162#comment-13874162
 ] 

Hadoop QA commented on HDFS-5784:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12623472/HDFS-5784.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5903//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5903//console

This message is automatically generated.

> reserve space in edit log header and fsimage header for feature flag section
> 
>
> Key: HDFS-5784
> URL: https://issues.apache.org/jira/browse/HDFS-5784
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, 
> HDFS-5784.003.patch
>
>
> We should reserve space in the edit log header and fsimage header so that we 
> can add layout feature flags later in a compatible manner.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2

2014-01-16 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5794:


Status: Patch Available  (was: Open)

> Fix the inconsistency of layout version number of 
> ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
> ---
>
> Key: HDFS-5794
> URL: https://issues.apache.org/jira/browse/HDFS-5794
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5794.000.patch
>
>
> Currently in trunk, we have the layout version:
> {code}
> EDITLOG_ADD_BLOCK(-48, ...),
> CACHING(-49, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...);
> {code}
> And in branch-2, we have:
> {code}
> EDITLOG_SUPPORT_RETRYCACHE(-47, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...);
> {code}
> We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus 
> EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change 
> ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written 
> by trunk and branch-2 have the same layout -50 but branch-2 cannot read the 
> -50 fsimage if it is written by trunk.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2

2014-01-16 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5794:


Attachment: HDFS-5794.000.patch

One solution for this issue is to switch CACHING and 
ADD_DATANODE_AND_STORAGE_UUIDS in trunk. Posting a simple patch following this 
solution.
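
For reference, after the swap the trunk ordering would presumably look like the 
following (leaving aside any additional layout version, such as the one from 
HDFS-5784, that CACHING may also need to follow after a rebase):

{code}
EDITLOG_ADD_BLOCK(-48, ...),
ADD_DATANODE_AND_STORAGE_UUIDS(-49, ...),
CACHING(-50, ...);
{code}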

> Fix the inconsistency of layout version number of 
> ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
> ---
>
> Key: HDFS-5794
> URL: https://issues.apache.org/jira/browse/HDFS-5794
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5794.000.patch
>
>
> Currently in trunk, we have the layout version:
> {code}
> EDITLOG_ADD_BLOCK(-48, ...),
> CACHING(-49, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...);
> {code}
> And in branch-2, we have:
> {code}
> EDITLOG_SUPPORT_RETRYCACHE(-47, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...);
> {code}
> We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus 
> EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change 
> ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written 
> by trunk and branch-2 have the same layout -50 but branch-2 cannot read the 
> -50 fsimage if it is written by trunk.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"

2014-01-16 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874125#comment-13874125
 ] 

Andrew Wang commented on HDFS-5709:
---

Woops, I should have mentioned that one. The config only needs to be validated 
if it needs to be used (i.e. we encounter a .snapshot path), so I felt it was 
better to keep all the validation inside that method. Thanks again ATM.

> Improve upgrade with existing files and directories named ".snapshot"
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2

2014-01-16 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-5794:
---

 Summary: Fix the inconsistency of layout version number of 
ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
 Key: HDFS-5794
 URL: https://issues.apache.org/jira/browse/HDFS-5794
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Jing Zhao
Assignee: Jing Zhao


Currently in trunk, we have the layout version:
{code}
EDITLOG_ADD_BLOCK(-48, ...),
CACHING(-49, ...),
ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...);
{code}

And in branch-2, we have:
{code}
EDITLOG_SUPPORT_RETRYCACHE(-47, ...),
ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...);
{code}

We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus EDITLOG_ADD_BLOCK 
will also take -48 in branch-2. However, we cannot change 
ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written 
by trunk and branch-2 have the same layout -50 but branch-2 cannot read the -50 
fsimage if it is written by trunk.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5794) Fix the inconsistency of layout version number of ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2

2014-01-16 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5794:


Description: 
Currently in trunk, we have the layout version:
{code}
EDITLOG_ADD_BLOCK(-48, ...),
CACHING(-49, ...),
ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...);
{code}

And in branch-2, we have:
{code}
EDITLOG_SUPPORT_RETRYCACHE(-47, ...),
ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...);
{code}

We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus EDITLOG_ADD_BLOCK 
will also take -48 in branch-2. However, we cannot change 
ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written 
by trunk and branch-2 have the same layout -50 but branch-2 cannot read the -50 
fsimage if it is written by trunk.

  was:
Currently in trunk, we have the layout version:
{code}
EDITLOG_ADD_BLOCK(-48, ...),
CACHING(-49, ...),
ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...);
{code}

And in branch-2, we have:
{code}
EDITLOG_SUPPORT_RETRYCACHE(-47, ...),
ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...);
{code}

We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus EDITLOG_ADD_BLOCK 
will also take -48 in branch-2. However, we cannot change 
ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written 
by trunk and branch-2 have the same layout -50 but branch-2 cannot read the -50 
fsimage if it is written by trunk.


> Fix the inconsistency of layout version number of 
> ADD_DATANODE_AND_STORAGE_UUIDS between trunk and branch-2
> ---
>
> Key: HDFS-5794
> URL: https://issues.apache.org/jira/browse/HDFS-5794
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>
> Currently in trunk, we have the layout version:
> {code}
> EDITLOG_ADD_BLOCK(-48, ...),
> CACHING(-49, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-50, ...);
> {code}
> And in branch-2, we have:
> {code}
> EDITLOG_SUPPORT_RETRYCACHE(-47, ...),
> ADD_DATANODE_AND_STORAGE_UUIDS(-49, -47, ...);
> {code}
> We plan to backport HDFS-5704 and HDFS-5777 to branch-2, thus 
> EDITLOG_ADD_BLOCK will also take -48 in branch-2. However, we cannot change 
> ADD_DATANODE_AND_STORAGE_UUIDS to -50 in branch-2. Otherwise fsimages written 
> by trunk and branch-2 have the same layout -50 but branch-2 cannot read the 
> -50 fsimage if it is written by trunk.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5791) TestHttpsFileSystem should use a random port to avoid binding error during testing

2014-01-16 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5791:
-

Affects Version/s: 3.0.0

> TestHttpsFileSystem should use a random port to avoid binding error during 
> testing
> --
>
> Key: HDFS-5791
> URL: https://issues.apache.org/jira/browse/HDFS-5791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>
> {noformat}
> org.apache.hadoop.hdfs.web.TestHttpsFileSystem.org.apache.hadoop.hdfs.web.TestHttpsFileSystem
> Failing for the past 1 build (Since Failed#5900 )
> Took 2.7 sec.
> Error Message
> Port in use: localhost:50475
> Stacktrace
> java.net.BindException: Port in use: localhost:50475
>   at java.net.PlainSocketImpl.socketBind(Native Method)
>   at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:383)
>   at java.net.ServerSocket.bind(ServerSocket.java:328)
>   at java.net.ServerSocket.<init>(ServerSocket.java:194)
>   at javax.net.ssl.SSLServerSocket.<init>(SSLServerSocket.java:106)
>   at 
> com.sun.net.ssl.internal.ssl.SSLServerSocketImpl.<init>(SSLServerSocketImpl.java:108)
>   at 
> com.sun.net.ssl.internal.ssl.SSLServerSocketFactoryImpl.createServerSocket(SSLServerSocketFactoryImpl.java:72)
>   at 
> org.mortbay.jetty.security.SslSocketConnector.newServerSocket(SslSocketConnector.java:478)
>   at org.mortbay.jetty.bio.SocketConnector.open(SocketConnector.java:73)
>   at org.apache.hadoop.http.HttpServer.openListeners(HttpServer.java:973)
>   at org.apache.hadoop.http.HttpServer.start(HttpServer.java:914)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:413)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:770)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:316)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1847)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1747)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1217)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:683)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:351)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:332)
>   at 
> org.apache.hadoop.hdfs.web.TestHttpsFileSystem.setUp(TestHttpsFileSystem.java:64)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.

2014-01-16 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874113#comment-13874113
 ] 

Haohui Mai commented on HDFS-5758:
--

bq. This sounds like a scenario where one person is responsible for chmod and 
another person is responsible for setfacl, and you want to be able to tell who 
did which part of it. Is that right?

I'm referring to distinguishing the ACLs that are generated and the ones that 
are specified by setfacl / chmod.

bq. There are 4 distinct scenarios that can cause reduction of an extended ACL 
to a minimal ACL:

 I see it as an optimization. Can you keep it in a separate patch?
 
Based on your description, it seems to me that in {{removeAcl}}, the task can 
be done by looking up the group entry and setting the permission back. Note that 
the lookup can be done efficiently since the list is sorted. You might be able 
to get rid of the filter function.

> NameNode: complete implementation of inode modifications for ACLs.
> --
>
> Key: HDFS-5758
> URL: https://issues.apache.org/jira/browse/HDFS-5758
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch
>
>
> This patch will complete the remaining logic for the ACL get and set APIs, 
> including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in 
> the inodes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5788) listLocatedStatus response can be very large

2014-01-16 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874107#comment-13874107
 ] 

Nathan Roberts commented on HDFS-5788:
--

A simple solution is:
Restrict the response to dfs.ls.limit (default 1000) files OR dfs.ls.limit block 
locations, whichever comes first (always returning only whole entries, so we 
could send somewhat more than this number of locations).

Yes, it will require more RPCs. However, it would seem to lower the risk of a 
DoS.  
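
For illustration only, a rough sketch of that limiting loop; the class, method 
and parameter names are made up for the example, not taken from an actual patch:

{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.ToIntFunction;

class ListingLimitSketch {
  // Hypothetical sketch: stop the batch once either the entry count or the
  // accumulated block-location count reaches the limit, but never split an
  // entry, so the location total may exceed the limit slightly.
  static <T> List<T> nextBatch(Iterator<T> entries, ToIntFunction<T> locationsOf,
      int lsLimit) {
    List<T> batch = new ArrayList<>();
    int locations = 0;
    while (entries.hasNext() && batch.size() < lsLimit && locations < lsLimit) {
      T entry = entries.next();
      batch.add(entry);                           // whole entries only
      locations += locationsOf.applyAsInt(entry);
    }
    return batch;
  }
}
{code}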

> listLocatedStatus response can be very large
> 
>
> Key: HDFS-5788
> URL: https://issues.apache.org/jira/browse/HDFS-5788
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 0.23.10, 2.2.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>
> Currently we limit the size of listStatus requests to a default of 1000 
> entries. This works fine except in the case of listLocatedStatus where the 
> location information can be quite large. As an example, a directory with 7000 
> entries, 4 blocks each, 3 way replication - a listLocatedStatus response is 
> over 1MB. This can chew up very large amounts of memory in the NN if lots of 
> clients try to do this simultaneously.
> Seems like it would be better if we also considered the amount of location 
> information being returned when deciding how many files to return.
> Patch will follow shortly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.

2014-01-16 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874078#comment-13874078
 ] 

Chris Nauroth commented on HDFS-5758:
-

bq. It's difficult for the client to differentiate the ACLs you've generated 
and the ones that the admins have specified.

This sounds like a scenario where one person is responsible for chmod and 
another person is responsible for setfacl, and you want to be able to tell who 
did which part of it.  Is that right?  In general, it's going to be impossible 
to reliably tell the difference no matter which implementation choice we make.  
This is because setfacl is allowed to change permission bits.  Calling setfacl 
\-\-set user::rwx,group::r\-\-,other::r\-\- on a file that has no extended ACL 
changes just the permission bits.  Likewise, chmod is allowed to change 
extended ACL entries.  It can change one specific extended ACL entry, the mask 
entry, if the file has an extended ACL.

bq. We have to implement optimizations to get rid of minimal ACLs.

Does this refer to the code paths where we check for ACL size 3?  I don't see a 
way to eliminate this completely, but at least that logic is encapsulated 
behind {{AclStorage}}.  There are 4 distinct scenarios that can cause reduction 
of an extended ACL to a minimal ACL:

# {{FileSystem#removeAclEntries}} passes an ACL spec that selectively removes 
all of the extended ACL entries, both access and default.
# {{FileSystem#removeDefaultAcl}} removes all default ACL entries, and there 
are no extended access ACL entries.
# {{FileSystem#removeAcl}} removes all extended ACL entries.
# {{FileSystem#setAcl}} on an inode that has an extended ACL and the caller 
passes an ACL spec with exactly 3 entries corresponding to a minimal ACL, 
overwriting the extended ACL with a minimal ACL.

bq. It seems to me that in my proposal removeAclEntries cannot simply drops the 
ACL features in the inode. Am I missing something?

In addition to removing the {{AclFeature}} and toggling the ACL bit in the 
{{FsPermission}}, we must also get the group permissions out of the old ACL and 
put it back into the {{FsPermission}}.  (Recall that for an inode with an 
extended ACL, the {{FsPermission}} group bits are actually the mask, and this 
implementation choice simplifies integration with a lot of legacy APIs.)  See 
{{TestNameNodeAcl#testRemoveAclEntriesMinimal}} for a unit test that shows an 
example of this.  Initially, we set an extended ACL that includes user:foo:rwx. 
 Because of this, the inferred mask is mask::rwx, and this gets stored into the 
group bits of {{FsPermission}}.  After calling {{removeAclEntries}}, we reduce 
to a minimal ACL, and the group permissions get restored to group::rw-.  If we 
simply dropped the {{AclFeature}}, then we'd still be left with group::rwx in 
the {{FsPermission}}, which would be incorrect.
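
For illustration, a rough sketch of that restore step under simplified types 
(the names here are made up, not the actual {{AclStorage}} code):

{code}
import java.util.List;

class AclRemovalSketch {
  interface AclEntrySketch {
    boolean isUnnamedGroupEntry();  // the plain group:: entry
    int permissionBits();           // 0..7, i.e. rwx
  }

  // Hypothetical sketch: when an extended ACL is reduced to a minimal ACL, the
  // group bits of FsPermission (which held the mask while the extended ACL
  // existed) must be overwritten with the real group entry's permissions.
  static short restoreGroupBits(short permBits, List<AclEntrySketch> oldAcl) {
    for (AclEntrySketch e : oldAcl) {
      if (e.isUnnamedGroupEntry()) {
        return (short) ((permBits & ~0070) | (e.permissionBits() << 3));
      }
    }
    return permBits;  // no unnamed group entry found; leave the bits as they are
  }
}
{code}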

> NameNode: complete implementation of inode modifications for ACLs.
> --
>
> Key: HDFS-5758
> URL: https://issues.apache.org/jira/browse/HDFS-5758
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch
>
>
> This patch will complete the remaining logic for the ACL get and set APIs, 
> including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in 
> the inodes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5788) listLocatedStatus response can be very large

2014-01-16 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874082#comment-13874082
 ] 

Suresh Srinivas commented on HDFS-5788:
---

bq. Then due to lack of flow control in the RPC layer we can fill up the heap 
with these given a large enough average response buffer per call and enough 
clients.
[~jlowe], thanks for the pointer.

We can certainly reduce the number of files returned in each iteration, but it 
would increase the number of requests the NameNode has to process.

> listLocatedStatus response can be very large
> 
>
> Key: HDFS-5788
> URL: https://issues.apache.org/jira/browse/HDFS-5788
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 0.23.10, 2.2.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>
> Currently we limit the size of listStatus requests to a default of 1000 
> entries. This works fine except in the case of listLocatedStatus where the 
> location information can be quite large. As an example, a directory with 7000 
> entries, 4 blocks each, 3 way replication - a listLocatedStatus response is 
> over 1MB. This can chew up very large amounts of memory in the NN if lots of 
> clients try to do this simultaneously.
> Seems like it would be better if we also considered the amount of location 
> information being returned when deciding how many files to return.
> Patch will follow shortly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5743) Use protobuf to serialize snapshot information

2014-01-16 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5743:


Attachment: HDFS-5743.000.patch

Initial patch. Still need to clean the code and fix unit tests.

> Use protobuf to serialize snapshot information
> --
>
> Key: HDFS-5743
> URL: https://issues.apache.org/jira/browse/HDFS-5743
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Jing Zhao
> Attachments: HDFS-5743.000.patch
>
>
> This jira tracks the efforts of using protobuf to serialize snapshot-related 
> information in FSImage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5793) Optimize the serialization of PermissionStatus

2014-01-16 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874061#comment-13874061
 ] 

Haohui Mai commented on HDFS-5793:
--

I've tested the impact of this patch using a 512M fsimage on my machine. This 
patch reduces the loading time from 14s to 12s. It also reduces the size of the 
fsimage by 7%, from 508M to 469M.

> Optimize the serialization of PermissionStatus
> --
>
> Key: HDFS-5793
> URL: https://issues.apache.org/jira/browse/HDFS-5793
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5793.000.patch
>
>
> {{PermissionStatus}} contains the user name, the group name and the 
> permission. It is serialized into two strings and a short.
> Note that the number of unique user / group names is relatively small, thus 
> this format has some performance penalties. The names are stored multiple 
> times, increasing both the storage size and the overhead of reconstructing 
> the names.
> This jira proposes to serialize {{PermissionStatus}} similar to its in-memory 
> layout. The code can record a mapping between user / group names and ids, and 
> pack user / group / permission into a single 64-bit long.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5793) Optimize the serialization of PermissionStatus

2014-01-16 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-5793:


 Summary: Optimize the serialization of PermissionStatus
 Key: HDFS-5793
 URL: https://issues.apache.org/jira/browse/HDFS-5793
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-5793.000.patch

{{PermissionStatus}} contains the user name, the group name and the permission. 
It is serialized into two strings and a short.

Note that the number of unique user / group names is relatively small, thus 
this format has some performance penalties. The names are stored multiple 
times, increasing both the storage size and the overhead of reconstructing the 
names.

This jira proposes to serialize {{PermissionStatus}} similar to its in-memory 
layout. The code can record a mapping between user / group names and ids, and 
pack user / group / permission into a single 64-bit long.
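
As a rough illustration of the proposed packing (the field widths below are 
arbitrary choices for the example, not taken from the patch):

{code}
class PermissionStatusPackingSketch {
  // Hypothetical layout: 24 bits of user id | 24 bits of group id | 16 bits of
  // permission, packed into one long. The ids index a table of unique user and
  // group names that is serialized only once.
  static long pack(int userId, int groupId, short permission) {
    return ((long) (userId & 0xFFFFFF) << 40)
        | ((long) (groupId & 0xFFFFFF) << 16)
        | (permission & 0xFFFFL);
  }

  static int userId(long packed)       { return (int) (packed >>> 40); }
  static int groupId(long packed)      { return (int) ((packed >>> 16) & 0xFFFFFF); }
  static short permission(long packed) { return (short) (packed & 0xFFFF); }
}
{code}

With a scheme along these lines the full name strings only need to be written 
once in a separate table, which is what drives the size reduction.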



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5793) Optimize the serialization of PermissionStatus

2014-01-16 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5793:
-

Attachment: HDFS-5793.000.patch

> Optimize the serialization of PermissionStatus
> --
>
> Key: HDFS-5793
> URL: https://issues.apache.org/jira/browse/HDFS-5793
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5793.000.patch
>
>
> {{PermissionStatus}} contains the user name, the group name and the 
> permission. It is serialized into two strings and a short.
> Note that the number of unique user / group names is relatively small, thus 
> this format has some performance penalties. The names are stored multiple 
> times, increasing both the storage size and the overhead of reconstructing 
> the names.
> This jira proposes to serialize {{PermissionStatus}} similar to its in-memory 
> layout. The code can record a mapping between user / group names and ids, and 
> pack user / group / permission into a single 64-bit long.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"

2014-01-16 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874046#comment-13874046
 ] 

Aaron T. Myers commented on HDFS-5709:
--

The latest patch looks pretty good to me. Docs look great, thanks for adding 
those.

Looks like you also didn't take my suggestion to move the path separator check 
to NN startup, but I'm guessing you had your reasons. +1 once you explain that, 
and pending a clean Jenkins run.

> Improve upgrade with existing files and directories named ".snapshot"
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.

2014-01-16 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874037#comment-13874037
 ] 

Haohui Mai commented on HDFS-5758:
--

bq. Is there a benefit to that change (other than mimicking libacl)? I suppose 
we could change getAclStatus like that, but it does mean that all potential 
clients (not just getfacl) must do both a getFileStatus and a getAclStatus to 
get a complete picture of permissions.

# It's difficult for the client to differentiate the ACLs you've generated and 
the ones that the admins have specified. The client can easily calculate the 
logical ACL by itself if the permissions and extended ACLs are separated.
# We have to implement optimizations to get rid of minimal ACLs.

It seems to me that in my proposal {{removeAclEntries}} cannot simply drop the 
ACL feature in the inode. Am I missing something?

> NameNode: complete implementation of inode modifications for ACLs.
> --
>
> Key: HDFS-5758
> URL: https://issues.apache.org/jira/browse/HDFS-5758
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch
>
>
> This patch will complete the remaining logic for the ACL get and set APIs, 
> including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in 
> the inodes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5434) Write resiliency for replica count 1

2014-01-16 Thread Eric Sirianni (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874026#comment-13874026
 ] 

Eric Sirianni commented on HDFS-5434:
-

Arpit - after some discussion, Buddy and I realized that the approach in 
HDFS-5769 can't be applied to solve the pipeline recovery problem.  In a 
shared-storage environment, the R/O node only has access to the {{hsync()}}-ed 
data (the data actually flushed to the shared storage device).  However, 
pipeline recovery must guarantee that all data that has been {{ACK}}'d can be 
recovered.  Therefore, recruiting a R/O node into a pipeline (as HDFS-5769 
suggests) will not work for this case.

Is our understanding correct?  (If so I will also update HDFS-5769 accordingly)

> Write resiliency for replica count 1
> 
>
> Key: HDFS-5434
> URL: https://issues.apache.org/jira/browse/HDFS-5434
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0
>Reporter: Buddy
>Priority: Minor
> Attachments: BlockPlacementPolicyMinPipelineSize.java, 
> BlockPlacementPolicyMinPipelineSizeWithNodeGroup.java, 
> HDFS-5434-branch-2.patch, HDFS_5434.patch
>
>
> If a file has a replica count of one, the HDFS client is exposed to write 
> failures if the data node fails during a write. With a pipeline of size of 
> one, no recovery is possible if the sole data node dies.
> A simple fix is to force a minimum pipeline size of 2, while leaving the 
> replication count as 1. The implementation for this is fairly non-invasive.
> Although the replica count is one, the block will be written to two data 
> nodes instead of one. If one of the data nodes fails during the write, normal 
> pipeline recovery will ensure that the write succeeds to the surviving data 
> node.
> The existing code in the name node will prune the extra replica when it 
> receives the block received reports for the finalized block from both data 
> nodes. This results in the intended replica count of one for the block.
> This behavior should be controlled by a configuration option such as 
> {{dfs.namenode.minPipelineSize}}.
> This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by 
> ensuring that the pipeline size passed to 
> {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:
> {code}
> max(replication, ${dfs.namenode.minPipelineSize})
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.

2014-01-16 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874024#comment-13874024
 ] 

Chris Nauroth commented on HDFS-5758:
-

Is there a benefit to that change (other than mimicking libacl)?  I suppose we 
could change {{getAclStatus}} like that, but it does mean that all potential 
clients (not just getfacl) must do both a {{getFileStatus}} and a 
{{getAclStatus}} to get a complete picture of permissions.

I wasn't sure how {{removeAcl}} relates to that comment.  {{removeAcl}} is 
changing because it needs to be able to restore the group permissions into 
{{FsPermission}} (overwriting the mask) as described above.  This is also true 
for {{removeAclEntries}}, which can reduce an extended ACL to a minimal ACL if 
the ACL spec removes all extended entries.  It was easiest to handle this 
consistently through {{AclTransformation}}.

> NameNode: complete implementation of inode modifications for ACLs.
> --
>
> Key: HDFS-5758
> URL: https://issues.apache.org/jira/browse/HDFS-5758
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch
>
>
> This patch will complete the remaining logic for the ACL get and set APIs, 
> including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in 
> the inodes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"

2014-01-16 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5709:
--

Attachment: hdfs-5709-2.patch

Thanks for the review ATM, here's a new patch. I took all your comments as 
advised except for the static FSN method; I feel like it's better to pass 
around FSN than to make a new Configuration in a static context.

> Improve upgrade with existing files and directories named ".snapshot"
> -
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch
>
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer 
> to a directory named ".snapshot". We should at least print a better error 
> message (which I believe was the original intention in HDFS-4666), and [~atm] 
> proposed automatically renaming these files and directories.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5434) Write resiliency for replica count 1

2014-01-16 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873998#comment-13873998
 ] 

Arpit Agarwal commented on HDFS-5434:
-

Buddy - I think this will be handled by the approach [~sirianni] and I 
discussed on HDFS-5318 for the 'shared storage' scenario.

> Write resiliency for replica count 1
> 
>
> Key: HDFS-5434
> URL: https://issues.apache.org/jira/browse/HDFS-5434
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0
>Reporter: Buddy
>Priority: Minor
> Attachments: BlockPlacementPolicyMinPipelineSize.java, 
> BlockPlacementPolicyMinPipelineSizeWithNodeGroup.java, 
> HDFS-5434-branch-2.patch, HDFS_5434.patch
>
>
> If a file has a replica count of one, the HDFS client is exposed to write 
> failures if the data node fails during a write. With a pipeline of size of 
> one, no recovery is possible if the sole data node dies.
> A simple fix is to force a minimum pipeline size of 2, while leaving the 
> replication count as 1. The implementation for this is fairly non-invasive.
> Although the replica count is one, the block will be written to two data 
> nodes instead of one. If one of the data nodes fails during the write, normal 
> pipeline recovery will ensure that the write succeeds to the surviving data 
> node.
> The existing code in the name node will prune the extra replica when it 
> receives the block received reports for the finalized block from both data 
> nodes. This results in the intended replica count of one for the block.
> This behavior should be controlled by a configuration option such as 
> {{dfs.namenode.minPipelineSize}}.
> This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by 
> ensuring that the pipeline size passed to 
> {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:
> {code}
> max(replication, ${dfs.namenode.minPipelineSize})
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5730) Inconsistent Audit logging for HDFS APIs

2014-01-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873994#comment-13873994
 ] 

Hadoop QA commented on HDFS-5730:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12623446/HDFS-5730.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.TestSymlinkHdfsFileSystem
  org.apache.hadoop.fs.TestSymlinkHdfsFileContext
  org.apache.hadoop.hdfs.server.namenode.TestFsck
  org.apache.hadoop.fs.TestResolveHdfsSymlink
  org.apache.hadoop.fs.TestSymlinkHdfsDisable
  org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5901//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5901//console

This message is automatically generated.

> Inconsistent Audit logging for HDFS APIs
> 
>
> Key: HDFS-5730
> URL: https://issues.apache.org/jira/browse/HDFS-5730
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-5730.patch
>
>
> When looking at the audit loggs in HDFS, I am seeing some inconsistencies 
> what was logged with audit and what is added recently.
> For more details please check the comments.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5434) Write resiliency for replica count 1

2014-01-16 Thread Buddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Buddy updated HDFS-5434:


Attachment: HDFS-5434-branch-2.patch

It looks like the consensus for 2.4 is to implement this with block placement 
policies, and these policies will not be part of the standard Apache release.

Unfortunately the default block placement policies are not currently extensible 
outside of the package. I've attached a patch that makes the constructors 
protected.




> Write resiliency for replica count 1
> 
>
> Key: HDFS-5434
> URL: https://issues.apache.org/jira/browse/HDFS-5434
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0
>Reporter: Buddy
>Priority: Minor
> Attachments: BlockPlacementPolicyMinPipelineSize.java, 
> BlockPlacementPolicyMinPipelineSizeWithNodeGroup.java, 
> HDFS-5434-branch-2.patch, HDFS_5434.patch
>
>
> If a file has a replica count of one, the HDFS client is exposed to write 
> failures if the data node fails during a write. With a pipeline of size of 
> one, no recovery is possible if the sole data node dies.
> A simple fix is to force a minimum pipeline size of 2, while leaving the 
> replication count as 1. The implementation for this is fairly non-invasive.
> Although the replica count is one, the block will be written to two data 
> nodes instead of one. If one of the data nodes fails during the write, normal 
> pipeline recovery will ensure that the write succeeds to the surviving data 
> node.
> The existing code in the name node will prune the extra replica when it 
> receives the block received reports for the finalized block from both data 
> nodes. This results in the intended replica count of one for the block.
> This behavior should be controlled by a configuration option such as 
> {{dfs.namenode.minPipelineSize}}.
> This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by 
> ensuring that the pipeline size passed to 
> {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:
> {code}
> max(replication, ${dfs.namenode.minPipelineSize})
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5792) DFSOutputStream.close() throws exceptions with unintuitive stacktraces

2014-01-16 Thread Eric Sirianni (JIRA)
Eric Sirianni created HDFS-5792:
---

 Summary: DFSOutputStream.close() throws exceptions with 
unintuitive stacktraces
 Key: HDFS-5792
 URL: https://issues.apache.org/jira/browse/HDFS-5792
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Eric Sirianni
Priority: Minor


Given the following client code:
{code}
class Foo {
  void test() {
FSDataOutputStream out = ...;
out.write(...);
out.close();
  }
}
{code}

A programmer would expect an exception thrown from {{out.close()}} to include 
the stack trace of the calling thread:
{noformat}
...
FSDataOutputStream.close()
Foo.test()
...
{noformat}

Instead, it includes the stack trace from the {{DataStreamer}} thread:
{noformat}
java.io.IOException: All datanodes 127.0.0.1:49331 are bad. Aborting...
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
{noformat}

This makes it difficult to debug the _client_ call stack that was actually 
unwound when the exception was thrown.

A simple fix seems to be modifying {{DFSOutputStream.close()}} to wrap the 
{{lastException}} from the {{DataStreamer}} thread in a new exception, thereby 
capturing both stack traces.

I can work on a patch for this.  Can someone confirm that my approach is 
acceptable?
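
A minimal sketch of the wrapping idea, assuming a {{lastException}} field 
recorded by the streamer thread (simplified, not the actual {{DFSOutputStream}} 
code):

{code}
import java.io.IOException;

class CloseSketch {
  private volatile IOException lastException;  // recorded by the streamer thread

  // Hypothetical sketch: rethrow the streamer's failure wrapped in a new
  // IOException created on the calling thread, so the thrown exception carries
  // the close() caller's stack trace while the streamer's trace is kept as the cause.
  public void close() throws IOException {
    IOException streamerFailure = lastException;
    if (streamerFailure != null) {
      throw new IOException("DataStreamer reported a failure", streamerFailure);
    }
    // ... normal flush-and-close path ...
  }
}
{code}

Callers would then see both the close() call site and the original streamer 
frames when the exception is printed.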



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion

2014-01-16 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873974#comment-13873974
 ] 

Brandon Li commented on HDFS-5754:
--

The unit test failure was not introduced by this patch. I've filed HDFS-5791 to 
track the unit test fix.

> Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion 
> 
>
> Key: HDFS-5754
> URL: https://issues.apache.org/jira/browse/HDFS-5754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Brandon Li
> Attachments: HDFS-5754.001.patch, HDFS-5754.002.patch, 
> HDFS-5754.003.patch
>
>
> Currently, LayoutVersion defines the on-disk data format and supported 
> features of the entire cluster including NN and DNs.  LayoutVersion is 
> persisted in both NN and DNs.  When a NN/DN starts up, it checks its 
> supported LayoutVersion against the on-disk LayoutVersion.  Also, a DN with a 
> different LayoutVersion than NN cannot register with the NN.
> We propose to split LayoutVersion into two independent values that are local 
> to the nodes:
> - NamenodeLayoutVersion - defines the on-disk data format in NN, including 
> the format of FSImage, editlog and the directory structure.
> - DatanodeLayoutVersion - defines the on-disk data format in DN, including 
> the format of block data file, metadata file, block pool layout, and the 
> directory structure.  
> The LayoutVersion check will be removed in DN registration.  If 
> NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling 
> upgrade, then only rollback is supported and downgrade is not.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5791) TestHttpsFileSystem should use a random port to avoid binding error during testing

2014-01-16 Thread Brandon Li (JIRA)
Brandon Li created HDFS-5791:


 Summary: TestHttpsFileSystem should use a random port to avoid 
binding error during testing
 Key: HDFS-5791
 URL: https://issues.apache.org/jira/browse/HDFS-5791
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Brandon Li


{noformat}
org.apache.hadoop.hdfs.web.TestHttpsFileSystem.org.apache.hadoop.hdfs.web.TestHttpsFileSystem
Failing for the past 1 build (Since Failed#5900 )
Took 2.7 sec.
Error Message

Port in use: localhost:50475

Stacktrace

java.net.BindException: Port in use: localhost:50475
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:383)
at java.net.ServerSocket.bind(ServerSocket.java:328)
at java.net.ServerSocket.<init>(ServerSocket.java:194)
at javax.net.ssl.SSLServerSocket.<init>(SSLServerSocket.java:106)
at 
com.sun.net.ssl.internal.ssl.SSLServerSocketImpl.<init>(SSLServerSocketImpl.java:108)
at 
com.sun.net.ssl.internal.ssl.SSLServerSocketFactoryImpl.createServerSocket(SSLServerSocketFactoryImpl.java:72)
at 
org.mortbay.jetty.security.SslSocketConnector.newServerSocket(SslSocketConnector.java:478)
at org.mortbay.jetty.bio.SocketConnector.open(SocketConnector.java:73)
at org.apache.hadoop.http.HttpServer.openListeners(HttpServer.java:973)
at org.apache.hadoop.http.HttpServer.start(HttpServer.java:914)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:413)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:770)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:316)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1847)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1747)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1217)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:683)
at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:351)
at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:332)
at 
org.apache.hadoop.hdfs.web.TestHttpsFileSystem.setUp(TestHttpsFileSystem.java:64)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.

2014-01-16 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873945#comment-13873945
 ] 

Haohui Mai commented on HDFS-5758:
--

I prefer not to bake the concept of minimal / logical ACLs into the APIs. 
Instead, {{getAclStatus}} / {{removeAcl}} should only handle extended ACLs. Only 
the access-control code needs to take care of logical ACLs.

The clients, notably {{getfacl}}, can generate the logical ACL by reading the 
permissions and the extended ACL. That's how libacl works on Linux.

> NameNode: complete implementation of inode modifications for ACLs.
> --
>
> Key: HDFS-5758
> URL: https://issues.apache.org/jira/browse/HDFS-5758
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch
>
>
> This patch will complete the remaining logic for the ACL get and set APIs, 
> including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in 
> the inodes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion

2014-01-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873943#comment-13873943
 ] 

Hadoop QA commented on HDFS-5754:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12623435/HDFS-5754.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestHttpsFileSystem

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5900//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5900//console

This message is automatically generated.

> Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion 
> 
>
> Key: HDFS-5754
> URL: https://issues.apache.org/jira/browse/HDFS-5754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Brandon Li
> Attachments: HDFS-5754.001.patch, HDFS-5754.002.patch, 
> HDFS-5754.003.patch
>
>
> Currently, LayoutVersion defines the on-disk data format and supported 
> features of the entire cluster including NN and DNs.  LayoutVersion is 
> persisted in both NN and DNs.  When a NN/DN starts up, it checks its 
> supported LayoutVersion against the on-disk LayoutVersion.  Also, a DN with a 
> different LayoutVersion than NN cannot register with the NN.
> We propose to split LayoutVersion into two independent values that are local 
> to the nodes:
> - NamenodeLayoutVersion - defines the on-disk data format in NN, including 
> the format of FSImage, editlog and the directory structure.
> - DatanodeLayoutVersion - defines the on-disk data format in DN, including 
> the format of block data file, metadata file, block pool layout, and the 
> directory structure.  
> The LayoutVersion check will be removed in DN registration.  If 
> NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling 
> upgrade, then only rollback is supported and downgrade is not.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section

2014-01-16 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5784:
---

Attachment: HDFS-5784.003.patch

rebase

> reserve space in edit log header and fsimage header for feature flag section
> 
>
> Key: HDFS-5784
> URL: https://issues.apache.org/jira/browse/HDFS-5784
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch, 
> HDFS-5784.003.patch
>
>
> We should reserve space in the edit log header and fsimage header so that we 
> can add layout feature flags later in a compatible manner.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery

2014-01-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873921#comment-13873921
 ] 

Todd Lipcon commented on HDFS-5790:
---

Looking at the code, it's not obvious why we need to do:

{code}
  src = leaseManager.findPath(pendingFile);
{code}

as opposed to something like:

{code}
  src = pendingFile.getFullPathName();
{code}

since in theory the inode path and the lease path should always be kept in 
sync. The same is true of the corresponding call in commitOrCompleteLastBlock().

> LeaseManager.findPath is very slow when many leases need recovery
> -
>
> Key: HDFS-5790
> URL: https://issues.apache.org/jira/browse/HDFS-5790
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, performance
>Affects Versions: 2.4.0
>Reporter: Todd Lipcon
>
> We recently saw an issue where the NN restarted while tens of thousands of 
> files were open. The NN then ended up spending multiple seconds for each 
> commitBlockSynchronization() call, spending most of its time inside 
> LeaseManager.findPath(). findPath currently works by looping over all files 
> held for a given writer, and traversing the filesystem for each one. This 
> takes way too long when tens of thousands of files are open by a single 
> writer.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery

2014-01-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873917#comment-13873917
 ] 

Todd Lipcon commented on HDFS-5790:
---

The handler threads were spending most of their time in this stack trace:

{code}
"IPC Server handler 1 on 8055" daemon prio=10 tid=0x2ab5c87bc800 nid=0x71dc 
runnable [0x56e93000]
   java.lang.Thread.State: RUNNABLE
at java.util.Collections.indexedBinarySearch(Collections.java:215)
at java.util.Collections.binarySearch(Collections.java:201)
at 
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.getChildINode(INodeDirectory.java:107)
at 
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.getExistingPathINodes(INodeDirectory.java:211)
at 
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.getNode(INodeDirectory.java:121)
at 
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.getNode(INodeDirectory.java:130)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINode(FSDirectory.java:1247)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFileINode(FSDirectory.java:1234)
at 
org.apache.hadoop.hdfs.server.namenode.LeaseManager$Lease.findPath(LeaseManager.java:256)
at 
org.apache.hadoop.hdfs.server.namenode.LeaseManager$Lease.access$300(LeaseManager.java:225)
at 
org.apache.hadoop.hdfs.server.namenode.LeaseManager.findPath(LeaseManager.java:186)
- locked <0x2aaac5a38b48> (a 
org.apache.hadoop.hdfs.server.namenode.LeaseManager)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:3229)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:560)
{code}

(line numbers may be slightly off, since this is from an older release, but 
code appears to still be structured approximately the same in trunk)

> LeaseManager.findPath is very slow when many leases need recovery
> -
>
> Key: HDFS-5790
> URL: https://issues.apache.org/jira/browse/HDFS-5790
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, performance
>Affects Versions: 2.4.0
>Reporter: Todd Lipcon
>
> We recently saw an issue where the NN restarted while tens of thousands of 
> files were open. The NN then ended up spending multiple seconds for each 
> commitBlockSynchronization() call, spending most of its time inside 
> LeaseManager.findPath(). findPath currently works by looping over all files 
> held for a given writer, and traversing the filesystem for each one. This 
> takes way too long when tens of thousands of files are open by a single 
> writer.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery

2014-01-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-5790:
-

 Summary: LeaseManager.findPath is very slow when many leases need 
recovery
 Key: HDFS-5790
 URL: https://issues.apache.org/jira/browse/HDFS-5790
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, performance
Affects Versions: 2.4.0
Reporter: Todd Lipcon


We recently saw an issue where the NN restarted while tens of thousands of 
files were open. The NN then ended up spending multiple seconds for each 
commitBlockSynchronization() call, spending most of its time inside 
LeaseManager.findPath(). findPath currently works by looping over all files 
held for a given writer, and traversing the filesystem for each one. This takes 
way too long when tens of thousands of files are open by a single writer.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.

2014-01-16 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873865#comment-13873865
 ] 

Chris Nauroth commented on HDFS-5758:
-

bq. It seems to me that you can enforce the permission by the following:

Yes, this is basically what the HDFS-5612 enforcement patch does, except we 
can't run the existing logic for checking {{FsPermission}} at all if an inode 
has an ACL, so it would be more like:

{code}
if (hasAcl) {
  decide whether acl allows the access
} else {
  checkFsPermission();
}
{code}

Always checking {{FsPermission}} before the ACL could erroneously deny access 
to someone who had been granted access through a named user or named group ACL 
entry.

bq. I'm not sure why AclStorage#updateINodeAcl and AclStorage#readINodeAcl need 
to construct entries from the old permissions.

There are a few different reasons for this:

# Every inode in the system really has a logical ACL.  Even if the ACL bit is 
off and there is no {{AclFeature}}, the inode still has a minimal ACL.  Calling 
{{getAclStatus}} on an inode with no ACL bit/no {{AclFeature}} still needs to 
return the minimal ACL, so we construct 3 ACL entries based on the permission 
bits (see the sketch after this list).
# For an inode that has an ACL, the expected behavior of chmod is that it 
changes the mask entry instead of the group permission bits.  Likewise, file 
listings should show the mask permissions in place of the group permissions.  
By choosing to store the mask permission inside the group permission bits, we 
minimize the impact on existing code like {{setPermission}} and {{listStatus}}. 
 Those APIs continue to read/write bits in {{FsPermission}} without needing 
awareness that they are logically reading/writing the mask.  Only the ACL APIs 
need this awareness.
# Updates to the owner and other ACL entries are also supposed to update the 
corresponding owner and other permission bits.  For symmetry, if chmod changes 
those permissions, then the change is also supposed to be visible in the 
corresponding ACL entries.  Similar to the above, by choosing to reuse the 
corresponding {{FsPermission}} bits for this purpose, then a lot of existing 
code just works with no changes.  We also prevent the possibility of bugs in 
trying to keep 2 data sources in sync (one copy in {{FsPermission}} and one 
copy in an {{AclEntry}} instance).
# Finally, I expect a minor storage optimization as a result of this strategy.  
We're keeping 3 of the "logical" ACL entries in {{FsPermission}}, so that's 3 
fewer {{AclEntry}} instances to store inside each {{AclFeature}}.  
Additionally, this will increase the likelihood of de-duplication ("Global ACL 
Set" as described in the design doc) once we implement that.  If 2 ACLs differ 
only in their owner, mask or other entries, then their underlying 
{{AclFeature}} instances will still have the same contents, and we can 
de-duplicate them.
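
To make the first point concrete, here is a hypothetical sketch of deriving the 
minimal ACL from the permission bits (simplified string entries rather than the 
real {{AclEntry}} type):

{code}
import java.util.Arrays;
import java.util.List;

class MinimalAclSketch {
  // Hypothetical sketch: an inode with no AclFeature still has a logical,
  // minimal ACL of exactly three entries derived from its permission bits.
  static List<String> minimalAcl(short permBits) {
    return Arrays.asList(
        "user::"  + rwx((permBits >>> 6) & 7),   // owner bits
        "group::" + rwx((permBits >>> 3) & 7),   // group bits
        "other::" + rwx(permBits & 7));          // other bits
  }

  private static String rwx(int bits) {
    return ((bits & 4) != 0 ? "r" : "-")
         + ((bits & 2) != 0 ? "w" : "-")
         + ((bits & 1) != 0 ? "x" : "-");
  }
}
{code}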

> NameNode: complete implementation of inode modifications for ACLs.
> --
>
> Key: HDFS-5758
> URL: https://issues.apache.org/jira/browse/HDFS-5758
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch
>
>
> This patch will complete the remaining logic for the ACL get and set APIs, 
> including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in 
> the inodes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas

2014-01-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873859#comment-13873859
 ] 

Hadoop QA commented on HDFS-5318:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12623422/HDFS-5318b-branch-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5899//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5899//console

This message is automatically generated.

> Support read-only and read-write paths to shared replicas
> -
>
> Key: HDFS-5318
> URL: https://issues.apache.org/jira/browse/HDFS-5318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Eric Sirianni
> Attachments: HDFS-5318.patch, HDFS-5318a-branch-2.patch, 
> HDFS-5318b-branch-2.patch, hdfs-5318.pdf
>
>
> There are several use cases for using shared-storage for datanode block 
> storage in an HDFS environment (storing cold blocks on a NAS device, Amazon 
> S3, etc.).
> With shared-storage, there is a distinction between:
> # a distinct physical copy of a block
> # an access-path to that block via a datanode.  
> A single 'replication count' metric cannot accurately capture both aspects.  
> However, for most of the current uses of 'replication count' in the Namenode, 
> the "number of physical copies" aspect seems to be the appropriate semantic.
> I propose altering the replication counting algorithm in the Namenode to 
> accurately infer distinct physical copies in a shared storage environment.  
> With HDFS-5115, a {{StorageID}} is a UUID.  I propose associating some minor 
> additional semantics to the {{StorageID}} - namely that multiple datanodes 
> attaching to the same physical shared storage pool should report the same 
> {{StorageID}} for that pool.  A minor modification would be required in the 
> DataNode to enable the generation of {{StorageID}} s to be pluggable behind 
> the {{FsDatasetSpi}} interface.  
> With those semantics in place, the number of physical copies of a block in a 
> shared storage environment can be calculated as the number of _distinct_ 
> {{StorageID}} s associated with that block.
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} 
> pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
> * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* 
> physical replicas (i.e. the traditional HDFS case with local disks)
> ** → Block B has {{ReplicationCount == 2}}
> * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* 
> physical replica (e.g. HDFS datanodes mounting the same NAS share)
> ** → Block B has {{ReplicationCount == 1}}
> For example, if block B has the following location tuples:
> * {{DN_1, STORAGE_A}}
> * {{DN_2, STORAGE_A}}
> * {{DN_3, STORAGE_B}}
> * {{DN_4, STORAGE_B}},
> the effect of this proposed change would be to calculate the replication 
> factor in the namenode as *2* instead of *4*.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5788) listLocatedStatus response can be very large

2014-01-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873839#comment-13873839
 ] 

Jason Lowe commented on HDFS-5788:
--

They are usually short-lived, but a bit longer-lived when we can't push them out 
over the network in a timely manner.  Then, due to the lack of flow control in the 
RPC layer, we can fill up the heap with these buffers given a large enough average 
response buffer per call and enough clients.  See HADOOP-8942.

This change mitigates the issue for listLocatedStatus since a much smaller 
response payload means it takes a lot more simultaneous clients to consume an 
equal amount of heap space.

> listLocatedStatus response can be very large
> 
>
> Key: HDFS-5788
> URL: https://issues.apache.org/jira/browse/HDFS-5788
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 0.23.10, 2.2.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>
> Currently we limit the size of listStatus requests to a default of 1000 
> entries. This works fine except in the case of listLocatedStatus where the 
> location information can be quite large. As an example, a directory with 7000 
> entries, 4 blocks each, 3 way replication - a listLocatedStatus response is 
> over 1MB. This can chew up very large amounts of memory in the NN if lots of 
> clients try to do this simultaneously.
> Seems like it would be better if we also considered the amount of location 
> information being returned when deciding how many files to return.
> Patch will follow shortly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section

2014-01-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873832#comment-13873832
 ] 

Hadoop QA commented on HDFS-5784:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12623450/HDFS-5784.002.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5902//console

This message is automatically generated.

> reserve space in edit log header and fsimage header for feature flag section
> 
>
> Key: HDFS-5784
> URL: https://issues.apache.org/jira/browse/HDFS-5784
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch
>
>
> We should reserve space in the edit log header and fsimage header so that we 
> can add layout feature flags later in a compatible manner.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5784) reserve space in edit log header and fsimage header for feature flag section

2014-01-16 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5784:
---

Attachment: HDFS-5784.002.patch

> reserve space in edit log header and fsimage header for feature flag section
> 
>
> Key: HDFS-5784
> URL: https://issues.apache.org/jira/browse/HDFS-5784
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5784.001.patch, HDFS-5784.002.patch
>
>
> We should reserve space in the edit log header and fsimage header so that we 
> can add layout feature flags later in a compatible manner.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations

2014-01-16 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873817#comment-13873817
 ] 

Daryn Sharp commented on HDFS-4564:
---

This bug will trigger the NPE described in HADOOP-9363.  The authenticator 
handling is flawed because the JDK will transparently perform SPNEGO auth, but 
webhdfs will unnecessarily interpret a login failure as a fallback to pseudo auth.  
This triggers another JDK bug, resulting in a replay attack.

For some reason, if there are 16 persistent connections open (webhdfs tries to 
disconnect, but the HttpURLConnection javadocs indicate it may not honor the 
close, and per tcpdump it certainly does not), the com.sun code will send the 
OPTIONS request on a cached connection and also immediately open a new 
connection to send the same OPTIONS request.  The new connection reuses the 
cached connection's service ticket and boom - a replay attack.

> Webhdfs returns incorrect http response codes for denied operations
> ---
>
> Key: HDFS-4564
> URL: https://issues.apache.org/jira/browse/HDFS-4564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>
> Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's 
> denying operations.  Examples including rejecting invalid proxy user attempts 
> and renew/cancel with an invalid user.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations

2014-01-16 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp reassigned HDFS-4564:
-

Assignee: Daryn Sharp

> Webhdfs returns incorrect http response codes for denied operations
> ---
>
> Key: HDFS-4564
> URL: https://issues.apache.org/jira/browse/HDFS-4564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>
> Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's 
> denying operations.  Examples including rejecting invalid proxy user attempts 
> and renew/cancel with an invalid user.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5758) NameNode: complete implementation of inode modifications for ACLs.

2014-01-16 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873815#comment-13873815
 ] 

Haohui Mai commented on HDFS-5758:
--

It seems to me that you can enforce the permission by the following:

{code}
isAccessAllowed():
  checkFsPermission();
  if (hasAcl) {
    // decide whether the ACL allows the access
  }
{code}

I'm not sure why AclStorage#updateINodeAcl and AclStorage#readINodeAcl need to 
construct entries from the old permissions.

> NameNode: complete implementation of inode modifications for ACLs.
> --
>
> Key: HDFS-5758
> URL: https://issues.apache.org/jira/browse/HDFS-5758
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-5758.1.patch, HDFS-5758.2.patch
>
>
> This patch will complete the remaining logic for the ACL get and set APIs, 
> including remaining work in {{FSNamesystem}}, {{FSDirectory}} and storage in 
> the inodes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5789) Some of snapshot, Cache APIs missing checkOperation double check in fsn

2014-01-16 Thread Uma Maheswara Rao G (JIRA)
Uma Maheswara Rao G created HDFS-5789:
-

 Summary: Some of snapshot, Cache APIs missing checkOperation 
double check in fsn
 Key: HDFS-5789
 URL: https://issues.apache.org/jira/browse/HDFS-5789
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G


HDFS-4591 introduced a double check of the HA state while taking the fsn lock:
checkOperation is called before actually taking the lock and again after the lock 
is acquired.

This pattern is missing in some of the snapshot APIs and cache-management-related 
APIs.
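
For reference, the double-check pattern looks roughly like this standalone sketch 
(illustrative only; the real code lives in FSNamesystem and uses its own lock and 
HA-state check):

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Standalone sketch of the HDFS-4591 double-check pattern (illustrative only).
class DoubleCheckSketch {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

  void checkOperation() {
    // In FSNamesystem this verifies the HA state (e.g. throws StandbyException
    // if this NameNode is no longer active).
  }

  void snapshotOrCacheOp() {
    checkOperation();              // first check, before taking the lock
    fsLock.writeLock().lock();
    try {
      checkOperation();            // re-check after acquiring the lock, in case
                                   // the HA state changed while we waited
      // ... perform the snapshot or cache-management operation ...
    } finally {
      fsLock.writeLock().unlock();
    }
  }
}
{code}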



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5730) Inconsistent Audit logging for HDFS APIs

2014-01-16 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-5730:
--

Attachment: HDFS-5730.patch

I have attached an initial version of the patch here.

> Inconsistent Audit logging for HDFS APIs
> 
>
> Key: HDFS-5730
> URL: https://issues.apache.org/jira/browse/HDFS-5730
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-5730.patch
>
>
> When looking at the audit logs in HDFS, I am seeing some inconsistencies 
> between what was logged with audit and what has been added recently.
> For more details please check the comments.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5730) Inconsistent Audit logging for HDFS APIs

2014-01-16 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-5730:
--

Status: Patch Available  (was: Open)

> Inconsistent Audit logging for HDFS APIs
> 
>
> Key: HDFS-5730
> URL: https://issues.apache.org/jira/browse/HDFS-5730
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-5730.patch
>
>
> When looking at the audit logs in HDFS, I am seeing some inconsistencies 
> between what was logged with audit and what has been added recently.
> For more details please check the comments.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5153) Datanode should send block reports for each storage in a separate message

2014-01-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873748#comment-13873748
 ] 

Hadoop QA commented on HDFS-5153:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12623413/HDFS-5153.03b.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5898//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5898//console

This message is automatically generated.

> Datanode should send block reports for each storage in a separate message
> -
>
> Key: HDFS-5153
> URL: https://issues.apache.org/jira/browse/HDFS-5153
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Arpit Agarwal
> Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, 
> HDFS-5153.03b.patch
>
>
> When the number of blocks on the DataNode grows large we start running into a 
> few issues:
> # Block reports take a long time to process on the NameNode. In testing we 
> have seen that a block report with 6 Million blocks takes close to one second 
> to process on the NameNode. The NameSystem write lock is held during this 
> time.
> # We start hitting the default protobuf message limit of 64MB somewhere 
> around 10 Million blocks. While we can increase the message size limit it 
> already takes over 7 seconds to serialize/unserialize a block report of this 
> size.
> HDFS-2832 has introduced the concept of a DataNode as a collection of 
> storages i.e. the NameNode is aware of all the volumes (storage directories) 
> attached to a given DataNode. This makes it easy to split block reports from 
> the DN by sending one report per storage directory to mitigate the above 
> problems.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion

2014-01-16 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5754:
-

Attachment: HDFS-5754.003.patch

> Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion 
> 
>
> Key: HDFS-5754
> URL: https://issues.apache.org/jira/browse/HDFS-5754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Brandon Li
> Attachments: HDFS-5754.001.patch, HDFS-5754.002.patch, 
> HDFS-5754.003.patch
>
>
> Currently, LayoutVersion defines the on-disk data format and supported 
> features of the entire cluster including NN and DNs.  LayoutVersion is 
> persisted in both NN and DNs.  When a NN/DN starts up, it checks its 
> supported LayoutVersion against the on-disk LayoutVersion.  Also, a DN with a 
> different LayoutVersion than NN cannot register with the NN.
> We propose to split LayoutVersion into two independent values that are local 
> to the nodes:
> - NamenodeLayoutVersion - defines the on-disk data format in NN, including 
> the format of FSImage, editlog and the directory structure.
> - DatanodeLayoutVersion - defines the on-disk data format in DN, including 
> the format of block data file, metadata file, block pool layout, and the 
> directory structure.  
> The LayoutVersion check will be removed in DN registration.  If 
> NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling 
> upgrade, then only rollback is supported and downgrade is not.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-4600) HDFS file append failing in multinode cluster

2014-01-16 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik resolved HDFS-4600.
--

Resolution: Invalid

The issue seems to be caused by a specific cluster configuration rather than a 
software problem.  Closing.

> HDFS file append failing in multinode cluster
> -
>
> Key: HDFS-4600
> URL: https://issues.apache.org/jira/browse/HDFS-4600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Roman Shaposhnik
> Attachments: X.java, core-site.xml, hdfs-site.xml
>
>
> NOTE: the following only happens in a fully distributed setup (core-site.xml 
> and hdfs-site.xml are attached)
> Steps to reproduce:
> {noformat}
> $ javac -cp /usr/lib/hadoop/client/\* X.java
> $ echo a > a.txt
> $ hadoop fs -ls /tmp/a.txt
> ls: `/tmp/a.txt': No such file or directory
> $ HADOOP_CLASSPATH=`pwd` hadoop X /tmp/a.txt
> 13/03/13 16:05:14 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> Exception in thread "main" java.io.IOException: Failed to replace a bad 
> datanode on the existing pipeline due to no more good datanodes being 
> available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> 13/03/13 16:05:14 ERROR hdfs.DFSClient: Failed to close file /tmp/a.txt
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {noformat}
> Given that the file actually does get created:
> {noformat}
> $ hadoop fs -ls /tmp/a.txt
> Found 1 items
> -rw-r--r--   3 root hadoop  6 2013-03-13 16:05 /tmp/a.txt
> {noformat}
> this feels like a regression in APPEND's functionality.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4600) HDFS file append failing in multinode cluster

2014-01-16 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873728#comment-13873728
 ] 

Konstantin Boudnik commented on HDFS-4600:
--

Actually you're right. I've re-read the history of the ticket and will close it 
right away. Please disregard my last comment ;)

> HDFS file append failing in multinode cluster
> -
>
> Key: HDFS-4600
> URL: https://issues.apache.org/jira/browse/HDFS-4600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Roman Shaposhnik
> Attachments: X.java, core-site.xml, hdfs-site.xml
>
>
> NOTE: the following only happens in a fully distributed setup (core-site.xml 
> and hdfs-site.xml are attached)
> Steps to reproduce:
> {noformat}
> $ javac -cp /usr/lib/hadoop/client/\* X.java
> $ echo a > a.txt
> $ hadoop fs -ls /tmp/a.txt
> ls: `/tmp/a.txt': No such file or directory
> $ HADOOP_CLASSPATH=`pwd` hadoop X /tmp/a.txt
> 13/03/13 16:05:14 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> Exception in thread "main" java.io.IOException: Failed to replace a bad 
> datanode on the existing pipeline due to no more good datanodes being 
> available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> 13/03/13 16:05:14 ERROR hdfs.DFSClient: Failed to close file /tmp/a.txt
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {noformat}
> Given that the file actually does get created:
> {noformat}
> $ hadoop fs -ls /tmp/a.txt
> Found 1 items
> -rw-r--r--   3 root hadoop  6 2013-03-13 16:05 /tmp/a.txt
> {noformat}
> this feels like a regression in APPEND's functionality.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5788) listLocatedStatus response can be very large

2014-01-16 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873720#comment-13873720
 ] 

Suresh Srinivas commented on HDFS-5788:
---

bq. a listLocatedStatus response is over 1MB
These are short-lived objects and are garbage collected in the young generation. 
Does this cause a lot of issues?

bq. Seems like it would be better if we also considered the amount of location 
information being returned when deciding how many files to return.
Can you please add details about the solution?

> listLocatedStatus response can be very large
> 
>
> Key: HDFS-5788
> URL: https://issues.apache.org/jira/browse/HDFS-5788
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 0.23.10, 2.2.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>
> Currently we limit the size of listStatus requests to a default of 1000 
> entries. This works fine except in the case of listLocatedStatus where the 
> location information can be quite large. As an example, a directory with 7000 
> entries, 4 blocks each, 3 way replication - a listLocatedStatus response is 
> over 1MB. This can chew up very large amounts of memory in the NN if lots of 
> clients try to do this simultaneously.
> Seems like it would be better if we also considered the amount of location 
> information being returned when deciding how many files to return.
> Patch will follow shortly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5608) WebHDFS: implement GETACLSTATUS and SETACL.

2014-01-16 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873722#comment-13873722
 ] 

Chris Nauroth commented on HDFS-5608:
-

Thanks, Sachin.  I'll check out the new patch.

{quote}
Current implementation : mask::rw,other::rwx 
expected - mask:rw,other:rwx
{quote}

There are multiple ways in which our ACL spec syntax is a subset of the syntax 
accepted by Linux setfacl.  I just checked Linux, and it accepts both of the 
above.  Right now, our implementation only accepts one of them: mask::rw-.  I 
wasn't aware of this particular difference, so thank you for pointing it out.

Let's stick with our minimal supported syntax for now, because it's consistent 
with the display of getfacl.  This choice doesn't limit the functionality we 
provide.  It just means that some of the syntax shortcuts available on Linux 
aren't available here.

For the record, here are all of the syntax aspects supported on Linux that we 
don't yet support (that I'm aware of).  We can revisit later if we want to 
fully support all of this, but it's not critical for an initial implementation.
# The scope default can be shortened to d.
# The types user/group/mask/other can be shortened to u/g/m/o respectively.
# Permissions can be specified partially, with anything omitted assumed to be 
off.  For example, mask::r-x can be shortened to mask::rx.
# Permissions can be an octal digit 0-7.
# Whitespace between delimiter characters and non-delimiter characters is 
ignored.

{quote}
Where can we define the common method which be accessed from both 
projects(hdfs,common).
{quote}

Right now, I'm thinking that we can define a static method on the {{AclEntry}} 
class.  HADOOP-10213 is still open for some changes in the CLI parsing, so I'm 
going to discuss it over there with [~vinayrpet] and you.
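
Purely for illustration, parsing the minimal spec syntax could look like the 
sketch below (this is not the {{AclEntry}} helper under discussion; the class 
name and return shape are made up):

{code}
import java.util.ArrayList;
import java.util.List;

// Purely illustrative parser for the minimal spec syntax, e.g.
// "user::rw-,user:bruce:rwx,group::r-x,mask::rw-,other::r--"
// or "default:user:bruce:rwx". Returns [scope, type, name, perms] per entry.
class MinimalAclSpecParser {
  static List<String[]> parse(String spec) {
    List<String[]> entries = new ArrayList<String[]>();
    for (String raw : spec.split(",")) {
      String entry = raw.trim();
      boolean isDefault = entry.startsWith("default:");
      if (isDefault) {
        entry = entry.substring("default:".length());
      }
      String[] parts = entry.split(":", -1);   // expect exactly type:name:perms
      if (parts.length != 3) {
        throw new IllegalArgumentException("Invalid ACL entry: " + raw);
      }
      entries.add(new String[] {
          isDefault ? "default" : "access", parts[0], parts[1], parts[2] });
    }
    return entries;
  }
}
{code}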

> WebHDFS: implement GETACLSTATUS and SETACL.
> ---
>
> Key: HDFS-5608
> URL: https://issues.apache.org/jira/browse/HDFS-5608
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Sachin Jose
> Attachments: HDFS-5608.0.patch, HDFS-5608.1.patch, HDFS-5608.2.patch
>
>
> Implement and test {{GETACLS}} and {{SETACL}} in WebHDFS.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5761) DataNode fails to validate integrity for checksum type NULL when DataNode recovers

2014-01-16 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873719#comment-13873719
 ] 

Kousuke Saruta commented on HDFS-5761:
--

Thanks for your comment, Uma.
At first, I thought the same as you: it would be good to branch the logic depending 
on whether the checksum type is NULL or not.
But on second thought, BlockPoolSlice should not have logic that depends on a 
specific checksum algorithm.
How to verify is the responsibility of each checksum algorithm.
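
For reference, the division that breaks for the NULL checksum type can be sketched 
as follows (an illustration of the failure mode and one possible guard, not a 
proposed patch; the helper name is made up):

{code}
// Illustrative helper (not the actual BlockPoolSlice code): the current logic
// divides by checksumSize, which is 0 for the NULL checksum type, so the
// number of validatable chunks has to be defined separately for that case.
static long validatableChunks(long blockFileLen, long metaFileLen,
    int bytesPerChecksum, int checksumSize, int crcHeaderLen) {
  if (checksumSize == 0) {
    return 0;  // NULL checksum: no checksum bytes on disk to compare against
  }
  return Math.min(
      (blockFileLen + bytesPerChecksum - 1) / bytesPerChecksum,
      (metaFileLen - crcHeaderLen) / checksumSize);
}
{code}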


> DataNode fails to validate integrity for checksum type NULL when DataNode 
> recovers 
> ---
>
> Key: HDFS-5761
> URL: https://issues.apache.org/jira/browse/HDFS-5761
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
> Attachments: HDFS-5761.patch
>
>
> When a DataNode goes down while writing blocks, the blocks are not finalized, 
> and the next time the DataNode recovers, integrity validation will run.
> But if we use NULL for the checksum algorithm (we can set dfs.checksum.type to 
> NULL), the DataNode will fail to validate integrity and cannot come up. 
> The cause is in BlockPoolSlice#validateIntegrity.
> In the method, there is following code.
> {code}
> long numChunks = Math.min(
>   (blockFileLen + bytesPerChecksum - 1)/bytesPerChecksum, 
>   (metaFileLen - crcHeaderLen)/checksumSize);
> {code}
> When we choose the NULL checksum, checksumSize is 0, so an ArithmeticException 
> will be thrown and the DataNode cannot come up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4600) HDFS file append failing in multinode cluster

2014-01-16 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873712#comment-13873712
 ] 

Suresh Srinivas commented on HDFS-4600:
---

I am actually surprised. Many people have expressed that this is not a bug. You 
also have expressed the same opinion in the comments above. What changed?

> HDFS file append failing in multinode cluster
> -
>
> Key: HDFS-4600
> URL: https://issues.apache.org/jira/browse/HDFS-4600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Roman Shaposhnik
> Attachments: X.java, core-site.xml, hdfs-site.xml
>
>
> NOTE: the following only happens in a fully distributed setup (core-site.xml 
> and hdfs-site.xml are attached)
> Steps to reproduce:
> {noformat}
> $ javac -cp /usr/lib/hadoop/client/\* X.java
> $ echo a > a.txt
> $ hadoop fs -ls /tmp/a.txt
> ls: `/tmp/a.txt': No such file or directory
> $ HADOOP_CLASSPATH=`pwd` hadoop X /tmp/a.txt
> 13/03/13 16:05:14 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> Exception in thread "main" java.io.IOException: Failed to replace a bad 
> datanode on the existing pipeline due to no more good datanodes being 
> available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> 13/03/13 16:05:14 ERROR hdfs.DFSClient: Failed to close file /tmp/a.txt
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {noformat}
> Given that the file actually does get created:
> {noformat}
> $ hadoop fs -ls /tmp/a.txt
> Found 1 items
> -rw-r--r--   3 root hadoop  6 2013-03-13 16:05 /tmp/a.txt
> {noformat}
> this feels like a regression in APPEND's functionality.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4600) HDFS file append failing in multinode cluster

2014-01-16 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873695#comment-13873695
 ] 

Konstantin Boudnik commented on HDFS-4600:
--

Suresh, I didn't see it fixed, so yes - this still seems to be an issue.

> HDFS file append failing in multinode cluster
> -
>
> Key: HDFS-4600
> URL: https://issues.apache.org/jira/browse/HDFS-4600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Roman Shaposhnik
> Attachments: X.java, core-site.xml, hdfs-site.xml
>
>
> NOTE: the following only happens in a fully distributed setup (core-site.xml 
> and hdfs-site.xml are attached)
> Steps to reproduce:
> {noformat}
> $ javac -cp /usr/lib/hadoop/client/\* X.java
> $ echo a > a.txt
> $ hadoop fs -ls /tmp/a.txt
> ls: `/tmp/a.txt': No such file or directory
> $ HADOOP_CLASSPATH=`pwd` hadoop X /tmp/a.txt
> 13/03/13 16:05:14 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> Exception in thread "main" java.io.IOException: Failed to replace a bad 
> datanode on the existing pipeline due to no more good datanodes being 
> available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> 13/03/13 16:05:14 ERROR hdfs.DFSClient: Failed to close file /tmp/a.txt
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {noformat}
> Given that the file actually does get created:
> {noformat}
> $ hadoop fs -ls /tmp/a.txt
> Found 1 items
> -rw-r--r--   3 root hadoop  6 2013-03-13 16:05 /tmp/a.txt
> {noformat}
> this feels like a regression in APPEND's functionality.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4600) HDFS file append failing in multinode cluster

2014-01-16 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873691#comment-13873691
 ] 

Suresh Srinivas commented on HDFS-4600:
---

[~cos], looks like you changed the priority. Is this still an issue? If not, I 
plan on closing this as not a problem in a day or so.

> HDFS file append failing in multinode cluster
> -
>
> Key: HDFS-4600
> URL: https://issues.apache.org/jira/browse/HDFS-4600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Roman Shaposhnik
> Attachments: X.java, core-site.xml, hdfs-site.xml
>
>
> NOTE: the following only happens in a fully distributed setup (core-site.xml 
> and hdfs-site.xml are attached)
> Steps to reproduce:
> {noformat}
> $ javac -cp /usr/lib/hadoop/client/\* X.java
> $ echo a > a.txt
> $ hadoop fs -ls /tmp/a.txt
> ls: `/tmp/a.txt': No such file or directory
> $ HADOOP_CLASSPATH=`pwd` hadoop X /tmp/a.txt
> 13/03/13 16:05:14 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> Exception in thread "main" java.io.IOException: Failed to replace a bad 
> datanode on the existing pipeline due to no more good datanodes being 
> available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> 13/03/13 16:05:14 ERROR hdfs.DFSClient: Failed to close file /tmp/a.txt
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {noformat}
> Given that the file actually does get created:
> {noformat}
> $ hadoop fs -ls /tmp/a.txt
> Found 1 items
> -rw-r--r--   3 root hadoop  6 2013-03-13 16:05 /tmp/a.txt
> {noformat}
> this feels like a regression in APPEND's functionality.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4600) HDFS file append failing in multinode cluster

2014-01-16 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-4600:
-

Priority: Major  (was: Minor)

> HDFS file append failing in multinode cluster
> -
>
> Key: HDFS-4600
> URL: https://issues.apache.org/jira/browse/HDFS-4600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Roman Shaposhnik
> Attachments: X.java, core-site.xml, hdfs-site.xml
>
>
> NOTE: the following only happens in a fully distributed setup (core-site.xml 
> and hdfs-site.xml are attached)
> Steps to reproduce:
> {noformat}
> $ javac -cp /usr/lib/hadoop/client/\* X.java
> $ echo a > a.txt
> $ hadoop fs -ls /tmp/a.txt
> ls: `/tmp/a.txt': No such file or directory
> $ HADOOP_CLASSPATH=`pwd` hadoop X /tmp/a.txt
> 13/03/13 16:05:14 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> Exception in thread "main" java.io.IOException: Failed to replace a bad 
> datanode on the existing pipeline due to no more good datanodes being 
> available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> 13/03/13 16:05:14 ERROR hdfs.DFSClient: Failed to close file /tmp/a.txt
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {noformat}
> Given that the file actually does get created:
> {noformat}
> $ hadoop fs -ls /tmp/a.txt
> Found 1 items
> -rw-r--r--   3 root hadoop  6 2013-03-13 16:05 /tmp/a.txt
> {noformat}
> this feels like a regression in APPEND's functionality.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5318) Support read-only and read-write paths to shared replicas

2014-01-16 Thread Eric Sirianni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Sirianni updated HDFS-5318:


Attachment: HDFS-5318b-branch-2.patch

Updated patch.  
Added an overload of {{BlocksMap.getStorages()}} that filters the returned 
Iterable by storage state.  Used that overload in {{BlockManager}} instead of 
if...continue checks in loops.
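
Roughly the shape of the filtering, as a sketch only (the {{StorageFilters}} and 
{{onlyInState}} names below are made up; the actual change adds the overload on 
{{BlocksMap.getStorages()}} itself):

{code}
import com.google.common.base.Predicate;
import com.google.common.collect.Iterables;
import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo;
import org.apache.hadoop.hdfs.server.protocol.DatanodeStorage;

// Illustrative only: filter a block's storages by state so BlockManager
// callers can iterate directly instead of using if...continue inside loops.
class StorageFilters {
  static Iterable<DatanodeStorageInfo> onlyInState(
      Iterable<DatanodeStorageInfo> storages,
      final DatanodeStorage.State state) {
    return Iterables.filter(storages, new Predicate<DatanodeStorageInfo>() {
      @Override
      public boolean apply(DatanodeStorageInfo storage) {
        return storage.getState() == state;
      }
    });
  }
}
{code}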

> Support read-only and read-write paths to shared replicas
> -
>
> Key: HDFS-5318
> URL: https://issues.apache.org/jira/browse/HDFS-5318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Eric Sirianni
> Attachments: HDFS-5318.patch, HDFS-5318a-branch-2.patch, 
> HDFS-5318b-branch-2.patch, hdfs-5318.pdf
>
>
> There are several use cases for using shared-storage for datanode block 
> storage in an HDFS environment (storing cold blocks on a NAS device, Amazon 
> S3, etc.).
> With shared-storage, there is a distinction between:
> # a distinct physical copy of a block
> # an access-path to that block via a datanode.  
> A single 'replication count' metric cannot accurately capture both aspects.  
> However, for most of the current uses of 'replication count' in the Namenode, 
> the "number of physical copies" aspect seems to be the appropriate semantic.
> I propose altering the replication counting algorithm in the Namenode to 
> accurately infer distinct physical copies in a shared storage environment.  
> With HDFS-5115, a {{StorageID}} is a UUID.  I propose associating some minor 
> additional semantics to the {{StorageID}} - namely that multiple datanodes 
> attaching to the same physical shared storage pool should report the same 
> {{StorageID}} for that pool.  A minor modification would be required in the 
> DataNode to enable the generation of {{StorageID}} s to be pluggable behind 
> the {{FsDatasetSpi}} interface.  
> With those semantics in place, the number of physical copies of a block in a 
> shared storage environment can be calculated as the number of _distinct_ 
> {{StorageID}} s associated with that block.
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} 
> pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
> * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* 
> physical replicas (i.e. the traditional HDFS case with local disks)
> ** → Block B has {{ReplicationCount == 2}}
> * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* 
> physical replica (e.g. HDFS datanodes mounting the same NAS share)
> ** → Block B has {{ReplicationCount == 1}}
> For example, if block B has the following location tuples:
> * {{DN_1, STORAGE_A}}
> * {{DN_2, STORAGE_A}}
> * {{DN_3, STORAGE_B}}
> * {{DN_4, STORAGE_B}},
> the effect of this proposed change would be to calculate the replication 
> factor in the namenode as *2* instead of *4*.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas

2014-01-16 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873566#comment-13873566
 ] 

Arpit Agarwal commented on HDFS-5318:
-

Hi Eric, thanks for the new patch. I will try to review it by the weekend.

> Support read-only and read-write paths to shared replicas
> -
>
> Key: HDFS-5318
> URL: https://issues.apache.org/jira/browse/HDFS-5318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Eric Sirianni
> Attachments: HDFS-5318.patch, HDFS-5318a-branch-2.patch, hdfs-5318.pdf
>
>
> There are several use cases for using shared-storage for datanode block 
> storage in an HDFS environment (storing cold blocks on a NAS device, Amazon 
> S3, etc.).
> With shared-storage, there is a distinction between:
> # a distinct physical copy of a block
> # an access-path to that block via a datanode.  
> A single 'replication count' metric cannot accurately capture both aspects.  
> However, for most of the current uses of 'replication count' in the Namenode, 
> the "number of physical copies" aspect seems to be the appropriate semantic.
> I propose altering the replication counting algorithm in the Namenode to 
> accurately infer distinct physical copies in a shared storage environment.  
> With HDFS-5115, a {{StorageID}} is a UUID.  I propose associating some minor 
> additional semantics to the {{StorageID}} - namely that multiple datanodes 
> attaching to the same physical shared storage pool should report the same 
> {{StorageID}} for that pool.  A minor modification would be required in the 
> DataNode to enable the generation of {{StorageID}} s to be pluggable behind 
> the {{FsDatasetSpi}} interface.  
> With those semantics in place, the number of physical copies of a block in a 
> shared storage environment can be calculated as the number of _distinct_ 
> {{StorageID}} s associated with that block.
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} 
> pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
> * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* 
> physical replicas (i.e. the traditional HDFS case with local disks)
> ** → Block B has {{ReplicationCount == 2}}
> * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* 
> physical replica (e.g. HDFS datanodes mounting the same NAS share)
> ** → Block B has {{ReplicationCount == 1}}
> For example, if block B has the following location tuples:
> * {{DN_1, STORAGE_A}}
> * {{DN_2, STORAGE_A}}
> * {{DN_3, STORAGE_B}}
> * {{DN_4, STORAGE_B}},
> the effect of this proposed change would be to calculate the replication 
> factor in the namenode as *2* instead of *4*.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5153) Datanode should send block reports for each storage in a separate message

2014-01-16 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5153:


Attachment: HDFS-5153.03b.patch

> Datanode should send block reports for each storage in a separate message
> -
>
> Key: HDFS-5153
> URL: https://issues.apache.org/jira/browse/HDFS-5153
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Arpit Agarwal
> Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, 
> HDFS-5153.03b.patch
>
>
> When the number of blocks on the DataNode grows large we start running into a 
> few issues:
> # Block reports take a long time to process on the NameNode. In testing we 
> have seen that a block report with 6 Million blocks takes close to one second 
> to process on the NameNode. The NameSystem write lock is held during this 
> time.
> # We start hitting the default protobuf message limit of 64MB somewhere 
> around 10 Million blocks. While we can increase the message size limit it 
> already takes over 7 seconds to serialize/unserialize a block report of this 
> size.
> HDFS-2832 has introduced the concept of a DataNode as a collection of 
> storages i.e. the NameNode is aware of all the volumes (storage directories) 
> attached to a given DataNode. This makes it easy to split block reports from 
> the DN by sending one report per storage directory to mitigate the above 
> problems.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas

2014-01-16 Thread Eric Sirianni (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873518#comment-13873518
 ] 

Eric Sirianni commented on HDFS-5318:
-

Hi Arpit - would appreciate any feedback you have on the patch when you get a 
chance.
Also, as far as I can tell, the test failures are not related to the patch 
({{testBlocksAddedWhileStandbyIsDown()}} passes in my environment and 
{{TestBalancerWithNodeGroup}} seems flaky -- see HDFS-4376).

> Support read-only and read-write paths to shared replicas
> -
>
> Key: HDFS-5318
> URL: https://issues.apache.org/jira/browse/HDFS-5318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Eric Sirianni
> Attachments: HDFS-5318.patch, HDFS-5318a-branch-2.patch, hdfs-5318.pdf
>
>
> There are several use cases for using shared-storage for datanode block 
> storage in an HDFS environment (storing cold blocks on a NAS device, Amazon 
> S3, etc.).
> With shared-storage, there is a distinction between:
> # a distinct physical copy of a block
> # an access-path to that block via a datanode.  
> A single 'replication count' metric cannot accurately capture both aspects.  
> However, for most of the current uses of 'replication count' in the Namenode, 
> the "number of physical copies" aspect seems to be the appropriate semantic.
> I propose altering the replication counting algorithm in the Namenode to 
> accurately infer distinct physical copies in a shared storage environment.  
> With HDFS-5115, a {{StorageID}} is a UUID.  I propose associating some minor 
> additional semantics to the {{StorageID}} - namely that multiple datanodes 
> attaching to the same physical shared storage pool should report the same 
> {{StorageID}} for that pool.  A minor modification would be required in the 
> DataNode to enable the generation of {{StorageID}} s to be pluggable behind 
> the {{FsDatasetSpi}} interface.  
> With those semantics in place, the number of physical copies of a block in a 
> shared storage environment can be calculated as the number of _distinct_ 
> {{StorageID}} s associated with that block.
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} 
> pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
> * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* 
> physical replicas (i.e. the traditional HDFS case with local disks)
> ** → Block B has {{ReplicationCount == 2}}
> * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* 
> physical replica (e.g. HDFS datanodes mounting the same NAS share)
> ** → Block B has {{ReplicationCount == 1}}
> For example, if block B has the following location tuples:
> * {{DN_1, STORAGE_A}}
> * {{DN_2, STORAGE_A}}
> * {{DN_3, STORAGE_B}}
> * {{DN_4, STORAGE_B}},
> the effect of this proposed change would be to calculate the replication 
> factor in the namenode as *2* instead of *4*.
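
A minimal sketch of the proposed counting rule, using illustrative types (not the 
NameNode's actual block map classes): the physical replica count is the number 
of distinct storage IDs among a block's location tuples.

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of the proposed rule; types and names are illustrative only. */
class ReplicaCounter {

  /** A (DataNode ID, Storage ID) tuple: one access path to one physical replica. */
  static class Location {
    final String datanodeUuid;
    final String storageUuid;
    Location(String datanodeUuid, String storageUuid) {
      this.datanodeUuid = datanodeUuid;
      this.storageUuid = storageUuid;
    }
  }

  /** Physical copies = number of distinct storage IDs, not number of access paths. */
  static int physicalReplicaCount(List<Location> locations) {
    Set<String> distinctStorages = new HashSet<String>();
    for (Location loc : locations) {
      distinctStorages.add(loc.storageUuid);
    }
    return distinctStorages.size();
  }

  public static void main(String[] args) {
    // The example from the description: four access paths, two shared storages.
    List<Location> locs = Arrays.asList(
        new Location("DN_1", "STORAGE_A"),
        new Location("DN_2", "STORAGE_A"),
        new Location("DN_3", "STORAGE_B"),
        new Location("DN_4", "STORAGE_B"));
    System.out.println(physicalReplicaCount(locs)); // prints 2
  }
}
{code}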



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5788) listLocatedStatus response can be very large

2014-01-16 Thread Nathan Roberts (JIRA)
Nathan Roberts created HDFS-5788:


 Summary: listLocatedStatus response can be very large
 Key: HDFS-5788
 URL: https://issues.apache.org/jira/browse/HDFS-5788
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.2.0, 0.23.10, 3.0.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts


Currently we limit the size of listStatus requests to a default of 1000 
entries. This works fine except in the case of listLocatedStatus where the 
location information can be quite large. As an example, a directory with 7000 
entries, 4 blocks each, 3 way replication - a listLocatedStatus response is 
over 1MB. This can chew up very large amounts of memory in the NN if lots of 
clients try to do this simultaneously.

Seems like it would be better if we also considered the amount of location 
information being returned when deciding how many files to return.

Patch will follow shortly.
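
A minimal sketch of the idea under assumed types (the real NameNode listing code 
differs): cap a listing batch by the total number of block locations it carries, 
not just by the number of entries.

{code}
import java.util.ArrayList;
import java.util.List;

/** Sketch only: location-aware batching for a located directory listing. */
class LocatedListingLimiter {

  static class Entry {
    final String path;
    final int locationCount; // roughly blocks * replication for this file
    Entry(String path, int locationCount) {
      this.path = path;
      this.locationCount = locationCount;
    }
  }

  /**
   * Return a prefix of the candidates, stopping when either the entry limit or
   * the total-locations limit would be exceeded; always return at least one
   * entry so the listing makes progress.
   */
  static List<Entry> limitBatch(List<Entry> candidates, int maxEntries, int maxLocations) {
    List<Entry> batch = new ArrayList<Entry>();
    int locations = 0;
    for (Entry e : candidates) {
      if (!batch.isEmpty()
          && (batch.size() >= maxEntries || locations + e.locationCount > maxLocations)) {
        break;
      }
      batch.add(e);
      locations += e.locationCount;
    }
    return batch;
  }
}
{code}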



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5777) Update LayoutVersion for the new editlog op OP_ADD_BLOCK

2014-01-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873390#comment-13873390
 ] 

Hudson commented on HDFS-5777:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1646 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1646/])
HDFS-5777. Update LayoutVersion for the new editlog op OP_ADD_BLOCK. 
Contributed by Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558675)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml


> Update LayoutVersion for the new editlog op OP_ADD_BLOCK
> 
>
> Key: HDFS-5777
> URL: https://issues.apache.org/jira/browse/HDFS-5777
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 3.0.0
>
> Attachments: HDFS-5704-5777-branch2.patch, HDFS-5777.000.patch, 
> HDFS-5777.001.patch, HDFS-5777.002.patch, editsStored, editsStored, 
> editsStored
>
>
> HDFS-5704 adds a new editlog op OP_ADD_BLOCK. We need to update the 
> LayoutVersion for this.
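
For context on why the bump matters: layout versions become more negative with 
each on-disk format change, and software refuses to read an fsimage or edit log 
written with a version newer (more negative) than the one it understands. A toy 
illustration of that rule follows; the constant value and class are hypothetical, 
not the actual LayoutVersion code.

{code}
/** Toy illustration of layout-version gating; constants are hypothetical. */
class LayoutVersionGate {
  // Each new on-disk feature (such as a new editlog op) takes the next more-negative value.
  static final int CURRENT_LAYOUT_VERSION = -48;

  /**
   * A reader can load an image or edit log only if the file's version is not
   * newer (i.e. not more negative) than the version the reader understands.
   */
  static boolean canRead(int fileLayoutVersion) {
    return fileLayoutVersion >= CURRENT_LAYOUT_VERSION;
  }
}
{code}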



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads

2014-01-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873391#comment-13873391
 ] 

Hudson commented on HDFS-5762:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1646 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1646/])
HDFS-5762. BlockReaderLocal does not return -1 on EOF when doing zero-length 
reads (cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558526)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java


> BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
> --
>
> Key: HDFS-5762
> URL: https://issues.apache.org/jira/browse/HDFS-5762
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.4.0
>
> Attachments: HDFS-5762.001.patch, HDFS-5762.002.patch
>
>
> Unlike the other block readers, BlockReaderLocal currently doesn't return -1 
> on EOF when doing zero-length reads.  This behavior, in turn, propagates to 
> the DFSInputStream.  BlockReaderLocal should do this, so that clients can 
> determine whether the file is at its end by doing a zero-length read and 
> checking for -1.
> One place this shows up is in libhdfs, which does such a 0-length read to 
> determine if direct (i.e., ByteBuffer) reads are supported.
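
A toy illustration of the contract the other block readers already follow and 
that this fix brings BlockReaderLocal in line with; the reader interface below 
is hypothetical, not the real BlockReader API.

{code}
/** Sketch of the expected zero-length-read contract. */
class ZeroLengthReadContract {

  interface Reader {
    /** Returns the number of bytes read, or -1 if positioned at EOF, even when len == 0. */
    int read(byte[] buf, int off, int len);
  }

  /** A client can probe for EOF without consuming any data. */
  static boolean atEof(Reader reader) {
    return reader.read(new byte[0], 0, 0) == -1;
  }
}
{code}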



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5766) In DFSInputStream, do not add datanode to deadNodes after InvalidEncryptionKeyException in fetchBlockByteRange

2014-01-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873394#comment-13873394
 ] 

Hudson commented on HDFS-5766:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1646 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1646/])
HDFS-5766. In DFSInputStream, do not add datanode to deadNodes after 
InvalidEncryptionKeyException in fetchBlockByteRange (Liang Xie via Colin 
Patrick McCabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558536)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java


> In DFSInputStream, do not add datanode to deadNodes after 
> InvalidEncryptionKeyException in fetchBlockByteRange
> --
>
> Key: HDFS-5766
> URL: https://issues.apache.org/jira/browse/HDFS-5766
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Fix For: 2.4.0
>
> Attachments: HDFS-5766.txt
>
>
> Found this issue while reading the fetchBlockByteRange code:
> If we hit InvalidEncryptionKeyException, the current logic is:
> 1) reduce the retry count
> 2) clearDataEncryptionKey
> 3) addToDeadNodes
> 4) retry in another loop...
> If I am correct, we should treat InvalidEncryptionKeyException like the 
> InvalidBlockToken branch and bypass addToDeadNodes(), since the failure is 
> client related and not caused by the DN side. :)
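
A simplified sketch of the distinction the fix draws; the exception and helper 
names are illustrative, not the actual DFSInputStream members. The point is to 
refresh client-side credentials and retry the same node, rather than marking it 
dead.

{code}
/** Sketch only: do not mark a node dead for client-side credential failures. */
class FetchRetrySketch {
  static class InvalidEncryptionKeyException extends Exception {}
  static class InvalidBlockTokenException extends Exception {}

  interface Node {
    byte[] fetchRange(long offset, long length) throws Exception;
  }

  private final java.util.Set<Node> deadNodes = new java.util.HashSet<Node>();

  byte[] fetchWithRetry(Node node, long offset, long length, int maxRetries) throws Exception {
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return node.fetchRange(offset, length);
      } catch (InvalidEncryptionKeyException e) {
        clearDataEncryptionKey(); // client-side problem: refresh the key and retry...
        // ...but do NOT add the node to deadNodes; the datanode did nothing wrong.
      } catch (InvalidBlockTokenException e) {
        refreshBlockToken();      // same idea: a client-side credential refresh only.
      } catch (Exception e) {
        deadNodes.add(node);      // genuine datanode failure: avoid this node afterwards
        throw e;                  // (simplified; the real code moves on to another replica).
      }
    }
    throw new Exception("ran out of retries");
  }

  private void clearDataEncryptionKey() { /* drop the cached key so it is re-fetched */ }
  private void refreshBlockToken() { /* re-fetch block locations and tokens */ }
}
{code}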



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5768) Consolidate the serialization code in DelegationTokenSecretManager

2014-01-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873388#comment-13873388
 ] 

Hudson commented on HDFS-5768:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1646 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1646/])
HDFS-5768. Consolidate the serialization code in DelegationTokenSecretManager. 
Contributed by Haohui Mai (brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558598)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


> Consolidate the serialization code in DelegationTokenSecretManager
> --
>
> Key: HDFS-5768
> URL: https://issues.apache.org/jira/browse/HDFS-5768
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 3.0.0
>
> Attachments: HDFS-5768.000.patch, HDFS-5768.001.patch
>
>
> This jira proposes to extract a private class for the serialization code for 
> DelegationTokenSecretManager, so that it becomes easier to introduce new code 
> paths to serialize the same set of information using protobuf.
> This jira does not intend to introduce any functionality changes.
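
The shape of the refactoring, sketched under assumed names (the patch's actual 
class and method names may differ): the legacy stream-based save/load moves into 
one nested helper so a protobuf-based sibling can later be added next to it 
without touching callers.

{code}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Sketch of a secret manager with its fsimage serialization pulled into a nested class. */
class SecretManagerSketch {
  private int currentKeyId;
  private long tokenSequenceNumber;

  /** All legacy (stream-based) serialization lives here, isolated from the manager logic. */
  private final class SerializerCompat {
    void save(DataOutputStream out) throws IOException {
      out.writeInt(currentKeyId);
      out.writeLong(tokenSequenceNumber);
      // ...keys and tokens would follow in the real format...
    }

    void load(DataInputStream in) throws IOException {
      currentKeyId = in.readInt();
      tokenSequenceNumber = in.readLong();
      // ...keys and tokens would follow in the real format...
    }
  }

  // Stable entry points; a protobuf serializer could be added as another nested class.
  void saveState(DataOutputStream out) throws IOException { new SerializerCompat().save(out); }
  void loadState(DataInputStream in) throws IOException { new SerializerCompat().load(in); }
}
{code}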



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5775) Consolidate the code for serialization in CacheManager

2014-01-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873387#comment-13873387
 ] 

Hudson commented on HDFS-5775:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1646 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1646/])
HDFS-5775. Consolidate the code for serialization in CacheManager. Contributed 
by Haohui Mai (brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558599)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java


> Consolidate the code for serialization in CacheManager
> --
>
> Key: HDFS-5775
> URL: https://issues.apache.org/jira/browse/HDFS-5775
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 3.0.0
>
> Attachments: HDFS-5775.000.patch
>
>
> This jira proposes to consolidate the code that is responsible for 
> serializing / deserializing cache manager state into a separate class, so 
> that it is easier to introduce a new code path to serialize the data using 
> protobuf.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5768) Consolidate the serialization code in DelegationTokenSecretManager

2014-01-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873366#comment-13873366
 ] 

Hudson commented on HDFS-5768:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1671 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1671/])
HDFS-5768. Consolidate the serialization code in DelegationTokenSecretManager. 
Contributed by Haohui Mai (brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558598)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


> Consolidate the serialization code in DelegationTokenSecretManager
> --
>
> Key: HDFS-5768
> URL: https://issues.apache.org/jira/browse/HDFS-5768
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 3.0.0
>
> Attachments: HDFS-5768.000.patch, HDFS-5768.001.patch
>
>
> This jira proposes to extract a private class for the serialization code for 
> DelegationTokenSecretManager, so that it becomes easier to introduce new code 
> paths to serialize the same set of information using protobuf.
> This jira does not intend to introduce any functionality changes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5766) In DFSInputStream, do not add datanode to deadNodes after InvalidEncryptionKeyException in fetchBlockByteRange

2014-01-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873372#comment-13873372
 ] 

Hudson commented on HDFS-5766:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1671 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1671/])
HDFS-5766. In DFSInputStream, do not add datanode to deadNodes after 
InvalidEncryptionKeyException in fetchBlockByteRange (Liang Xie via Colin 
Patrick McCabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558536)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java


> In DFSInputStream, do not add datanode to deadNodes after 
> InvalidEncryptionKeyException in fetchBlockByteRange
> --
>
> Key: HDFS-5766
> URL: https://issues.apache.org/jira/browse/HDFS-5766
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Fix For: 2.4.0
>
> Attachments: HDFS-5766.txt
>
>
> Found this issue while reading the fetchBlockByteRange code:
> If we hit InvalidEncryptionKeyException, the current logic is:
> 1) reduce the retry count
> 2) clearDataEncryptionKey
> 3) addToDeadNodes
> 4) retry in another loop...
> If I am correct, we should treat InvalidEncryptionKeyException like the 
> InvalidBlockToken branch and bypass addToDeadNodes(), since the failure is 
> client related and not caused by the DN side. :)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5777) Update LayoutVersion for the new editlog op OP_ADD_BLOCK

2014-01-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873368#comment-13873368
 ] 

Hudson commented on HDFS-5777:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1671 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1671/])
HDFS-5777. Update LayoutVersion for the new editlog op OP_ADD_BLOCK. 
Contributed by Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558675)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml


> Update LayoutVersion for the new editlog op OP_ADD_BLOCK
> 
>
> Key: HDFS-5777
> URL: https://issues.apache.org/jira/browse/HDFS-5777
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 3.0.0
>
> Attachments: HDFS-5704-5777-branch2.patch, HDFS-5777.000.patch, 
> HDFS-5777.001.patch, HDFS-5777.002.patch, editsStored, editsStored, 
> editsStored
>
>
> HDFS-5704 adds a new editlog op OP_ADD_BLOCK. We need to update the 
> LayoutVersion for this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads

2014-01-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873369#comment-13873369
 ] 

Hudson commented on HDFS-5762:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1671 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1671/])
HDFS-5762. BlockReaderLocal does not return -1 on EOF when doing zero-length 
reads (cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558526)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java


> BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
> --
>
> Key: HDFS-5762
> URL: https://issues.apache.org/jira/browse/HDFS-5762
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.4.0
>
> Attachments: HDFS-5762.001.patch, HDFS-5762.002.patch
>
>
> Unlike the other block readers, BlockReaderLocal currently doesn't return -1 
> on EOF when doing zero-length reads.  This behavior, in turn, propagates to 
> the DFSInputStream.  BlockReaderLocal should do this, so that clients can 
> determine whether the file is at its end by doing a zero-length read and 
> checking for -1.
> One place this shows up is in libhdfs, which does such a 0-length read to 
> determine if direct (i.e., ByteBuffer) reads are supported.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5775) Consolidate the code for serialization in CacheManager

2014-01-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873365#comment-13873365
 ] 

Hudson commented on HDFS-5775:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1671 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1671/])
HDFS-5775. Consolidate the code for serialization in CacheManager. Contributed 
by Haohui Mai (brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558599)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java


> Consolidate the code for serialization in CacheManager
> --
>
> Key: HDFS-5775
> URL: https://issues.apache.org/jira/browse/HDFS-5775
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 3.0.0
>
> Attachments: HDFS-5775.000.patch
>
>
> This jira proposes to consolidate the code that is responsible for 
> serializing / deserializing cache manager state into a separate class, so 
> that it is easier to introduce a new code path to serialize the data using 
> protobuf.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-5189) Rename the "CorruptBlocks" metric to "CorruptReplicas"

2014-01-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reassigned HDFS-5189:
-

Assignee: (was: Harsh J)

> Rename the "CorruptBlocks" metric to "CorruptReplicas"
> --
>
> Key: HDFS-5189
> URL: https://issues.apache.org/jira/browse/HDFS-5189
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.1.0-beta
>Reporter: Harsh J
>Priority: Minor
>
> The NameNode increments a "CorruptBlocks" metric even if only one of the 
> block's
> replicas is reported corrupt (genuine checksum fail, or even if a
> replica has a bad genstamp). In cases where this is incremented, fsck
> still reports a healthy state.
> This is confusing to users and causes false alarms, as they feel this metric 
> is what needs to be monitored (instead of MissingBlocks). The metric is really 
> reporting only corrupt replicas, not whole blocks, and ought to be renamed.
> FWIW, the "dfsadmin -report" reports a proper string of "Blocks with corrupt 
> replicas:" when printing this count.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5766) In DFSInputStream, do not add datanode to deadNodes after InvalidEncryptionKeyException in fetchBlockByteRange

2014-01-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873282#comment-13873282
 ] 

Hudson commented on HDFS-5766:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #454 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/454/])
HDFS-5766. In DFSInputStream, do not add datanode to deadNodes after 
InvalidEncryptionKeyException in fetchBlockByteRange (Liang Xie via Colin 
Patrick McCabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558536)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java


> In DFSInputStream, do not add datanode to deadNodes after 
> InvalidEncryptionKeyException in fetchBlockByteRange
> --
>
> Key: HDFS-5766
> URL: https://issues.apache.org/jira/browse/HDFS-5766
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Fix For: 2.4.0
>
> Attachments: HDFS-5766.txt
>
>
> Found this issue while reading the fetchBlockByteRange code:
> If we hit InvalidEncryptionKeyException, the current logic is:
> 1) reduce the retry count
> 2) clearDataEncryptionKey
> 3) addToDeadNodes
> 4) retry in another loop...
> If I am correct, we should treat InvalidEncryptionKeyException like the 
> InvalidBlockToken branch and bypass addToDeadNodes(), since the failure is 
> client related and not caused by the DN side. :)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5777) Update LayoutVersion for the new editlog op OP_ADD_BLOCK

2014-01-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873278#comment-13873278
 ] 

Hudson commented on HDFS-5777:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #454 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/454/])
HDFS-5777. Update LayoutVersion for the new editlog op OP_ADD_BLOCK. 
Contributed by Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1558675)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml


> Update LayoutVersion for the new editlog op OP_ADD_BLOCK
> 
>
> Key: HDFS-5777
> URL: https://issues.apache.org/jira/browse/HDFS-5777
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 3.0.0
>
> Attachments: HDFS-5704-5777-branch2.patch, HDFS-5777.000.patch, 
> HDFS-5777.001.patch, HDFS-5777.002.patch, editsStored, editsStored, 
> editsStored
>
>
> HDFS-5704 adds a new editlog op OP_ADD_BLOCK. We need to update the 
> LayoutVersion for this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

