[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-05 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989348#comment-13989348
 ] 

Haohui Mai commented on HDFS-6293:
--

Just to recap:

# The requirement is to build an offline tool that can process PB-based 
fsimages.
# The namespace is mostly a hierarchical structure with snapshots. That is the 
exact reason why the PB-based fsimage has moved towards record-based storage.
# It is useful to recover the hierarchical structure of the namespace in some 
use cases. Given the design choices made in (2), any in-memory processing 
algorithm requires Theta(n) memory, where n is the number of inodes. That 
requires too many resources.
# Various solutions that leverage the resources on the SNN have been proposed.

Here are my two cents:

# Though the current set of tools does load the whole fsimage into memory and 
process it, there is no reason that any offline tool has to be implemented 
that way. For example, building an index can solve the above use cases.
# The namespace is no longer a tree once snapshots are involved. Forcing it 
into a hierarchical structure sometimes requires fitting square pegs into 
round holes.
# Note that the goal of the SBN / SNN is to improve the reliability of the 
system. The simpler the code is, the more likely it can be thoroughly 
reasoned about and made reliable. Personally I don't like any solution that 
adds complexity to the SBN / SNN to solve the use case of the offline image 
viewer. It doesn't seem right to solve an offline problem using an online 
machine that is accountable for the reliability of the system.

 Issues with OIV processing PB-based fsimages
 

 Key: HDFS-6293
 URL: https://issues.apache.org/jira/browse/HDFS-6293
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6293.000.patch, Heap Histogram.html


 There are issues with OIV when processing fsimages in protobuf. 
 Due to the internal layout changes introduced by the protobuf-based fsimage, 
 OIV consumes an excessive amount of memory.  We have tested with an fsimage 
 with about 140M files/directories. The peak heap usage when processing this 
 image in the pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After 
 converting the image to the protobuf format on 2.4.0, OIV would OOM even 
 with 80GB of heap (max new size was 1GB).  It should be possible to process 
 any image with the default heap size of 1.5GB.
 Another issue is the complete change of format/content in OIV's XML output.  
 I also noticed that the secret manager section has no tokens while there were 
 unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
 they were also missing in the new pb fsimage.





[jira] [Updated] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-05 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-6293:
-

Attachment: HDFS-6293.000.patch



[jira] [Updated] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-05 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-6293:
-

Assignee: Haohui Mai
  Status: Patch Available  (was: Open)

To demonstrate my points, I'm attaching a patch which stores the current 
PB-based fsimage into a LevelDB and performs lsr on top of the LevelDB.

The tool that converts the fsimage into LevelDB reads the whole {{INODE_DIR}} 
section into memory, then stores the JSON representation of each inode under 
the key {{IN || parent_id || localName}}. That way all children of a 
particular inode are co-located, so operations like lsr are efficient.

The conversion tool takes 16 bytes * number of inodes of memory to convert 
the fsimage. For an fsimage that has 400M inodes, that is around 6.4G of 
memory, so the tool can run on a commodity machine. The lsr tool only 
requires O(1) memory.
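
For concreteness, here is a minimal sketch of the key scheme and the prefix 
scan described above, written against the org.iq80.leveldb Java bindings. The 
class and helper names are illustrative assumptions, not the actual patch:

{code}
import org.iq80.leveldb.DB;
import org.iq80.leveldb.DBIterator;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class FsimageLsrSketch {
  // Key layout: "IN" || parent inode id (8-byte big-endian) || child name.
  // Big-endian longs sort numerically, so all children of one directory are
  // contiguous in the LevelDB keyspace. The conversion tool would write each
  // inode's JSON under this key.
  static byte[] childKey(long parentId, String localName) {
    byte[] name = localName.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(2 + 8 + name.length)
        .put((byte) 'I').put((byte) 'N')
        .putLong(parentId)
        .put(name)
        .array();
  }

  // Listing one directory is a prefix scan over "IN" || parent_id. Only the
  // iterator's current entry is held in memory, hence O(1) memory for lsr.
  static void listChildren(DB db, long parentId) throws IOException {
    byte[] prefix = ByteBuffer.allocate(10)
        .put((byte) 'I').put((byte) 'N').putLong(parentId).array();
    try (DBIterator it = db.iterator()) {
      for (it.seek(prefix); it.hasNext(); it.next()) {
        Map.Entry<byte[], byte[]> e = it.peekNext();
        if (!hasPrefix(e.getKey(), prefix)) {
          break;  // walked past the last child of this directory
        }
        // The value is the JSON representation of the child inode.
        System.out.println(new String(e.getValue(), StandardCharsets.UTF_8));
      }
    }
  }

  static boolean hasPrefix(byte[] key, byte[] prefix) {
    if (key.length < prefix.length) return false;
    for (int i = 0; i < prefix.length; i++) {
      if (key[i] != prefix[i]) return false;
    }
    return true;
  }
}
{code}

Since keys sharing the {{IN || parent_id}} prefix are adjacent, a recursive 
lsr is a series of sequential scans and never materializes the whole tree.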



[jira] [Commented] (HDFS-6337) Setfacl testcase is failing due to dash character in username in TestAclCLI

2014-05-05 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989379#comment-13989379
 ] 

Vinayakumar B commented on HDFS-6337:
-

patch looks good. +1

 Setfacl testcase is failing due to dash character in username in TestAclCLI
 ---

 Key: HDFS-6337
 URL: https://issues.apache.org/jira/browse/HDFS-6337
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-6337.patch


 TestHDFSCLI is failing due to a '-' in the username.
 I have seen a similar fix done in HDFS-5821, so the same fix should be done 
 for the setfacl case as well.





[jira] [Commented] (HDFS-5168) BlockPlacementPolicy does not work for cross node group dependencies

2014-05-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989425#comment-13989425
 ] 

Hudson commented on HDFS-5168:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #558 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/558/])
HDFS-5168. Add cross node dependency support to BlockPlacementPolicy.  
Contributed by Nikola Vujic (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1592179)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/DNSToSwitchMappingWithDependency.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/ScriptBasedMapping.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/ScriptBasedMappingWithDependency.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/TestScriptBasedMappingWithDependency.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/Host2NodesMap.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java


 BlockPlacementPolicy does not work for cross node group dependencies
 

 Key: HDFS-5168
 URL: https://issues.apache.org/jira/browse/HDFS-5168
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Nikola Vujic
Assignee: Nikola Vujic
Priority: Critical
 Fix For: 2.5.0

 Attachments: HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, 
 HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, 
 HDFS-5168.patch, HDFS-5168.patch


 Block placement policies do not work for cross rack/node group dependencies. 
 In practice this is needed when compute servers and storage fall in two 
 independent fault domains; in that case neither BlockPlacementPolicyDefault 
 nor BlockPlacementPolicyWithNodeGroup is able to provide proper block 
 placement.
 Let's suppose that we have a Hadoop cluster with one rack with two servers, 
 and we run 2 VMs per server. The node group topology for this cluster would 
 be:
  server1-vm1 - /d1/r1/n1
  server1-vm2 - /d1/r1/n1
  server2-vm1 - /d1/r1/n2
  server2-vm2 - /d1/r1/n2
 This works fine as long as server and storage fall into the same fault 
 domain, but if storage is in a different fault domain from the server, we 
 will not be able to handle that. For example, if the storage of server1-vm1 
 is in the same fault domain as the storage of server2-vm1, then we must not 
 place two replicas on these two nodes although they are in different node 
 groups.
 Two possible approaches:
 - One approach would be to define cross rack/node group dependencies and to 
 use them when excluding nodes from the search space. This looks like the 
 cleanest way to fix this as it requires minor changes in the 
 BlockPlacementPolicy classes.
 - The other approach would be to allow nodes to fall in more than one node 
 group. When we choose a node to hold a replica 

[jira] [Commented] (HDFS-6295) Add decommissioning state and node state filtering to dfsadmin

2014-05-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989422#comment-13989422
 ] 

Hudson commented on HDFS-6295:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #558 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/558/])
HDFS-6295. Add decommissioning state and node state filtering to dfsadmin. 
Contributed by Andrew Wang. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1592438)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml


 Add decommissioning state and node state filtering to dfsadmin
 

 Key: HDFS-6295
 URL: https://issues.apache.org/jira/browse/HDFS-6295
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.5.0

 Attachments: hdfs-6295-1.patch, hdfs-6295-2.patch, hdfs-6295-3.patch


 One of the few admin-friendly ways of viewing the list of decommissioning 
 nodes is via hdfs dfsadmin -report. However, this lists *all* the datanodes 
 on the cluster, which is prohibitive for large clusters, and also requires 
 manual parsing to look at the decom status. It'd be nicer if we could fetch 
 and display only decommissioning nodes (or just live and dead nodes for that 
 matter).





[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989427#comment-13989427
 ] 

Hadoop QA commented on HDFS-6293:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643332/HDFS-6293.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6812//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6812//console

This message is automatically generated.



[jira] [Updated] (HDFS-6337) Setfacl testcase is failing due to dash character in username in TestAclCLI

2014-05-05 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-6337:
--

       Resolution: Fixed
    Fix Version/s: 2.5.0
                   3.0.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Thanks Vinay for the review! I have just committed this to trunk and branch-2.



[jira] [Commented] (HDFS-5168) BlockPlacementPolicy does not work for cross node group dependencies

2014-05-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989452#comment-13989452
 ] 

Hudson commented on HDFS-5168:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1749 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1749/])
HDFS-5168. Add cross node dependency support to BlockPlacementPolicy.  
Contributed by Nikola Vujic (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1592179)



[jira] [Commented] (HDFS-6295) Add decommissioning state and node state filtering to dfsadmin

2014-05-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989449#comment-13989449
 ] 

Hudson commented on HDFS-6295:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1749 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1749/])
HDFS-6295. Add decommissioning state and node state filtering to dfsadmin. 
Contributed by Andrew Wang. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1592438)




[jira] [Updated] (HDFS-6319) Various syntax and style cleanups

2014-05-05 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-6319:
---

Attachment: HDFS-6319.7.patch

Reattach the patch file to try to get Jenkins to run tests.

 Various syntax and style cleanups
 -

 Key: HDFS-6319
 URL: https://issues.apache.org/jira/browse/HDFS-6319
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-6319.1.patch, HDFS-6319.2.patch, HDFS-6319.3.patch, 
 HDFS-6319.4.patch, HDFS-6319.6.patch, HDFS-6319.7.patch


 Fix various style issues, e.g.:
 - {{if(}}, {{while(}} (lack of a space after the keyword)
 - extra whitespace and newlines
 - {{if (...) return ...}} (lack of {}'s)





[jira] [Commented] (HDFS-6319) Various syntax and style cleanups

2014-05-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989478#comment-13989478
 ] 

Hadoop QA commented on HDFS-6319:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643357/HDFS-6319.7.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6814//console

This message is automatically generated.



[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails

2014-05-05 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989495#comment-13989495
 ] 

Binglin Chang commented on HDFS-6250:
-

Thanks for the analysis and patch [~airbots]. The fix makes sense; here are 
some additional concerns:

bq. HDFS creates a /system/balancer.id file (30B) to track the balancer
It looks like the file contains the hostname, whose size is not fixed. I see 
you increased the block size and capacity to minimize the impact of the file, 
but it seems the risk is still there.

testBalancerWithRackLocality tests that the balancer does not perform 
cross-rack block movements in the test scenario; here are the related 
balancer logs:

{code}
2014-04-15 18:29:48,649 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
0 over-utilized: []
2014-04-15 18:29:48,650 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
2 above-average: [Source[127.0.0.1:54333, utilization=30.0], 
Source[127.0.0.1:46174, utilization=30.0]]
2014-04-15 18:29:48,650 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
0 below-average: []
2014-04-15 18:29:48,650 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=0.0]]

2014-04-15 18:29:51,722 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
0 over-utilized: []
2014-04-15 18:29:51,722 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
2 above-average: [Source[127.0.0.1:54333, utilization=30.168], 
Source[127.0.0.1:46174, utilization=30.332]]
2014-04-15 18:29:51,722 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
0 below-average: []
2014-04-15 18:29:51,722 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
1 underutilized: [BalancerDatanode[127.0.0.1:48293, 
utilization=1.8333]]

2014-04-15 18:29:54,820 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
0 over-utilized: []
2014-04-15 18:29:54,820 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
2 above-average: [Source[127.0.0.1:54333, utilization=28.5], 
Source[127.0.0.1:46174, utilization=30.332]]
2014-04-15 18:29:54,820 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
0 below-average: []
2014-04-15 18:29:54,820 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=5.0]]

2014-04-15 18:29:57,898 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
0 over-utilized: []
2014-04-15 18:29:57,898 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
2 above-average: [Source[127.0.0.1:46174, utilization=30.332], 
Source[127.0.0.1:54333, utilization=25.332]]
2014-04-15 18:29:57,899 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
0 below-average: []
2014-04-15 18:29:57,899 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
1 underutilized: [BalancerDatanode[127.0.0.1:48293, 
utilization=7.667]]

2014-04-15 18:30:00,933 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
0 over-utilized: []
2014-04-15 18:30:00,933 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
2 above-average: [Source[127.0.0.1:54333, utilization=22.668], 
Source[127.0.0.1:46174, utilization=30.332]]
2014-04-15 18:30:00,933 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
0 below-average: []
2014-04-15 18:30:00,933 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=10.5]]

2014-04-15 18:30:03,989 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
0 over-utilized: []
2014-04-15 18:30:03,989 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
1 above-average: [Source[127.0.0.1:46174, utilization=30.332]]
2014-04-15 18:30:03,989 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
2 below-average: [BalancerDatanode[127.0.0.1:54333, 
utilization=19.832], BalancerDatanode[127.0.0.1:48293, 
utilization=12.0]]
2014-04-15 18:30:03,989 INFO  balancer.Balancer (Balancer.java:logNodes(960)) - 
0 underutilized: []
{code}

I guess the test intended to keep /rack0/NODEGROUP0/dn above-average (=30%) 
but not over-utilized (>30%, given an average utilization of 20%), so blocks 
on rack0 never move to rack1, but an extra balancer.id file may break that 
assumption. So there are problems inherent in the test, not just race 
conditions or timeouts. We may need to change the test (e.g. file size, 
utilization rate, validation method) to prevent those corner cases.
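
For context, the over-utilized / above-average buckets in these logs come 
from the balancer's utilization classification against the cluster average 
and the threshold (default 10%). Below is a minimal sketch of that bucketing, 
under the assumption that it matches Balancer.java's logic; the names here 
are illustrative, not the actual code:

{code}
// Sketch only, not the actual Balancer code.
public class BalancerBucketSketch {
  enum Bucket { OVER_UTILIZED, ABOVE_AVERAGE, BELOW_AVERAGE, UNDER_UTILIZED }

  // utilization, avg and threshold are all percentages.
  static Bucket classify(double utilization, double avg, double threshold) {
    if (utilization > avg + threshold)  return Bucket.OVER_UTILIZED;
    if (utilization > avg)              return Bucket.ABOVE_AVERAGE;
    if (utilization >= avg - threshold) return Bucket.BELOW_AVERAGE;
    return Bucket.UNDER_UTILIZED;
  }

  public static void main(String[] args) {
    // With avg = 20% and threshold = 10%, a node at exactly 30% is
    // ABOVE_AVERAGE rather than OVER_UTILIZED, so no blocks are scheduled
    // off it. A different balancer.id size shifts the utilization numbers
    // and can change which bucket a node lands in.
    System.out.println(classify(30.0, 20.0, 10.0));  // ABOVE_AVERAGE
  }
}
{code}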


 TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
 

 Key: HDFS-6250
 URL: https://issues.apache.org/jira/browse/HDFS-6250
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Chen He
 Attachments: 

[jira] [Updated] (HDFS-6319) Various syntax and style cleanups

2014-05-05 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-6319:
---

Attachment: HDFS-6319.8.patch

Rebased.



[jira] [Commented] (HDFS-6301) NameNode: persist XAttrs in fsimage and record XAttrs modifications to edit log.

2014-05-05 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989520#comment-13989520
 ] 

Uma Maheswara Rao G commented on HDFS-6301:
---

+1 on the latest patch. Thanks for the reviews, Andrew and Charles. I will 
file a separate JIRA for the OP_SET_XATTRS optimization mentioned above.

 NameNode: persist XAttrs in fsimage and record XAttrs modifications to edit 
 log.
 

 Key: HDFS-6301
 URL: https://issues.apache.org/jira/browse/HDFS-6301
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: HDFS XAttrs (HDFS-2006)

 Attachments: HDFS-6301.1.patch, HDFS-6301.patch


 Store XAttrs in fsimage so that XAttrs are retained across NameNode restarts.
 Implement a new edit log opcode, {{OP_SET_XATTRS}}.





[jira] [Resolved] (HDFS-6301) NameNode: persist XAttrs in fsimage and record XAttrs modifications to edit log.

2014-05-05 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G resolved HDFS-6301.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

I have committed the patch to the branch!



[jira] [Updated] (HDFS-6340) DN can't finalize upgrade

2014-05-05 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6340:
-

Priority: Blocker  (was: Major)
Target Version/s: 2.4.1

 DN can't finalize upgrade
 -

 Key: HDFS-6340
 URL: https://issues.apache.org/jira/browse/HDFS-6340
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
Reporter: Rahul Singhal
Priority: Blocker
 Attachments: HDFS-6340-branch-2.4.0.patch


 I upgraded a (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the 
 '-finalizeUpgrade' command, the NN was able to finalize the upgrade but the 
 DN couldn't (I waited for the next block report).
 I think I have found the problem to be due to HDFS-5153. I will attach a 
 proposed fix.





[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-05 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989540#comment-13989540
 ] 

Kihwal Lee commented on HDFS-6293:
--

bq. To demonstrate my points, I'm attaching a patch which stores the current 
PB-based fsimage into a LevelDB and performs lsr on top of the LevelDB.

That was the first thing I thought about doing, but the processing time matters 
too. 



[jira] [Commented] (HDFS-6295) Add decommissioning state and node state filtering to dfsadmin

2014-05-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989569#comment-13989569
 ] 

Hudson commented on HDFS-6295:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1775 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1775/])
HDFS-6295. Add decommissioning state and node state filtering to dfsadmin. 
Contributed by Andrew Wang. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1592438)




[jira] [Commented] (HDFS-5168) BlockPlacementPolicy does not work for cross node group dependencies

2014-05-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989572#comment-13989572
 ] 

Hudson commented on HDFS-5168:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1775 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1775/])
HDFS-5168. Add cross node dependency support to BlockPlacementPolicy.  
Contributed by Nikola Vujic (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1592179)



[jira] [Commented] (HDFS-6337) Setfacl testcase is failing due to dash character in username in TestAclCLI

2014-05-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989574#comment-13989574
 ] 

Hudson commented on HDFS-6337:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1775 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1775/])
HDFS-6337. Setfacl testcase is failing due to dash character in username in 
TestAclCLI. Contributed by Uma Maheswara Rao G. (umamahesh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1592489)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testAclCLI.xml




[jira] [Commented] (HDFS-6339) DN, SNN & JN can't rollback data

2014-05-05 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989578#comment-13989578
 ] 

Kihwal Lee commented on HDFS-6339:
--

This is expected.

Upgrade/rollback with HA was not supported until 2.4. So if you are rolling 
back from 2.4 HA to a previous version, some manual steps are needed.  In 
this case, the rollback needs to be done with HA off. That means the shared 
edits need to be copied to a non-HA name.dir or edits.dir, depending on your 
config. If the HA NN was configured to also store edits locally, it is a bit 
easier.  After successfully rolling back, HA can be re-enabled by 
initializing the shared edits dir and bootstrapping the standby.  The 2NN 
does not have to persist any state, so you can safely delete the temporary 
files.

bq. I fixed this by deleting the JN data directory.
I assume the NN had all edits locally. Otherwise there can be data loss.  
Other than this, your procedure seems okay.

In the future, please use the mailing lists for inquiries of this kind.


 DN, SNN & JN can't rollback data
 

 Key: HDFS-6339
 URL: https://issues.apache.org/jira/browse/HDFS-6339
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: journal-node
Affects Versions: 2.2.0
Reporter: Rahul Singhal

 I tried rollback from 2.4.0 to 2.2.0 and noticed that the DN, SNN and JN 
 couldn't perform the rollback.
 I started with a (NN) HA cluster on 2.2.0 and upgraded it to 2.4.0 with HA 
 enabled. Then I attempted a rollback to 2.2.0. I first configured my cluster 
 to non-HA and started it on 2.2.0. I started the NN & DN with the 
 '-rollback' startup option. (There is no explicit startup option for the 
 SNN & JN like there is for the NN & DN.) Only the NN was able to roll back 
 correctly.
 My fixes:
 I fixed the DN rollback problem by cherry-picking the fix from HDFS-5526.
 I fixed the SNN rollback problem by starting it with the '-format' option.
 I then proceeded to convert the non-HA cluster to an HA cluster. The first 
 step after the configuration change was to start the JNs. But they also 
 couldn't roll back.
 My fix:
 I fixed this by deleting the JN data directory. (Deleting the 'current' 
 directory and renaming 'previous' to 'current' didn't fix the rollback.)
 My purpose for filing this bug is to:
 1. Ask if these problems are known and intended to be fixed in any future 
 releases. If yes, which one? DN rollback was fixed in 2.3.0, but what about 
 the 2.2.x series? JN rollback seems (not confirmed) to have been fixed in 
 2.4.0.
 2. Confirm that my fixes are correct. If not, please help me with an 
 appropriate fix.







[jira] [Resolved] (HDFS-6339) DN, SNN & JN can't rollback data

2014-05-05 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-6339.
--

Resolution: Done



[jira] [Updated] (HDFS-6335) TestOfflineEditsViewer for XAttr

2014-05-05 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-6335:
-

Attachment: HDFS-6335.1.patch

Thanks Charles for your review. I just updated the patch.

 TestOfflineEditsViewer for XAttr
 

 Key: HDFS-6335
 URL: https://issues.apache.org/jira/browse/HDFS-6335
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Fix For: HDFS XAttrs (HDFS-2006)

 Attachments: HDFS-6335.1.patch, HDFS-6335.patch, editsStored


 TestOfflineEditsViewer for XAttr; editsStored also needs to be updated.





[jira] [Created] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge

2014-05-05 Thread Chen He (JIRA)
Chen He created HDFS-6342:
-

 Summary: TestBalancerWithNodeGroup.testBalancerWithRackLocality 
may fail if balancer.id file is huge
 Key: HDFS-6342
 URL: https://issues.apache.org/jira/browse/HDFS-6342
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen He
Assignee: Chen He


The testBalancerWithRackLocality method tests that the balancer moves data 
blocks with rack locality taken into consideration. 

It creates a two-node cluster. One node belongs to rack0/nodeGroup0, the 
other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the 
block size is 10B and the total cluster capacity is 6000B (3000B on each 
datanode). It creates 180 data blocks with replication factor 2. Then a new 
datanode is added (in rack1/nodeGroup2) and the balancer starts balancing 
the cluster.

It expects data blocks to move only within rack1. After the balancer is done, 
it asserts that the data size on both racks is the same (180 blocks * 10B * 2 
replicas leaves 1800B per rack). It will break if the balancer.id file is 
large and there is inter-rack data block movement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails

2014-05-05 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989598#comment-13989598
 ] 

Chen He commented on HDFS-6250:
---

Thank you for your comments, [~decster].  

I agree with you that the balancer.id file can bring more problems. There are 
two ways to reduce the side effect of the balancer.id file in this test method: 
1) increase the block size to reduce the impact of the balancer.id file (this 
is what I did);
2) introduce two new nodes, one in rack0 and one in rack1. 

The failure in this JIRA is because of the balancer.id file's block (blk_181) 
that should be deleted. We have to wait until that block is deleted. I created 
a sub-task HDFS-6342 to redesign this test method. 



 TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
 

 Key: HDFS-6250
 URL: https://issues.apache.org/jira/browse/HDFS-6250
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Chen He
 Attachments: HDFS-6250-v2.patch, HDFS-6250.patch, test_log.txt


 It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/
 {panel}
 java.lang.AssertionError: expected:<1800> but was:<1810>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:147)
   at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup
  .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253)
 {panel}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6338) Add an RPC method to allow an administrator to delete the file lease.

2014-05-05 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989603#comment-13989603
 ] 

Kihwal Lee commented on HDFS-6338:
--

I think you can achieve what you want with recoverLease().  Leases cannot 
simply be deleted without actually finalizing the last block replicas and 
closing the file.  The NN does this for you, but it usually takes several 
seconds to recover the last block.

If you don't care about the old content and want to create a new file with the 
same name, simply delete the old file. The lease will be deleted along with the 
file.
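
For reference, here is a minimal client-side sketch of the recoverLease() 
approach, assuming a {{DistributedFileSystem}} handle; the class name and 
polling loop are illustrative, not part of HDFS:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Hedged sketch: force lease recovery on an abandoned file via
// DistributedFileSystem#recoverLease(), then poll until the NN has
// finalized the last block and closed the file.
public class LeaseRecoverySketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    DistributedFileSystem dfs = (DistributedFileSystem) fs;
    Path file = new Path(args[0]);
    // recoverLease() returns true once the file has been closed; block
    // recovery usually takes a few seconds, so retry with a short sleep.
    while (!dfs.recoverLease(file)) {
      Thread.sleep(1000L);
    }
    System.out.println("Lease recovered; file is closed: " + file);
  }
}
{code}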

 Add an RPC method to allow an administrator to delete the file lease.
 -

 Key: HDFS-6338
 URL: https://issues.apache.org/jira/browse/HDFS-6338
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.4.0
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor

 We have to wait for the file lease to expire after an unexpected interrupt 
 during an HDFS write, so I want to add an RPC method to allow an 
 administrator to delete the file lease.
 Please leave comments here; I am working on the patch now.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge

2014-05-05 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989604#comment-13989604
 ] 

Chen He commented on HDFS-6342:
---

If we create two new nodes, one in rack0 and one in rack1, we can avoid 
inter-rack data transfer even when the balancer.id file is huge.

 TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if 
 balancer.id file is huge
 ---

 Key: HDFS-6342
 URL: https://issues.apache.org/jira/browse/HDFS-6342
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen He
Assignee: Chen He

 The testBalancerWithRackLocality method tests the balancer moving data 
 blocks with rack locality taken into consideration. 
 It creates a two-node cluster. One node belongs to rack0/nodeGroup0, the 
 other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the 
 block size is 10B and the total cluster capacity is 6000B (3000B on each 
 datanode). It creates 180 data blocks with replication factor 2. Then a new 
 datanode is created (in rack1/nodeGroup2) and the balancer starts balancing 
 the cluster.
 It expects data blocks to move only within rack1. After the balancer is 
 done, it asserts that the data size on both racks is the same. The test will 
 break if the balancer.id file is huge and there is inter-rack data block 
 movement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6335) TestOfflineEditsViewer for XAttr

2014-05-05 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989610#comment-13989610
 ] 

Uma Maheswara Rao G commented on HDFS-6335:
---

+1 on the latest patch. Thanks for the review, Charles.

I have run the tests in my env and they passed:
{noformat}
---
 T E S T S
---
Running org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.038 sec - in
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

Results :

Tests run: 3, Failures: 0, Errors: 0, Skipped: 0

{noformat}

 TestOfflineEditsViewer for XAttr
 

 Key: HDFS-6335
 URL: https://issues.apache.org/jira/browse/HDFS-6335
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Fix For: HDFS XAttrs (HDFS-2006)

 Attachments: HDFS-6335.1.patch, HDFS-6335.patch, editsStored


 Add TestOfflineEditsViewer coverage for XAttr; editsStored also needs to be updated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6335) TestOfflineEditsViewer for XAttr

2014-05-05 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G resolved HDFS-6335.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

I have just committed this to the branch

 TestOfflineEditsViewer for XAttr
 

 Key: HDFS-6335
 URL: https://issues.apache.org/jira/browse/HDFS-6335
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Fix For: HDFS XAttrs (HDFS-2006)

 Attachments: HDFS-6335.1.patch, HDFS-6335.patch, editsStored


 Add TestOfflineEditsViewer coverage for XAttr; editsStored also needs to be updated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6298) XML based End-to-End test for getfattr and setfattr commands

2014-05-05 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-6298:
-

Attachment: HDFS-6298.2.patch

Thanks, Uma, for the review. The new patch includes updates for your comments.

 XML based End-to-End test for getfattr and setfattr commands
 

 Key: HDFS-6298
 URL: https://issues.apache.org/jira/browse/HDFS-6298
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: Uma Maheswara Rao G
Assignee: Yi Liu
 Fix For: HDFS XAttrs (HDFS-2006)

 Attachments: HDFS-6298.1.patch, HDFS-6298.2.patch, HDFS-6298.patch


 This JIRA is to add test cases with the CLI



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6298) XML based End-to-End test for getfattr and setfattr commands

2014-05-05 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989616#comment-13989616
 ] 

Yi Liu commented on HDFS-6298:
--

A new end-to-end test has been added:
setfattr: add an xattr which has a wrong prefix

 XML based End-to-End test for getfattr and setfattr commands
 

 Key: HDFS-6298
 URL: https://issues.apache.org/jira/browse/HDFS-6298
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: Uma Maheswara Rao G
Assignee: Yi Liu
 Fix For: HDFS XAttrs (HDFS-2006)

 Attachments: HDFS-6298.1.patch, HDFS-6298.2.patch, HDFS-6298.patch


 This JIRA is to add test cases with the CLI



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6298) XML based End-to-End test for getfattr and setfattr commands

2014-05-05 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989662#comment-13989662
 ] 

Uma Maheswara Rao G commented on HDFS-6298:
---

Thanks a lot, Yi, for the update on the patch. Actually my intention for the 
test cases was dir/file permission-based test cases. Ex: if a user does not 
have permission on a file, then he should not be able to set xattrs for that 
file. 
Also I suggest adding tests for each namespace that can be specified via the 
user API. We allow only 2 namespaces from the user API.
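
A minimal sketch of that permission case, using test-only user impersonation 
(the user name and path below are illustrative):

{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.AccessControlException;
import org.apache.hadoop.security.UserGroupInformation;

// Hedged sketch: a user without write access to the file must get an
// AccessControlException from setXAttr.
public class XAttrPermissionSketch {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    UserGroupInformation ugi = UserGroupInformation.createUserForTesting(
        "otherUser", new String[] { "otherGroup" });
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
      @Override
      public Void run() throws Exception {
        FileSystem fs = FileSystem.get(conf);
        try {
          fs.setXAttr(new Path("/restricted"), "user.a1", new byte[] { 0x31 });
          throw new AssertionError("expected AccessControlException");
        } catch (AccessControlException expected) {
          // The permission check rejected the call, as intended.
        }
        return null;
      }
    });
  }
}
{code}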

 XML based End-to-End test for getfattr and setfattr commands
 

 Key: HDFS-6298
 URL: https://issues.apache.org/jira/browse/HDFS-6298
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: Uma Maheswara Rao G
Assignee: Yi Liu
 Fix For: HDFS XAttrs (HDFS-2006)

 Attachments: HDFS-6298.1.patch, HDFS-6298.2.patch, HDFS-6298.patch


 This JIRA is to add test cases with the CLI



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5522) Datanode disk error check may be incorrectly skipped

2014-05-05 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-5522:
-

Attachment: HDFS-5522.patch

In this patch, I have created a new thread that checks for disk errors every 5 
seconds whenever a disk error check has been requested.
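
The general shape of such a checker thread, sketched with illustrative names 
(this is not the attached patch itself):

{code}
import java.util.concurrent.atomic.AtomicBoolean;

// Hedged sketch: a dedicated checker thread wakes every 5 seconds and runs
// the disk check only when a check has been requested, so request-handling
// paths never block on disk I/O themselves.
public class DiskCheckerSketch implements Runnable {
  private final AtomicBoolean checkRequested = new AtomicBoolean(false);

  public void requestCheck() {       // called from the request paths
    checkRequested.set(true);
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(5000L);         // the 5-second poll interval
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      if (checkRequested.getAndSet(false)) {
        checkDiskError();            // the actual volume scan happens here
      }
    }
  }

  private void checkDiskError() { /* scan volumes, drop failed ones */ }
}
{code}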

 Datanode disk error check may be incorrectly skipped
 

 Key: HDFS-5522
 URL: https://issues.apache.org/jira/browse/HDFS-5522
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.9, 2.2.0
Reporter: Kihwal Lee
Assignee: Rushabh S Shah
 Attachments: HDFS-5522.patch


 After HDFS-4581 and HDFS-4699, {{checkDiskError()}} is not called when 
 network errors occur while processing data node requests.  This appears to 
 create problems when a disk is having problems but not failing I/O quickly. 
 If I/O hangs for a long time, network reads/writes may time out first and the 
 peer may close the connection. Although the error was caused by a faulty 
 local disk, the disk check is not carried out in this case. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5522) Datanode disk error check may be incorrectly skipped

2014-05-05 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-5522:
-

Target Version/s: 2.5.0
  Status: Patch Available  (was: Open)

 Datanode disk error check may be incorrectly skipped
 

 Key: HDFS-5522
 URL: https://issues.apache.org/jira/browse/HDFS-5522
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0, 0.23.9
Reporter: Kihwal Lee
Assignee: Rushabh S Shah
 Attachments: HDFS-5522.patch


 After HDFS-4581 and HDFS-4699, {{checkDiskError()}} is not called when 
 network errors occur while processing data node requests.  This appears to 
 create problems when a disk is having problems but not failing I/O quickly. 
 If I/O hangs for a long time, network reads/writes may time out first and the 
 peer may close the connection. Although the error was caused by a faulty 
 local disk, the disk check is not carried out in this case. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6334) Client failover proxy provider for IP failover based NN HA

2014-05-05 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6334:
-

Attachment: HDFS-6334.patch

 Client failover proxy provider for IP failover based NN HA
 --

 Key: HDFS-6334
 URL: https://issues.apache.org/jira/browse/HDFS-6334
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: HDFS-6334.patch


 With RPCv9 and improvements in the SPNEGO auth handling, it is possible to 
 set up a pair of HA namenodes utilizing IP failover as the client-request 
 fencing mechanism.
 This jira will make it possible for HA to be configured without requiring the 
 use of a logical URI, and will provide a simple IP failover proxy provider.  
 The change will allow any old implementation of {{FailoverProxyProvider}} to 
 continue to work.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6334) Client failover proxy provider for IP failover based NN HA

2014-05-05 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6334:
-

Status: Patch Available  (was: Open)

 Client failover proxy provider for IP failover based NN HA
 --

 Key: HDFS-6334
 URL: https://issues.apache.org/jira/browse/HDFS-6334
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: HDFS-6334.patch


 With RPCv9 and improvements in the SPNEGO auth handling, it is possible to 
 set up a pair of HA namenodes utilizing IP failover as the client-request 
 fencing mechanism.
 This jira will make it possible for HA to be configured without requiring the 
 use of a logical URI, and will provide a simple IP failover proxy provider.  
 The change will allow any old implementation of {{FailoverProxyProvider}} to 
 continue to work.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge

2014-05-05 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated HDFS-6342:
--

Target Version/s: 3.0.0
  Status: Patch Available  (was: Open)

 TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if 
 balancer.id file is huge
 ---

 Key: HDFS-6342
 URL: https://issues.apache.org/jira/browse/HDFS-6342
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen He
Assignee: Chen He
 Attachments: HDFS-6342.patch


 The testBalancerWithRackLocality method tests the balancer moving data 
 blocks with rack locality taken into consideration. 
 It creates a two-node cluster. One node belongs to rack0/nodeGroup0, the 
 other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the 
 block size is 10B and the total cluster capacity is 6000B (3000B on each 
 datanode). It creates 180 data blocks with replication factor 2. Then a new 
 datanode is created (in rack1/nodeGroup2) and the balancer starts balancing 
 the cluster.
 It expects data blocks to move only within rack1. After the balancer is 
 done, it asserts that the data size on both racks is the same. The test will 
 break if the balancer.id file is huge and there is inter-rack data block 
 movement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge

2014-05-05 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated HDFS-6342:
--

Attachment: HDFS-6342.patch

The test was run 40 times with no errors reported.

 TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if 
 balancer.id file is huge
 ---

 Key: HDFS-6342
 URL: https://issues.apache.org/jira/browse/HDFS-6342
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen He
Assignee: Chen He
 Attachments: HDFS-6342.patch


 The testBalancerWithRackLocality method tests the balancer moving data 
 blocks with rack locality taken into consideration. 
 It creates a two-node cluster. One node belongs to rack0/nodeGroup0, the 
 other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the 
 block size is 10B and the total cluster capacity is 6000B (3000B on each 
 datanode). It creates 180 data blocks with replication factor 2. Then a new 
 datanode is created (in rack1/nodeGroup2) and the balancer starts balancing 
 the cluster.
 It expects data blocks to move only within rack1. After the balancer is 
 done, it asserts that the data size on both racks is the same. The test will 
 break if the balancer.id file is huge and there is inter-rack data block 
 movement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-05-05 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6230:


Attachment: HDFS-6230-UpgradeInProgress.jpg
HDFS-6230-NoUpgradesInProgress.png

 Expose upgrade status through NameNode web UI
 -

 Key: HDFS-6230
 URL: https://issues.apache.org/jira/browse/HDFS-6230
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Mit Desai
 Attachments: HDFS-6230-NoUpgradesInProgress.png, 
 HDFS-6230-UpgradeInProgress.jpg


 The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
 the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-05-05 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6230:


Status: Patch Available  (was: Open)

 Expose upgrade status through NameNode web UI
 -

 Key: HDFS-6230
 URL: https://issues.apache.org/jira/browse/HDFS-6230
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Mit Desai
 Attachments: HDFS-6230-NoUpgradesInProgress.png, 
 HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch


 The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
 the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6336) Cannot download file via webhdfs when wildcard is enabled

2014-05-05 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-6336:


Attachment: HDFS-6336.001.patch

 Cannot download file via webhdfs when wildcard is enabled
 -

 Key: HDFS-6336
 URL: https://issues.apache.org/jira/browse/HDFS-6336
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, webhdfs
Affects Versions: 2.4.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6336.001.patch, HDFS-6336.001.patch, 
 HDFS-6336.001.patch


 With wildcard enabled, issuing a webhdfs command like
 {code}
 http://yjztvm2.private:50070/webhdfs/v1/tmp?op=OPEN
 {code}
 would give
 {code}
 http://yjztvm3.private:50075/webhdfs/v1/tmp?op=OPEN&namenoderpcaddress=0.0.0.0:8020&offset=0
 {"RemoteException":{"exception":"ConnectException","javaClassName":"java.net.ConnectException","message":"Call
  From yjztvm3.private/192.168.142.230 to 0.0.0.0:8020 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused"}}
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6336) Cannot download file via webhdfs when wildcard is enabled

2014-05-05 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989702#comment-13989702
 ] 

Yongjun Zhang commented on HDFS-6336:
-

Somehow the test was still not triggered; I uploaded the same patch to try again.

 Cannot download file via webhdfs when wildcard is enabled
 -

 Key: HDFS-6336
 URL: https://issues.apache.org/jira/browse/HDFS-6336
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, webhdfs
Affects Versions: 2.4.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6336.001.patch, HDFS-6336.001.patch, 
 HDFS-6336.001.patch


 With wildcard enabled, issuing a webhdfs command like
 {code}
 http://yjztvm2.private:50070/webhdfs/v1/tmp?op=OPEN
 {code}
 would give
 {code}
 http://yjztvm3.private:50075/webhdfs/v1/tmp?op=OPEN&namenoderpcaddress=0.0.0.0:8020&offset=0
 {"RemoteException":{"exception":"ConnectException","javaClassName":"java.net.ConnectException","message":"Call
  From yjztvm3.private/192.168.142.230 to 0.0.0.0:8020 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused"}}
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6340) DN can't finalize upgrade

2014-05-05 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989705#comment-13989705
 ] 

Kihwal Lee commented on HDFS-6340:
--

Good catch!  I think we can start with {{false}} as the initial value and use 
a simple assignment instead of the AND operation.  After all, the last result 
must be up-to-date.  But there is another problem.

{{nn.isStandbyState()}} is not protected from HA state transitions. We could 
create a {{FSNamesystem}} method that acquires its read lock and checks both 
the datanode storage staleness (calling down to {{BlockManager}}) and the HA 
state. This is preferred since we want to avoid making {{BlockManager}} lock 
{{FSNamesystem}}.  If we do this, we don't have to check the individual 
results from {{processReport()}}.
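
For concreteness, a standalone sketch of the locking pattern proposed above; 
the class, fields, and method names are illustrative stand-ins, not actual 
{{FSNamesystem}} code:

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hedged sketch: the composite check (HA state + storage staleness) runs
// entirely under one read lock, so an HA state transition cannot slip in
// between the two sub-checks.
class NamesystemLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private boolean standby = true;       // stands in for nn.isStandbyState()
  private boolean storageStale = true;  // stands in for the BlockManager check

  boolean isStandbyWithStaleStorage() {
    lock.readLock().lock();
    try {
      // Both reads happen under the same lock, so the pair is consistent.
      return standby && storageStale;
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}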

 DN can't finalize upgrade
 -

 Key: HDFS-6340
 URL: https://issues.apache.org/jira/browse/HDFS-6340
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
Reporter: Rahul Singhal
Priority: Blocker
 Attachments: HDFS-6340-branch-2.4.0.patch


 I upgraded a (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the 
 '-finalizeUpgrade' command, the NN was able to finalize the upgrade but the 
 DN couldn't (I waited for the next block report).
 I think I have found the problem to be due to HDFS-5153. I will attach a 
 proposed fix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6339) DN, SNN & JN can't rollback data

2014-05-05 Thread Rahul Singhal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989711#comment-13989711
 ] 

Rahul Singhal commented on HDFS-6339:
-

Thanks a lot for your reply [~kihwal].

I was testing both cases/paths:
(non-HA, 2.2.0) -> (non-HA, 2.4.0) -> (non-HA, 2.2.0)
(HA, 2.2.0) -> (HA, 2.4.0) -> (non-HA, 2.2.0) -> (HA, 2.2.0)

The issue with the 2NN was noticed in case 1. I guess I was mainly confused by 
the fact that start-dfs.sh does not format the 2NN during rollback.


Thanks for confirming my procedure. I will use the mailing list for future 
questions, but since you have the context here, I was hoping you could answer 
one more question: in what cases will the NN not have all edits locally? 
Should they be available at edits.dir?

 DN, SNN & JN can't rollback data
 

 Key: HDFS-6339
 URL: https://issues.apache.org/jira/browse/HDFS-6339
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: journal-node
Affects Versions: 2.2.0
Reporter: Rahul Singhal

 I tried rollback from 2.4.0 to 2.2.0 and noticed that DN, SNN and JN couldn't 
 perform rollback.
 I started with a (NN) HA cluster on 2.2.0 and upgraded it to 2.4.0 with HA 
 enabled. Then I attempted a rollback to 2.2.0. I first configured my cluster 
 to non-HA and started it on 2.2.0. I started NN & DN with the '-rollback' 
 startup option. (There is no explicit startup option for SNN & JN like there 
 is for NN & DN.) Only NN was able to roll back correctly.
 My fixes:
 I fixed the DN rollback problem by cherry-picking the fix from HDFS-5526.
 I fixed the SNN rollback problem by starting it with the '-format' option.
 I then proceeded to convert the non-HA cluster to an HA cluster. The first 
 step after the configuration change was to start the JNs. But they also 
 couldn't roll back.
 My fix:
 I fixed this by deleting the JN data directory. (Deleting the 'current' 
 directory and renaming 'previous' to 'current' didn't fix the rollback.)
 My purpose for filing this bug is to:
 1. Ask if these problems are known and intended to be fixed in any future 
 releases. If yes, which one? DN rollback was fixed in 2.3.0, but what about 
 the 2.2.x series? JN rollback seems (not confirmed) to have been fixed in 
 2.4.0.
 2. Confirm that my fixes are correct. If not, please help me with an 
 appropriate fix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6313) WebHdfs may use the wrong NN when configured for multiple HA NNs

2014-05-05 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6313:
-

Priority: Blocker  (was: Major)
Target Version/s: 2.4.1  (was: 2.5.0)

 WebHdfs may use the wrong NN when configured for multiple HA NNs
 

 Key: HDFS-6313
 URL: https://issues.apache.org/jira/browse/HDFS-6313
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 3.0.0, 2.4.0
Reporter: Daryn Sharp
Priority: Blocker

 WebHdfs resolveNNAddr will return a union of addresses for all HA configured 
 NNs.  The client may access the wrong NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6330) Move mkdir() to FSNamesystem

2014-05-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989750#comment-13989750
 ] 

Hadoop QA commented on HDFS-6330:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643151/HDFS-6330.000.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6818//console

This message is automatically generated.

 Move mkdir() to FSNamesystem
 

 Key: HDFS-6330
 URL: https://issues.apache.org/jira/browse/HDFS-6330
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-6330.000.patch


 Currently mkdir() automatically creates all ancestors for a directory. This 
 is implemented in FSDirectory, by calling unprotectedMkdir() along the path. 
 This jira proposes to move the function to FSNamesystem to simplify the 
 primitive that FSDirectory needs to provide.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6313) WebHdfs may use the wrong NN when configured for multiple HA NNs

2014-05-05 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989761#comment-13989761
 ] 

Kihwal Lee commented on HDFS-6313:
--

In 2.4.0 it is {{DFSUtil.resolveWebHdfsUri()}}. In trunk, it is 
{{WebHdfsFileSystem#resolveNNAddr()}}.  Both obtain NN addresses by calling 
{{DFSUtil.getAddresses()}}, which gets all NN http addresses from all known 
name services. If multiple name services are configured, {{WebHdfsFileSystem}} 
can use a wrong NN.

 WebHdfs may use the wrong NN when configured for multiple HA NNs
 

 Key: HDFS-6313
 URL: https://issues.apache.org/jira/browse/HDFS-6313
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 3.0.0, 2.4.0
Reporter: Daryn Sharp
Priority: Blocker

 WebHdfs resolveNNAddr will return a union of addresses for all HA configured 
 NNs.  The client may access the wrong NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6287) Add vecsum test of libhdfs read access times

2014-05-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989791#comment-13989791
 ] 

Colin Patrick McCabe commented on HDFS-6287:


bq. Hi, Colin. Thanks for posting this. Did you find that you needed to use SSE 
to get the addition fast enough so that the benchmark highlights read 
throughput instead of sum computation? IOW, could we potentially simplify this 
patch to not use SSE at all and still have a valid benchmark?

Without that optimization, the benchmark quickly becomes CPU-bound and you 
don't get true numbers for ZCR and other fast read methods.  I just benchmarked 
1.5 GB/s for the un-optimized version versus 5.7 GB/s for the optimized.

bq. I think it would be helpful to add a comment with a high-level summary of 
what vecsum does, maybe right before the main.

Added

bq. I have one minor comment on the code itself so far. I think you can remove 
the hdfsFreeBuilder call. hdfsBuilderConnect always frees the builder, whether 
it succeeds or fails. The only time you would need to call hdfsFreeBuilder 
directly is if you allocated a builder but then never attempted to connect with 
it. I don't see any way for that to happen in the libhdfs_data_create code.

Yeah, that is dead code.  Let me remove it.

 Add vecsum test of libhdfs read access times
 

 Key: HDFS-6287
 URL: https://issues.apache.org/jira/browse/HDFS-6287
 Project: Hadoop HDFS
  Issue Type: Test
  Components: libhdfs, test
Affects Versions: 2.5.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-6282.001.patch, HDFS-6287.002.patch, 
 HDFS-6287.003.patch, HDFS-6287.004.patch, HDFS-6287.005.patch


 Add vecsum, a benchmark that tests libhdfs access times.  This includes 
 short-circuit, zero-copy, and standard libhdfs access modes.  It also has a 
 local filesystem mode for comparison.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6287) Add vecsum test of libhdfs read access times

2014-05-05 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6287:
---

Attachment: HDFS-6287.005.patch

 Add vecsum test of libhdfs read access times
 

 Key: HDFS-6287
 URL: https://issues.apache.org/jira/browse/HDFS-6287
 Project: Hadoop HDFS
  Issue Type: Test
  Components: libhdfs, test
Affects Versions: 2.5.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-6282.001.patch, HDFS-6287.002.patch, 
 HDFS-6287.003.patch, HDFS-6287.004.patch, HDFS-6287.005.patch


 Add vecsum, a benchmark that tests libhdfs access times.  This includes 
 short-circuit, zero-copy, and standard libhdfs access modes.  It also has a 
 local filesystem mode for comparison.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6326) WebHdfs ACL compatibility is broken

2014-05-05 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6326:


Attachment: HDFS-6326.2.patch

Here is patch v2, changing the exception handling in ls.  I took a very 
defensive approach and just caught {{Exception}}.  This fixes the immediate 
problem and also anticipates any future problems related to custom 
{{FileSystem}} implementations.  Of course, it's not generally a good idea to 
do a blanket catch of {{Exception}}.  In this case though, the worst thing that 
can happen is that we skip displaying the '+', which I think is preferable over 
causing the ls command to fail if there are other unanticipated failures 
related to {{getAclStatus}}.

In addition to running the ACL-related unit tests, I also did some manual 
testing.  I tested ls using URLs with the webhdfs scheme against a 2.3.0 
cluster, and it worked.  I also tested against a trunk cluster and confirmed 
that I was still getting the '+' appended.
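
For reference, the shape of that defensive check as a standalone sketch 
(illustrative names; the real change lives in the shell's ls implementation):

{code}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hedged sketch: compute the ls '+' ACL marker defensively. If
// getAclStatus() fails for any reason (e.g. an older NN without the
// GETACLSTATUS op), omit the marker instead of failing the listing.
public class AclMarkerSketch {
  static String aclMarker(FileSystem fs, Path path) {
    try {
      boolean hasAcl = !fs.getAclStatus(path).getEntries().isEmpty();
      return hasAcl ? "+" : "";
    } catch (Exception e) {
      // Worst case: skip the '+'; never fail the ls command itself.
      return "";
    }
  }
}
{code}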

 WebHdfs ACL compatibility is broken
 ---

 Key: HDFS-6326
 URL: https://issues.apache.org/jira/browse/HDFS-6326
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 3.0.0, 2.4.0
Reporter: Daryn Sharp
Assignee: Chris Nauroth
Priority: Blocker
 Attachments: HDFS-6326.1.patch, HDFS-6326.2.patch


 2.4 ACL support is completely incompatible with <2.4 webhdfs servers.  The NN 
 throws an {{IllegalArgumentException}}.
 {code}
 hadoop fs -ls webhdfs://nn/
 Found 21 items
 ls: Invalid value for webhdfs parameter op: No enum constant 
 org.apache.hadoop.hdfs.web.resources.GetOpParam.Op.GETACLSTATUS
 [... 20 more times...]
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5522) Datanode disk error check may be incorrectly skipped

2014-05-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989848#comment-13989848
 ] 

Hadoop QA commented on HDFS-5522:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643378/HDFS-5522.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6817//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6817//console

This message is automatically generated.

 Datanode disk error check may be incorrectly skipped
 

 Key: HDFS-5522
 URL: https://issues.apache.org/jira/browse/HDFS-5522
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.9, 2.2.0
Reporter: Kihwal Lee
Assignee: Rushabh S Shah
 Attachments: HDFS-5522.patch


 After HDFS-4581 and HDFS-4699, {{checkDiskError()}} is not called when 
 network errors occur while processing data node requests.  This appears to 
 create problems when a disk is having problems but not failing I/O quickly. 
 If I/O hangs for a long time, network reads/writes may time out first and the 
 peer may close the connection. Although the error was caused by a faulty 
 local disk, the disk check is not carried out in this case. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6331) ClientProtocol#setXattr should not be annotated idempotent

2014-05-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989872#comment-13989872
 ] 

Chris Nauroth commented on HDFS-6331:
-

Hi, [~andrew.wang].  I see this patch is already committed, but I just want to 
confirm that I agree with your earlier statements.  Because of how the flags 
work, a retry may cause the 2nd application of the operation to throw an 
exception, even though it should have been a valid call from the client's 
perspective.  Therefore, we need {{AtMostOnce}} semantics.  As you said, this 
differs from the behavior of {{setAcl}}, which we can classify as 
{{Idempotent}}.

Thanks everyone for catching the issue and fixing it.
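
For illustration, the semantics under discussion as an interface fragment 
(assumed imports; not the complete {{ClientProtocol}}):

{code}
import java.io.IOException;
import java.util.EnumSet;
import org.apache.hadoop.fs.XAttr;
import org.apache.hadoop.fs.XAttrSetFlag;
import org.apache.hadoop.io.retry.AtMostOnce;

// Hedged sketch: a retried setXAttr carrying XAttrSetFlag.CREATE would
// throw "already exists" on the second application even though the first
// call succeeded, so the method needs @AtMostOnce, not @Idempotent.
interface XAttrProtocolSketch {
  @AtMostOnce
  void setXAttr(String src, XAttr xAttr, EnumSet<XAttrSetFlag> flag)
      throws IOException;
}
{code}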

 ClientProtocol#setXattr should not be annotated idempotent
 --

 Key: HDFS-6331
 URL: https://issues.apache.org/jira/browse/HDFS-6331
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: Andrew Wang
Assignee: Uma Maheswara Rao G
 Fix For: HDFS XAttrs (HDFS-2006)

 Attachments: HDFS-6331.patch


 ClientProtocol#setXAttr is annotated @Idempotent, but this is incorrect since 
 subsequent retries need to throw different exceptions based on the passed 
 flags (e.g. CREATE, REPLACE).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6328) Simplify code in FSDirectory

2014-05-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989931#comment-13989931
 ] 

Hadoop QA commented on HDFS-6328:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643135/HDFS-6328.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.datanode.TestBPOfferService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6819//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6819//console

This message is automatically generated.

 Simplify code in FSDirectory
 

 Key: HDFS-6328
 URL: https://issues.apache.org/jira/browse/HDFS-6328
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-6328.000.patch


 This jira proposes:
 # Cleaning up dead code in FSDirectory.
 # Simplifying the control flows that IntelliJ flags as warnings.
 # Moving functions related to resolving paths into one place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6165) hdfs dfs -rm -r and hdfs -rmdir commands can't remove empty directory

2014-05-05 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989946#comment-13989946
 ] 

Yongjun Zhang commented on HDFS-6165:
-

Hi [~daryn], 

I looked into the callers of checkPermission; there are two other places in 
addition to the delete operation:
- FSNamesystem.getContentSummary calls checkPermission with 
FsAction.READ_EXECUTE passed as subAccess.
- FSNamesystem.checkSubtreeReadPermission calls checkPermission with 
FsAction.READ passed as subAccess.
So it looks like we do need the additional parameter.

About the RemoteException: thanks for pointing out that FsShell won't work 
correctly with other filesystems with the patch. Since there are so many 
filesystems, the scope of the change needed to address the mkdir issue will be 
much wider. I'm thinking about handling this in a separate JIRA. What do you 
guys think? With rmr fixed, it can serve as a workaround for the rmdir issue.

Thanks.





 hdfs dfs -rm -r and hdfs -rmdir commands can't remove empty directory 
 --

 Key: HDFS-6165
 URL: https://issues.apache.org/jira/browse/HDFS-6165
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.3.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
Priority: Minor
 Attachments: HDFS-6165.001.patch, HDFS-6165.002.patch, 
 HDFS-6165.003.patch, HDFS-6165.004.patch, HDFS-6165.004.patch, 
 HDFS-6165.005.patch, HDFS-6165.006.patch, HDFS-6165.006.patch


 Given a directory owned by user A with WRITE permission containing an empty 
 directory owned by user B, it is not possible to delete user B's empty 
 directory with either hdfs dfs -rm -r or hdfs dfs -rmdir, because the 
 current implementation requires FULL permission on the empty directory and 
 throws an exception otherwise. 
 On the other hand, on Linux the rm -r and rmdir commands can remove an empty 
 directory as long as the parent directory has WRITE permission (and the 
 prefix components of the path have EXECUTE permission). For the tested OSes, 
 some prompt the user asking for confirmation, some don't.
 Here's a reproduction:
 {code}
 [root@vm01 ~]# hdfs dfs -ls /user/
 Found 4 items
 drwxr-xr-x   - userabc users   0 2013-05-03 01:55 /user/userabc
 drwxr-xr-x   - hdfs   supergroup  0 2013-05-03 00:28 /user/hdfs
 drwxrwxrwx   - mapred  hadoop  0 2013-05-03 00:13 /user/history
 drwxr-xr-x   - hdfs   supergroup  0 2013-04-14 16:46 /user/hive
 [root@vm01 ~]# hdfs dfs -ls /user/userabc
 Found 8 items
 drwx------   - userabc users  0 2013-05-02 17:00 /user/userabc/.Trash
 drwxr-xr-x   - userabc users  0 2013-05-03 01:34 /user/userabc/.cm
 drwx------   - userabc users  0 2013-05-03 01:06 
 /user/userabc/.staging
 drwxr-xr-x   - userabc users  0 2013-04-14 18:31 /user/userabc/apps
 drwxr-xr-x   - userabc users  0 2013-04-30 18:05 /user/userabc/ds
 drwxr-xr-x   - hdfs   users  0 2013-05-03 01:54 /user/userabc/foo
 drwxr-xr-x   - userabc users  0 2013-04-30 16:18 
 /user/userabc/maven_source
 drwxr-xr-x   - hdfs   users  0 2013-05-03 01:40 
 /user/userabc/test-restore
 [root@vm01 ~]# hdfs dfs -ls /user/userabc/foo/
 [root@vm01 ~]# sudo -u userabc hdfs dfs -rm -r -skipTrash /user/userabc/foo
 rm: Permission denied: user=userabc, access=ALL, 
 inode=/user/userabc/foo:hdfs:users:drwxr-xr-x
 {code}
 The super user can delete the directory.
 {code}
 [root@vm01 ~]# sudo -u hdfs hdfs dfs -rm -r -skipTrash /user/userabc/foo
 Deleted /user/userabc/foo
 {code}
 The same is not true for files, however. They have the correct behavior.
 {code}
 [root@vm01 ~]# sudo -u hdfs hdfs dfs -touchz /user/userabc/foo-file
 [root@vm01 ~]# hdfs dfs -ls /user/userabc/
 Found 8 items
 drwx------   - userabc users  0 2013-05-02 17:00 /user/userabc/.Trash
 drwxr-xr-x   - userabc users  0 2013-05-03 01:34 /user/userabc/.cm
 drwx------   - userabc users  0 2013-05-03 01:06 
 /user/userabc/.staging
 drwxr-xr-x   - userabc users  0 2013-04-14 18:31 /user/userabc/apps
 drwxr-xr-x   - userabc users  0 2013-04-30 18:05 /user/userabc/ds
 -rw-r--r--   1 hdfs   users  0 2013-05-03 02:11 
 /user/userabc/foo-file
 drwxr-xr-x   - userabc users  0 2013-04-30 16:18 
 /user/userabc/maven_source
 drwxr-xr-x   - hdfs   users  0 2013-05-03 01:40 
 /user/userabc/test-restore
 [root@vm01 ~]# sudo -u userabc hdfs dfs -rm -skipTrash /user/userabc/foo-file
 Deleted /user/userabc/foo-file
 {code}
 Using hdfs dfs -rmdir command:
 {code}
 bash-4.1$ hadoop fs -lsr /
 lsr: DEPRECATED: Please use 'ls -R' instead.
 drwxr-xr-x   - hdfs supergroup  0 2014-03-25 16:29 /user
 drwxr-xr-x   - hdfs   supergroup  0 2014-03-25 16:28 

[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page

2014-05-05 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989948#comment-13989948
 ] 

Siqi Li commented on HDFS-5928:
---

[~wheat9] for 2.3, if the JSP UI is no longer the default UI, what is the 
default UI?

 show namespace and namenode ID on NN dfshealth page
 ---

 Key: HDFS-5928
 URL: https://issues.apache.org/jira/browse/HDFS-5928
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: HDFS-5928.v1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6294) Use INode IDs to avoid conflicts when a file open for write is renamed

2014-05-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989955#comment-13989955
 ] 

Hadoop QA commented on HDFS-6294:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12642882/HDFS-6294.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6820//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6820//console

This message is automatically generated.

 Use INode IDs to avoid conflicts when a file open for write is renamed
 --

 Key: HDFS-6294
 URL: https://issues.apache.org/jira/browse/HDFS-6294
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 0.20.1
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-6294.001.patch, HDFS-6294.002.patch


 Now that we have a unique INode ID for each INode, clients with files that 
 are open for write can use this unique ID rather than a file path when they 
 are requesting more blocks or closing the open file.  This will avoid 
 conflicts when a file which is open for write is renamed, and another file 
 with that name is created.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page

2014-05-05 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989956#comment-13989956
 ] 

Haohui Mai commented on HDFS-5928:
--

Since 2.3, HDFS has moved towards an HTML5-based UI. Please see HDFS-5333 for 
more details.

 show namespace and namenode ID on NN dfshealth page
 ---

 Key: HDFS-5928
 URL: https://issues.apache.org/jira/browse/HDFS-5928
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: HDFS-5928.v1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6313) WebHdfs may use the wrong NN when configured for multiple HA NNs

2014-05-05 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6313:
-

Status: Patch Available  (was: Open)

 WebHdfs may use the wrong NN when configured for multiple HA NNs
 

 Key: HDFS-6313
 URL: https://issues.apache.org/jira/browse/HDFS-6313
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.4.0, 3.0.0
Reporter: Daryn Sharp
Priority: Blocker
 Attachments: HDFS-6313.branch-2.4.patch, HDFS-6313.patch


 WebHdfs resolveNNAddr will return a union of addresses for all HA configured 
 NNs.  The client may access the wrong NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6313) WebHdfs may use the wrong NN when configured for multiple HA NNs

2014-05-05 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned HDFS-6313:


Assignee: Kihwal Lee

 WebHdfs may use the wrong NN when configured for multiple HA NNs
 

 Key: HDFS-6313
 URL: https://issues.apache.org/jira/browse/HDFS-6313
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 3.0.0, 2.4.0
Reporter: Daryn Sharp
Assignee: Kihwal Lee
Priority: Blocker
 Attachments: HDFS-6313.branch-2.4.patch, HDFS-6313.patch


 WebHdfs resolveNNAddr will return a union of addresses for all HA configured 
 NNs.  The client may access the wrong NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6313) WebHdfs may use the wrong NN when configured for multiple HA NNs

2014-05-05 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6313:
-

Attachment: HDFS-6313.patch
HDFS-6313.branch-2.4.patch

The patch makes WebHdfsFileSystem extract the only entry that matches the 
logical name. The new test case demonstrates the bug.
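
To illustrate the idea, a standalone sketch of that selection (illustrative 
types and names; not the attached patch):

{code}
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.Map;

// Hedged sketch: from the per-nameservice NN address maps, keep only the
// entry whose nameservice key matches the logical name in the webhdfs URI,
// instead of taking the union across all nameservices.
class NnAddressSelectionSketch {
  static Map<String, InetSocketAddress> select(
      Map<String, Map<String, InetSocketAddress>> byNameservice, URI uri) {
    String logicalName = uri.getHost();  // e.g. webhdfs://myns/p -> "myns"
    Map<String, InetSocketAddress> match = byNameservice.get(logicalName);
    if (match == null) {
      throw new IllegalArgumentException("No nameservice named " + logicalName);
    }
    return match;
  }
}
{code}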

 WebHdfs may use the wrong NN when configured for multiple HA NNs
 

 Key: HDFS-6313
 URL: https://issues.apache.org/jira/browse/HDFS-6313
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 3.0.0, 2.4.0
Reporter: Daryn Sharp
Priority: Blocker
 Attachments: HDFS-6313.branch-2.4.patch, HDFS-6313.patch


 WebHdfs resolveNNAddr will return a union of addresses for all HA configured 
 NNs.  The client may access the wrong NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6328) Simplify code in FSDirectory

2014-05-05 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989971#comment-13989971
 ] 

Haohui Mai commented on HDFS-6328:
--

The test failure is unrelated. I ran the test locally and it passed.

 Simplify code in FSDirectory
 

 Key: HDFS-6328
 URL: https://issues.apache.org/jira/browse/HDFS-6328
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-6328.000.patch


 This jira proposes:
 # Cleaning up dead code in FSDirectory.
 # Simplifying the control flows that IntelliJ flags as warnings.
 # Moving functions related to resolving paths into one place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6328) Simplify code in FSDirectory

2014-05-05 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989982#comment-13989982
 ] 

Jing Zhao commented on HDFS-6328:
-

The patch looks pretty good to me. Thanks for the cleanup. Some minor comments:
# The changes on imports seem unnecessary
# The following change may need to be reverted:
{code}
-Preconditions.checkArgument(
-src.endsWith(HdfsConstants.SEPARATOR_DOT_SNAPSHOT_DIR), 
-"%s does not end with %s", src, HdfsConstants.SEPARATOR_DOT_SNAPSHOT_DIR);
+Preconditions.checkArgument(src.endsWith(HdfsConstants.SEPARATOR_DOT_SNAPSHOT_DIR),
+    "%s does not end with %s", src, HdfsConstants.SEPARATOR_DOT_SNAPSHOT_DIR);
{code}
# The following line exceeds 80 characters:
{code}
+  return srcs.startsWith("/") && !srcs.endsWith("/") && getINode4Write(srcs, false) == null;
{code}
# Let's add {} for the while loop:
{code}
+while(src[i] == dst[i])
+  i++;
{code}

+1 after addressing the comments.

 Simplify code in FSDirectory
 

 Key: HDFS-6328
 URL: https://issues.apache.org/jira/browse/HDFS-6328
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-6328.000.patch


 This jira proposes:
 # Cleaning up dead code in FSDirectory.
 # Simplifying the control flows that IntelliJ flags as warnings.
 # Moving functions related to resolving paths into one place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6328) Simplify code in FSDirectory

2014-05-05 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-6328:
-

Attachment: HDFS-6328.001.patch

 Simplify code in FSDirectory
 

 Key: HDFS-6328
 URL: https://issues.apache.org/jira/browse/HDFS-6328
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-6328.000.patch, HDFS-6328.001.patch


 This jira proposes:
 # Cleaning up dead code in FSDirectory.
 # Simplifying the control flows that IntelliJ flags as warnings.
 # Moving functions related to resolving paths into one place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6193) HftpFileSystem open should throw FileNotFoundException for non-existing paths

2014-05-05 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990007#comment-13990007
 ] 

Tsuyoshi OZAWA commented on HDFS-6193:
--

Let's wait for review by HDFS experts.

 HftpFileSystem open should throw FileNotFoundException for non-existing paths
 -

 Key: HDFS-6193
 URL: https://issues.apache.org/jira/browse/HDFS-6193
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: HDFS-6193-branch-2.4.0.v01.patch, 
 HDFS-6193-branch-2.4.v02.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handle 
 non-existing paths. 
 - 'open' does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound; it's deferred until the 
 next read. This is counterintuitive and not how the local FS or HDFS work. 
 In POSIX you get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-existing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6314) Test cases for XAttrs

2014-05-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990013#comment-13990013
 ] 

Chris Nauroth commented on HDFS-6314:
-

Hi, Yi.  Thanks for writing all of these tests.  I'd like to suggest 2 more 
test cases:

# 1) set xattrs on a file, 2) remove the xattrs from that file, 3) restart NN 
and 4) set xattrs again on that same file (a sketch of this case follows after 
this list).  Do this test twice: once saving a checkpoint before the restart 
and again without saving a checkpoint.  The idea here is to make sure that we 
don't accidentally leave behind a lingering empty {{XAttrFeature}} attached to 
the inode after removal of the xattrs.  That would leave the inode in a bad 
state where future attempts to add xattrs would fail due to the precondition 
check in {{INodeWithAdditionalFields#addXAttrFeature}}. 
 (We had a bug like this on the ACLs feature branch at one time.)
# In {{testXAttrSymlinks}}, let's also do a {{setXAttr}} on the link, and then 
do a {{getXAttrs}} on the target and assert that the xattrs previously set 
through the link are now visible when querying on the target.
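
A minimal sketch of test case 1, assuming a JUnit 4 test with a 
{{MiniDFSCluster}} field named {{cluster}} as in the existing XAttr tests 
(the file name and xattr values here are illustrative):
{code}
@Test
public void testRemoveXAttrThenRestartThenSetAgain() throws Exception {
  DistributedFileSystem fs = cluster.getFileSystem();
  Path path = new Path("/testfile");
  DFSTestUtil.createFile(fs, path, 8192, (short) 1, 0xFEED);
  fs.setXAttr(path, "user.a1", new byte[]{0x31});
  fs.removeXAttr(path, "user.a1");
  // Variant A: save a checkpoint here before the restart; variant B: don't.
  cluster.restartNameNode();
  // Must not fail: a lingering empty XAttrFeature would trip the
  // precondition check in INodeWithAdditionalFields#addXAttrFeature.
  fs.setXAttr(path, "user.a2", new byte[]{0x32});
}
{code}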


 Test cases for XAttrs
 -

 Key: HDFS-6314
 URL: https://issues.apache.org/jira/browse/HDFS-6314
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: HDFS XAttrs (HDFS-2006)

 Attachments: HDFS-6314.1.patch, HDFS-6314.patch


 Tests NameNode interaction for all XAttr APIs, covering restarting the NN and 
 saving a new checkpoint.
 Tests XAttrs for snapshots and symlinks.
 Tests XAttrs for HA failover.
 And more...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6340) DN can't finalize upgrade

2014-05-05 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6340:


Status: Patch Available  (was: Open)

 DN can't finalize upgrade
 -

 Key: HDFS-6340
 URL: https://issues.apache.org/jira/browse/HDFS-6340
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
Reporter: Rahul Singhal
Priority: Blocker
 Attachments: HDFS-6340-branch-2.4.0.patch


 I upgraded a (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the 
 '-finalizeUpgrade' command, the NN was able to finalize the upgrade but the DN 
 couldn't (I waited for the next block report).
 I think I have found the problem to be due to HDFS-5153. I will attach a 
 proposed fix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6340) DN can't finalize upgrade

2014-05-05 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990055#comment-13990055
 ] 

Arpit Agarwal commented on HDFS-6340:
-

Yes, good catch [~rahulsinghal.iitd]. The change looks fine to me but the patch 
won't apply in trunk.

The {{nn.isStandbyState()}} bug appears to have been there for a while. You 
could fix it here as Kihwal suggested or file a separate Jira for it and just 
fix the immediate regression here.

 DN can't finalize upgrade
 -

 Key: HDFS-6340
 URL: https://issues.apache.org/jira/browse/HDFS-6340
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
Reporter: Rahul Singhal
Priority: Blocker
 Attachments: HDFS-6340-branch-2.4.0.patch


 I upgraded a (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the 
 '-finalizeUpgrade' command, the NN was able to finalize the upgrade but the DN 
 couldn't (I waited for the next block report).
 I think I have found the problem to be due to HDFS-5153. I will attach a 
 proposed fix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6328) Simplify code in FSDirectory

2014-05-05 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990058#comment-13990058
 ] 

Daryn Sharp commented on HDFS-6328:
---

As a general statement, I'm not sure there's a lot of value added by changes 
like altering whitespace and moving methods.  Mixing functional changes and 
cosmetic changes makes it a bit harder to see what actually changed.  Please 
understand it does make life harder for those of us also working in the code 
who will encounter merge conflicts...

Is there a reason why this loop needed to become more complicated?  At this 
point I believe it's guaranteed that the src and dest are not identical, nor is 
the src a subdir of the dest?
{code}
-for(; src[i] == dst[i]; i++);
 // src[i - 1] is the last common ancestor.
+while(src[i] == dst[i] && i < src.length && i < dst.length) {
+  i++;
+}
{code}

 Simplify code in FSDirectory
 

 Key: HDFS-6328
 URL: https://issues.apache.org/jira/browse/HDFS-6328
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-6328.000.patch, HDFS-6328.001.patch


 This jira proposes:
 # Clean up dead code in FSDirectory.
 # Simplify the control flows that IntelliJ flags as warnings.
 # Move functions related to resolving paths into one place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6315) Decouple recording edit logs from FSDirectory

2014-05-05 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990064#comment-13990064
 ] 

Daryn Sharp commented on HDFS-6315:
---

I am also working towards the goal of removing or minimizing the use of the FSD 
lock, but I recall it's being used to protect non-threadsafe data structures 
(like the inode map and snapshot manager).  It's spurred by the work to add 
fine-grained locking to the namesystem - which has been derailed by other 
pressing issues.  Do keep in mind that hopefully in the next few months there 
will not be a globally held FSN lock, so don't entirely remove the FSD lock 
believing the FSN lock will cover for it.

bq. The change can be reverted when removing the lock of FSDirectory.

I'm curious what you have in mind.  HDFS-5693 appears to be a valuable change.  
I thought deletes used to do something similar while collecting blocks, but 
that whole region of code has been changed.



 Decouple recording edit logs from FSDirectory
 -

 Key: HDFS-6315
 URL: https://issues.apache.org/jira/browse/HDFS-6315
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch


 Currently both FSNamesystem and FSDirectory record edit logs. This design 
 requires FSNamesystem and FSDirectory to be tightly coupled together to 
 implement a durable namespace.
 This jira proposes to separate the responsibility of implementing the 
 namespace from providing durability with edit logs. Specifically, FSDirectory 
 implements the namespace (which should have no edit log operations), and 
 FSNamesystem implements durability by recording the edit logs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6328) Simplify code in FSDirectory

2014-05-05 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990066#comment-13990066
 ] 

Haohui Mai commented on HDFS-6328:
--

The main motivation of this jira is to perform behavior-preserving code cleanup 
to make the reviews of HDFS-6330 and HDFS-6315 easier.

The original motivation is to make it locally evident that the loop cannot go 
out of bounds, without tracing through all the callers, but it looks like the 
order of the clauses is wrong. I'll fix it in another patch.
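
For reference, the bounds checks need to precede the array accesses, roughly 
like this (a sketch, not the committed fix):
{code}
// Check the bounds before indexing so the loop cannot read past either array.
while (i < src.length && i < dst.length && src[i] == dst[i]) {
  i++;
}
{code}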

 Simplify code in FSDirectory
 

 Key: HDFS-6328
 URL: https://issues.apache.org/jira/browse/HDFS-6328
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-6328.000.patch, HDFS-6328.001.patch


 This jira proposes:
 # Clean up dead code in FSDirectory.
 # Simplify the control flows that IntelliJ flags as warnings.
 # Move functions related to resolving paths into one place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6317) Add snapshot quota

2014-05-05 Thread Alex Shafer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990073#comment-13990073
 ] 

Alex Shafer commented on HDFS-6317:
---

That would be most appreciated.

 Add snapshot quota
 --

 Key: HDFS-6317
 URL: https://issues.apache.org/jira/browse/HDFS-6317
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Alex Shafer

 Either allow the 65k snapshot limit to be set with a configuration option, or 
 add a per-directory snapshot quota settable with the `hdfs dfsadmin` CLI and 
 viewable by appending fields to `hdfs dfs -count -q` output.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6056) Clean up NFS config settings

2014-05-05 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6056:
-

Attachment: HDFS-6056.003.patch

 Clean up NFS config settings
 

 Key: HDFS-6056
 URL: https://issues.apache.org/jira/browse/HDFS-6056
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.3.0
Reporter: Aaron T. Myers
Assignee: Brandon Li
 Attachments: HDFS-6056.001.patch, HDFS-6056.002.patch, 
 HDFS-6056.003.patch


 As discussed on HDFS-6050, there are a few opportunities to improve the config 
 settings related to NFS. This JIRA is to implement those changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6315) Decouple recording edit logs from FSDirectory

2014-05-05 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990076#comment-13990076
 ] 

Haohui Mai commented on HDFS-6315:
--

bq. Do keep in mind that hopefully in the next few months there will not be a 
globally held FSN lock, so don't entirely remove the FSD lock believing the FSN 
lock will cover for it.

The ultimate goal here is to have FSD implement only the mappings between 
names and inodes. That way locking is an implementation detail, not part of 
the interface. FSD could be implemented with lock-free data structures that 
require no locking at all. Making the FSN lock more fine-grained is definitely 
useful, but it is orthogonal. 
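
As an illustration of the direction (a sketch only, not a concrete design; the 
interface and class names are hypothetical):
{code}
// Locking becomes an implementation detail hidden behind the interface.
interface Namespace {
  INode resolve(String path);          // look an inode up by name
  void add(String path, INode inode);  // create a name-to-inode mapping
}

// One possible implementation backed by a concurrent map; callers need no
// external FSD lock for these single-key operations.
class ConcurrentNamespace implements Namespace {
  private final java.util.concurrent.ConcurrentMap<String, INode> map =
      new java.util.concurrent.ConcurrentHashMap<String, INode>();
  public INode resolve(String path) { return map.get(path); }
  public void add(String path, INode inode) { map.put(path, inode); }
}
{code}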

bq. I'm curious what you have in mind. HDFS-5693 appears to be a valuable 
change. I thoughts deletes used to do something similar while collecting 
blocks, but that whole region of code has been changed.

Based on my initial surveys, in the majority (~90%) of cases both the FSD lock 
and the FSN lock are held together. They can be combined with little 
performance loss in today's codebase. In the longer term FSD might be 
lock-free, as mentioned above.


 Decouple recording edit logs from FSDirectory
 -

 Key: HDFS-6315
 URL: https://issues.apache.org/jira/browse/HDFS-6315
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-6315.000.patch, HDFS-6315.001.patch


 Currently both FSNamesystem and FSDirectory record edit logs. This design 
 requires FSNamesystem and FSDirectory to be tightly coupled together to 
 implement a durable namespace.
 This jira proposes to separate the responsibility of implementing the 
 namespace from providing durability with edit logs. Specifically, FSDirectory 
 implements the namespace (which should have no edit log operations), and 
 FSNamesystem implements durability by recording the edit logs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6336) Cannot download file via webhdfs when wildcard is enabled

2014-05-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990114#comment-13990114
 ] 

Hadoop QA commented on HDFS-6336:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643390/HDFS-6336.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6822//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6822//console

This message is automatically generated.

 Cannot download file via webhdfs when wildcard is enabled
 -

 Key: HDFS-6336
 URL: https://issues.apache.org/jira/browse/HDFS-6336
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, webhdfs
Affects Versions: 2.4.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6336.001.patch, HDFS-6336.001.patch, 
 HDFS-6336.001.patch


 With wildcard enabled, issuing a webhdfs command like
 {code}
 http://yjztvm2.private:50070/webhdfs/v1/tmp?op=OPEN
 {code}
 would give
 {code}
 http://yjztvm3.private:50075/webhdfs/v1/tmp?op=OPEN&namenoderpcaddress=0.0.0.0:8020&offset=0
 {"RemoteException":{"exception":"ConnectException","javaClassName":"java.net.ConnectException","message":"Call
  From yjztvm3.private/192.168.142.230 to 0.0.0.0:8020 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused"}}
 {code}
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6336) Cannot download file via webhdfs when wildcard is enabled

2014-05-05 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990198#comment-13990198
 ] 

Haohui Mai commented on HDFS-6336:
--

Since the patch passes a raw IP / port around, it looks to me that it does not 
allow the DN to fail over when HA is enabled.

 Cannot download file via webhdfs when wildcard is enabled
 -

 Key: HDFS-6336
 URL: https://issues.apache.org/jira/browse/HDFS-6336
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, webhdfs
Affects Versions: 2.4.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6336.001.patch, HDFS-6336.001.patch, 
 HDFS-6336.001.patch


 With wildcard enabled, issuing a webhdfs command like
 {code}
 http://yjztvm2.private:50070/webhdfs/v1/tmp?op=OPEN
 {code}
 would give
 {code}
 http://yjztvm3.private:50075/webhdfs/v1/tmp?op=OPEN&namenoderpcaddress=0.0.0.0:8020&offset=0
 {"RemoteException":{"exception":"ConnectException","javaClassName":"java.net.ConnectException","message":"Call
  From yjztvm3.private/192.168.142.230 to 0.0.0.0:8020 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused"}}
 {code}
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-05 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990204#comment-13990204
 ] 

Haohui Mai commented on HDFS-6293:
--

bq. There are existing apps that use a custom Visitor similar to lsr. It outputs 
directory entries with full path and list of blocks for files.

[~kihwal], can you please elaborate? If you're talking about use cases like 
hdfs-du, there is no need to construct the whole namespace from the bottom up. 
Scanning through the records would be sufficient.

bq. That was the first thing I thought about doing, but the processing time 
matters too.

It might not be as bad as you think. I ran an experiment to see how much time 
is required to convert an fsimage to a LevelDB database on an 8-core Xeon E5530 
CPU @ 2.4GHz with 24G memory and a 2TB SATA 3 drive @ 7200 rpm. The machine 
runs RHEL 6.2 and Java 1.6. The numbers reported below are comparable to the 
numbers reported in HDFS-5698.

|Size in Old|512M|1G|2G|4G|8G| 
|Size in PB|469M|950M|1.9G|3.7G|7.0G| 
|Converting to LevelDB (ms)|30505|56531|121579|373108|1047121|

The additional latency for an 8G fsimage is around 15 minutes, which looks 
reasonable to me for the use cases of an offline tool.
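
A minimal sketch of the conversion idea, assuming the leveldbjni bindings and a 
hypothetical iterator over the parsed PB inode records (the {{InodeRecord}} 
type is a stand-in, not an actual class in the patch):
{code}
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import org.fusesource.leveldbjni.JniDBFactory;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

public class FsImageIndexer {
  /** Hypothetical parsed record; stands in for a PB INode section entry. */
  public interface InodeRecord {
    long getId();
    byte[] getBytes();
  }

  /** One pass over the records; heap use stays flat in the inode count. */
  public static void buildIndex(Iterable<InodeRecord> records, File dir)
      throws IOException {
    DB db = JniDBFactory.factory.open(dir, new Options().createIfMissing(true));
    try {
      for (InodeRecord r : records) {
        byte[] key = ByteBuffer.allocate(8).putLong(r.getId()).array();
        db.put(key, r.getBytes());  // index lives on disk, not in the heap
      }
    } finally {
      db.close();
    }
  }
}
{code}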

 Issues with OIV processing PB-based fsimages
 

 Key: HDFS-6293
 URL: https://issues.apache.org/jira/browse/HDFS-6293
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Haohui Mai
Priority: Blocker
 Attachments: HDFS-6293.000.patch, Heap Histogram.html


 There are issues with OIV when processing fsimages in protobuf. 
 Due to the internal layout changes introduced by the protobuf-based fsimage, 
 OIV consumes excessive amount of memory.  We have tested with a fsimage with 
 about 140M files/directories. The peak heap usage when processing this image 
 in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
 the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
 heap (max new size was 1GB).  It should be possible to process any image with 
 the default heap size of 1.5GB.
 Another issue is the complete change of format/content in OIV's XML output.  
 I also noticed that the secret manager section has no tokens while there were 
 unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
 they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-05 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-6293:
-

Attachment: HDFS-6293.001.patch

 Issues with OIV processing PB-based fsimages
 

 Key: HDFS-6293
 URL: https://issues.apache.org/jira/browse/HDFS-6293
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Haohui Mai
Priority: Blocker
 Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, Heap 
 Histogram.html


 There are issues with OIV when processing fsimages in protobuf. 
 Due to the internal layout changes introduced by the protobuf-based fsimage, 
 OIV consumes excessive amount of memory.  We have tested with a fsimage with 
 about 140M files/directories. The peak heap usage when processing this image 
 in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
 the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
 heap (max new size was 1GB).  It should be possible to process any image with 
 the default heap size of 1.5GB.
 Another issue is the complete change of format/content in OIV's XML output.  
 I also noticed that the secret manager section has no tokens while there were 
 unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
 they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge

2014-05-05 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990208#comment-13990208
 ] 

Binglin Chang commented on HDFS-6342:
-

That the rack capacities are equal doesn't mean there is no cross-rack block 
movement, so I don't think simply adding a new datanode works, right? Maybe we 
can make more changes and in the meantime reduce the timeout if possible; 80 
seconds for a test is a bit long. 

 TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if 
 balancer.id file is huge
 ---

 Key: HDFS-6342
 URL: https://issues.apache.org/jira/browse/HDFS-6342
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen He
Assignee: Chen He
 Attachments: HDFS-6342.patch


 The testBalancerWithRackLocality method tests the balancer moving data 
 blocks with rack locality taken into consideration. 
 It creates a two-node cluster. One node belongs to rack0nodeGroup0, the other 
 node belongs to rack1nodeGroup1. In this 2-datanode minicluster, the block size 
 is 10B and the total cluster capacity is 6000B (3000B on each datanode). It 
 creates 180 data blocks with replication factor 2. Then, a new datanode is 
 created (in rack1nodeGroup2) and the balancer starts balancing the cluster.
 It expects data blocks to move only within rack1. After the balancer is 
 done, it asserts that the data size on both racks is the same. This will break
 if the balancer.id file is huge and there is inter-rack data block movement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge

2014-05-05 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990211#comment-13990211
 ] 

Binglin Chang commented on HDFS-6342:
-

As for the fix, I see the need to write a balancer id file, but filling it with 
the hostname doesn't seem to be necessary (since it is never used anywhere). So 
if we modify the balancer to write the balancer file but not write any content, 
it should have no side effects on the balancer or the test check code, and we 
may be able to skip the timeout (need to confirm). A sketch follows.
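
Roughly (a sketch against a hypothetical {{idPath}}, standing in for the 
balancer's existing id-file creation path):
{code}
// Create the balancer id file as a mutual-exclusion marker but leave it
// empty, since the hostname content is never read anywhere.
FSDataOutputStream out = fs.create(idPath, false);  // fail if it exists
out.close();  // a zero-length file still serves as the lock marker
{code}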

 TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if 
 balancer.id file is huge
 ---

 Key: HDFS-6342
 URL: https://issues.apache.org/jira/browse/HDFS-6342
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen He
Assignee: Chen He
 Attachments: HDFS-6342.patch


 The testBalancerWithRackLocality method tests the balancer moving data 
 blocks with rack locality taken into consideration. 
 It creates a two-node cluster. One node belongs to rack0nodeGroup0, the other 
 node belongs to rack1nodeGroup1. In this 2-datanode minicluster, the block size 
 is 10B and the total cluster capacity is 6000B (3000B on each datanode). It 
 creates 180 data blocks with replication factor 2. Then, a new datanode is 
 created (in rack1nodeGroup2) and the balancer starts balancing the cluster.
 It expects data blocks to move only within rack1. After the balancer is 
 done, it asserts that the data size on both racks is the same. This will break
 if the balancer.id file is huge and there is inter-rack data block movement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6337) Setfacl testcase is failing due to dash character in username in TestAclCLI

2014-05-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990230#comment-13990230
 ] 

Hudson commented on HDFS-6337:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5593 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5593/])
HDFS-6337. Setfacl testcase is failing due to dash character in username in 
TestAclCLI. Contributed by Uma Maheswara Rao G. (umamahesh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1592489)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testAclCLI.xml


 Setfacl testcase is failing due to dash character in username in TestAclCLI
 ---

 Key: HDFS-6337
 URL: https://issues.apache.org/jira/browse/HDFS-6337
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6337.patch


 TestHDFSCLI is failing due to a '-' in the username.
 I have seen a similar fix done in HDFS-5821, so the same fix should be done 
 for the setfacl case as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6340) DN can't finalize upgrade

2014-05-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990269#comment-13990269
 ] 

Hadoop QA commented on HDFS-6340:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12643277/HDFS-6340-branch-2.4.0.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6828//console

This message is automatically generated.

 DN can't finalize upgrade
 -

 Key: HDFS-6340
 URL: https://issues.apache.org/jira/browse/HDFS-6340
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
Reporter: Rahul Singhal
Priority: Blocker
 Attachments: HDFS-6340-branch-2.4.0.patch


 I upgraded a (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the 
 '-finalizeUpgrade' command, the NN was able to finalize the upgrade but the DN 
 couldn't (I waited for the next block report).
 I think I have found the problem to be due to HDFS-5153. I will attach a 
 proposed fix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-05 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990204#comment-13990204
 ] 

Haohui Mai edited comment on HDFS-6293 at 5/6/14 4:57 AM:
--

bq. There are existing apps that use a custom Visitor similar to lsr. It outputs 
directory entries with full path and list of blocks for files.

[~kihwal], can you please elaborate? If you're talking about use cases like 
hdfs-du, scanning through the records might be sufficient.

bq. That was the first thing I thought about doing, but the processing time 
matters too.

It might not be as bad as you think. I ran an experiment to see how much time 
is required to convert an fsimage to a LevelDB database on an 8-core Xeon E5530 
CPU @ 2.4GHz with 24G memory and a 2TB SATA 3 drive @ 7200 rpm. The machine 
runs RHEL 6.2 and Java 1.6. The numbers reported below are comparable to the 
numbers reported in HDFS-5698.

|Size in Old|512M|1G|2G|4G|8G| 
|Size in PB|469M|950M|1.9G|3.7G|7.0G| 
|Converting to LevelDB (ms)|30505|56531|121579|373108|1047121|

The additional latency for an 8G fsimage is around 15 minutes.


was (Author: wheat9):
bq. There are existing apps that use a custom Visitor similar to lsr. It outputs 
directory entries with full path and list of blocks for files.

[~kihwal], can you please elaborate? If you're talking about use cases like 
hdfs-du, there is no need to construct the whole namespace from the bottom up. 
Scanning through the records would be sufficient.

bq. That was the first thing I thought about doing, but the processing time 
matters too.

It might not be as bad as you think. I ran an experiment to see how much time 
is required to convert an fsimage to a LevelDB database on an 8-core Xeon E5530 
CPU @ 2.4GHz with 24G memory and a 2TB SATA 3 drive @ 7200 rpm. The machine 
runs RHEL 6.2 and Java 1.6. The numbers reported below are comparable to the 
numbers reported in HDFS-5698.

|Size in Old|512M|1G|2G|4G|8G| 
|Size in PB|469M|950M|1.9G|3.7G|7.0G| 
|Converting to LevelDB (ms)|30505|56531|121579|373108|1047121|

The additional latency for an 8G fsimage is around 15 minutes, which looks 
reasonable to me for the use cases of an offline tool.

 Issues with OIV processing PB-based fsimages
 

 Key: HDFS-6293
 URL: https://issues.apache.org/jira/browse/HDFS-6293
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Assignee: Haohui Mai
Priority: Blocker
 Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, Heap 
 Histogram.html


 There are issues with OIV when processing fsimages in protobuf. 
 Due to the internal layout changes introduced by the protobuf-based fsimage, 
 OIV consumes excessive amount of memory.  We have tested with a fsimage with 
 about 140M files/directories. The peak heap usage when processing this image 
 in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
 the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
 heap (max new size was 1GB).  It should be possible to process any image with 
 the default heap size of 1.5GB.
 Another issue is the complete change of format/content in OIV's XML output.  
 I also noticed that the secret manager section has no tokens while there were 
 unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
 they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)