[jira] [Updated] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures

2014-12-18 Thread Stephen Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Chu updated HDFS-7542:
--
Attachment: HDFS-7542.002.patch

 Add an option to DFSAdmin -safemode wait to ignore connection failures
 --

 Key: HDFS-7542
 URL: https://issues.apache.org/jira/browse/HDFS-7542
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.6.0
Reporter: Stephen Chu
Assignee: Stephen Chu
Priority: Minor
 Attachments: HDFS-7542.001.patch, HDFS-7542.002.patch


 Currently, the _dfsadmin -safemode wait_ command aborts when the connection to 
 the NN fails (a network glitch, a ConnectException when the NN is unreachable, 
 an EOFException if the network link is shut down). 
 In certain situations, users have asked for an option to make the command 
 resilient to connection failures. This is useful so that the admin can 
 initiate the wait command despite the NN not being fully up, or survive 
 intermittent network issues. With this option, the admin can rely on the wait 
 command continuing to poll instead of aborting.
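
A resilient wait of this kind amounts to a retry loop that treats connection-level IOExceptions as transient. The following is a minimal stand-alone sketch, not the actual DFSAdmin code: the SafeModeCheck interface, the ignoreFailures flag, and the polling interval are hypothetical names for illustration.

```java
import java.io.IOException;

public class SafeModeWait {
    /** Hypothetical stand-in for the dfsadmin safe-mode query. */
    interface SafeModeCheck {
        boolean inSafeMode() throws IOException;
    }

    /**
     * Polls until the check reports safe mode is off. With ignoreFailures
     * set, connection errors are swallowed and retried instead of aborting.
     */
    static void waitForSafeModeExit(SafeModeCheck check, boolean ignoreFailures,
                                    long pollMillis)
            throws IOException, InterruptedException {
        while (true) {
            try {
                if (!check.inSafeMode()) {
                    return;                   // NN is up and out of safe mode
                }
            } catch (IOException e) {         // e.g. ConnectException, EOFException
                if (!ignoreFailures) {
                    throw e;                  // old behavior: abort on failure
                }
                // new behavior: keep polling through the glitch
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated NN: fails twice with a connection error, then leaves safe mode.
        final int[] calls = {0};
        SafeModeCheck flaky = () -> {
            if (calls[0]++ < 2) {
                throw new IOException("Connection refused");
            }
            return false;
        };
        waitForSafeModeExit(flaky, true, 1);
        System.out.println("calls=" + calls[0]);  // prints calls=3
    }
}
```

With ignoreFailures=false the first IOException would have aborted the wait, which is the behavior the option is meant to relax.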



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures

2014-12-18 Thread Stephen Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251365#comment-14251365
 ] 

Stephen Chu commented on HDFS-7542:
---

The TestRollingUpgradeRollback failure is unrelated to these DFSAdmin command 
changes; I re-ran the test a few times successfully. The release audit warning 
also seems to be incorrect, because all modified files have the Apache license. 
It's hard to see the exact test name that timed out. Retrying Jenkins with the 
same patch. 

 Add an option to DFSAdmin -safemode wait to ignore connection failures
 --

 Key: HDFS-7542
 URL: https://issues.apache.org/jira/browse/HDFS-7542
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.6.0
Reporter: Stephen Chu
Assignee: Stephen Chu
Priority: Minor
 Attachments: HDFS-7542.001.patch, HDFS-7542.002.patch


 Currently, the _dfsadmin -safemode wait_ command aborts when the connection to 
 the NN fails (a network glitch, a ConnectException when the NN is unreachable, 
 an EOFException if the network link is shut down). 
 In certain situations, users have asked for an option to make the command 
 resilient to connection failures. This is useful so that the admin can 
 initiate the wait command despite the NN not being fully up, or survive 
 intermittent network issues. With this option, the admin can rely on the wait 
 command continuing to poll instead of aborting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6149) Running Httpfs UTs using MiniKDC

2014-12-18 Thread Jinghui Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinghui Wang updated HDFS-6149:
---
Summary: Running Httpfs UTs using MiniKDC  (was: Running Httpfs UTs with 
testKerberos profile has failures.)

 Running Httpfs UTs using MiniKDC
 

 Key: HDFS-6149
 URL: https://issues.apache.org/jira/browse/HDFS-6149
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 2.2.0
Reporter: Jinghui Wang
Assignee: Jinghui Wang

 UT failures in TestHttpFSWithKerberos.
 Tests using testDelegationTokenWithinDoAs fail because of the statically set 
 keytab file.
 Test testDelegationTokenHttpFSAccess also fails due to the incorrect 
 assumption that CANCELDELEGATIONTOKEN does not require credentials.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6149) Running Httpfs UTs using MiniKDC

2014-12-18 Thread Jinghui Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinghui Wang updated HDFS-6149:
---
Description: JIRA HADOOP-9866 converted hadoop-common Kerberos unit tests 
to use MiniKDC. This JIRA does the same for HttpFS, to avoid the hassle of 
setting up a Kerberos environment to run the HttpFS Kerberos unit 
tests.  (was: UT failures in TestHttpFSWithKerberos.
Tests using testDelegationTokenWithinDoAs fail because of the statically set 
keytab file.
Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption 
that CANCELDELEGATIONTOKEN does not require credentials.)

 Running Httpfs UTs using MiniKDC
 

 Key: HDFS-6149
 URL: https://issues.apache.org/jira/browse/HDFS-6149
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 2.2.0
Reporter: Jinghui Wang
Assignee: Jinghui Wang

 JIRA HADOOP-9866 converted hadoop-common Kerberos unit tests to use MiniKDC. 
 This JIRA does the same for HttpFS, to avoid the hassle of setting up a 
 Kerberos environment to run the HttpFS Kerberos unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6149) Running Httpfs UTs using MiniKDC

2014-12-18 Thread Jinghui Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinghui Wang updated HDFS-6149:
---
Attachment: (was: HDFS-6149.patch)

 Running Httpfs UTs using MiniKDC
 

 Key: HDFS-6149
 URL: https://issues.apache.org/jira/browse/HDFS-6149
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 2.2.0
Reporter: Jinghui Wang
Assignee: Jinghui Wang

 JIRA HADOOP-9866 converted hadoop-common Kerberos unit tests to use MiniKDC. 
 This JIRA does the same for HttpFS, to avoid the hassle of setting up a 
 Kerberos environment to run the HttpFS Kerberos unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6149) Running Httpfs UTs using MiniKDC

2014-12-18 Thread Jinghui Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251387#comment-14251387
 ] 

Jinghui Wang commented on HDFS-6149:


Updating the patch to move TestHttpFSWithKerberos to use MiniKDC rather than 
depending on a native Kerberos environment setup for unit tests.

 Running Httpfs UTs using MiniKDC
 

 Key: HDFS-6149
 URL: https://issues.apache.org/jira/browse/HDFS-6149
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 2.2.0
Reporter: Jinghui Wang
Assignee: Jinghui Wang

 JIRA HADOOP-9866 converted hadoop-common Kerberos unit tests to use MiniKDC. 
 This JIRA does the same for HttpFS, to avoid the hassle of setting up a 
 Kerberos environment to run the HttpFS Kerberos unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6149) Running Httpfs UTs using MiniKDC

2014-12-18 Thread Jinghui Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinghui Wang updated HDFS-6149:
---
Attachment: HDFS-6149.patch

 Running Httpfs UTs using MiniKDC
 

 Key: HDFS-6149
 URL: https://issues.apache.org/jira/browse/HDFS-6149
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 2.2.0
Reporter: Jinghui Wang
Assignee: Jinghui Wang
 Attachments: HDFS-6149.patch


 JIRA HADOOP-9866 converted hadoop-common Kerberos unit tests to use MiniKDC. 
 This JIRA does the same for HttpFS, to avoid the hassle of setting up a 
 Kerberos environment to run the HttpFS Kerberos unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occasionally in trunk

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251415#comment-14251415
 ] 

Hadoop QA commented on HDFS-7527:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687971/HDFS-7527.002.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9070//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9070//console

This message is automatically generated.

 TestDecommission.testIncludeByRegistrationName fails occasionally in trunk
 ---

 Key: HDFS-7527
 URL: https://issues.apache.org/jira/browse/HDFS-7527
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Reporter: Yongjun Zhang
Assignee: Binglin Chang
 Attachments: HDFS-7527.001.patch, HDFS-7527.002.patch


 https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/
 {quote}
 Error Message
 test timed out after 36 milliseconds
 Stacktrace
 java.lang.Exception: test timed out after 36 milliseconds
   at java.lang.Thread.sleep(Native Method)
   at 
 org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957)
 2014-12-15 12:00:19,958 ERROR datanode.DataNode 
 (BPServiceActor.java:run(836)) - Initialization failed for Block pool 
 BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
 localhost/127.0.0.1:40565 Datanode denied communication with namenode because 
 the host is not in the include-list: DatanodeRegistration(127.0.0.1, 
 datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, 
 infoSecurePort=0, ipcPort=43726, 
 storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0)
   at 
 org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196)
   at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
   at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)
 2014-12-15 12:00:29,087 FATAL datanode.DataNode 
 (BPServiceActor.java:run(841)) - Initialization failed for Block pool 
 BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
 localhost/127.0.0.1:40565. Exiting. 
 java.io.IOException: DN shut down before block pool connected
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
   at 

[jira] [Commented] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever

2014-12-18 Thread Frantisek Vacek (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251420#comment-14251420
 ] 

Frantisek Vacek commented on HDFS-7392:
---

The problem can be solved by a different implementation of 
SecurityUtils.StandardHostResolver.getByName(String host).

Current implementation:
{code}
  interface HostResolver {
InetAddress getByName(String host) throws UnknownHostException;
  }
  
/**
   * Uses standard java host resolution
   */
  static class StandardHostResolver implements HostResolver {
@Override
public InetAddress getByName(String host) throws UnknownHostException {
  return InetAddress.getByName(host);
}
  }
{code}

A proper implementation would look like:
{code}
  interface HostResolver {
InetAddress[] getByName(String host) throws UnknownHostException;
  }
  
  /**
   * Uses standard java host resolution
   */
  static class StandardHostResolver implements HostResolver {
@Override
public InetAddress[] getByName(String host) throws UnknownHostException {
  return InetAddress.getAllByName(host);
}
  }
{code}
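
With the array-returning resolver, a caller can iterate every resolved address instead of pinning to a single record. A stand-alone sketch of that usage pattern (the failover policy shown here is illustrative, not the actual ipc.Client logic):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class MultiAddressResolve {
    /** Mirrors the proposed array-returning interface. */
    interface HostResolver {
        InetAddress[] getByName(String host) throws UnknownHostException;
    }

    static class StandardHostResolver implements HostResolver {
        @Override
        public InetAddress[] getByName(String host) throws UnknownHostException {
            // All A records for the name, not just the first one.
            return InetAddress.getAllByName(host);
        }
    }

    public static void main(String[] args) throws Exception {
        HostResolver resolver = new StandardHostResolver();
        InetAddress[] addrs = resolver.getByName("localhost");
        // A caller can now walk addrs deterministically, counting one failure
        // per full pass, instead of re-resolving and flip-flopping between
        // records (which is what resets the timeoutFailures counter).
        System.out.println("resolved=" + (addrs.length >= 1));
    }
}
```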

 org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
 -

 Key: HDFS-7392
 URL: https://issues.apache.org/jira/browse/HDFS-7392
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Frantisek Vacek
Assignee: Yi Liu
 Attachments: 1.png, 2.png


 In some specific circumstances, 
 org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times out 
 and hangs forever. 
 The specific circumstances are:
 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) should 
 point to a valid IP address, but with no name node service running on it.
 2) There should be at least 2 IP addresses for such a URI. See output below:
 {quote}
 [~/proj/quickbox]$ nslookup share.example.com
 Server: 127.0.1.1
 Address:127.0.1.1#53
 share.example.com canonical name = 
 internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com.
 Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
 Address: 192.168.1.223
 Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
 Address: 192.168.1.65
 {quote}
 In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress() 
 sometimes returns true (even if the address didn't actually change, see img. 1) 
 and the timeoutFailures counter is reset to 0 (see img. 2). The 
 maxRetriesOnSocketTimeouts (45) is never reached, and the connection attempt is 
 repeated forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251473#comment-14251473
 ] 

Hudson commented on HDFS-7531:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. 
McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java


 Improve the concurrent access on FsVolumeList
 -

 Key: HDFS-7531
 URL: https://issues.apache.org/jira/browse/HDFS-7531
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Fix For: 2.7.0

 Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, 
 HDFS-7531.002.patch


 {{FsVolumeList}} uses {{synchronized}} to protect updates to 
 {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
 {{getAvailable()}}) iterate {{volumes}} without protection.
 This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} and 
 provide better concurrent access.
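
The AtomicReference approach amounts to copy-on-write: writers swap in a new immutable list, and readers iterate a stable snapshot without locking. A minimal stand-alone sketch, using a String as a stand-in for FsVolumeImpl (this is not the actual FsVolumeList code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class VolumeListSketch {
    // Readers see an immutable snapshot; writers swap the reference atomically.
    private final AtomicReference<List<String>> volumes =
        new AtomicReference<>(Collections.<String>emptyList());

    void addVolume(String v) {
        while (true) {
            List<String> cur = volumes.get();
            List<String> next = new ArrayList<>(cur);
            next.add(v);
            // CAS retries only if another writer won the race.
            if (volumes.compareAndSet(cur, Collections.unmodifiableList(next))) {
                return;
            }
        }
    }

    int countVolumes() {
        // No lock needed: iteration runs over a consistent snapshot,
        // unlike iterating a mutable list guarded only on the write side.
        return volumes.get().size();
    }

    public static void main(String[] args) {
        VolumeListSketch list = new VolumeListSketch();
        list.addVolume("/data1");
        list.addVolume("/data2");
        System.out.println("count=" + list.countVolumes());  // prints count=2
    }
}
```

The trade-off is that each update copies the list, which is cheap here because volumes change rarely while reads (checkDirs, getAvailable) are frequent.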



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251469#comment-14251469
 ] 

Hudson commented on HDFS-7528:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
HDFS-7528. Consolidate symlink-related implementation into a single class. 
Contributed by Haohui Mai. (wheat9: rev 
0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java


 Consolidate symlink-related implementation into a single class
 --

 Key: HDFS-7528
 URL: https://issues.apache.org/jira/browse/HDFS-7528
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7528.000.patch


 The jira proposes to consolidate symlink-related implementation into a single 
 class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251484#comment-14251484
 ] 

Hudson commented on HDFS-7528:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
HDFS-7528. Consolidate symlink-related implementation into a single class. 
Contributed by Haohui Mai. (wheat9: rev 
0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java


 Consolidate symlink-related implementation into a single class
 --

 Key: HDFS-7528
 URL: https://issues.apache.org/jira/browse/HDFS-7528
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7528.000.patch


 The jira proposes to consolidate symlink-related implementation into a single 
 class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251488#comment-14251488
 ] 

Hudson commented on HDFS-7531:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. 
McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Improve the concurrent access on FsVolumeList
 -

 Key: HDFS-7531
 URL: https://issues.apache.org/jira/browse/HDFS-7531
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Fix For: 2.7.0

 Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, 
 HDFS-7531.002.patch


 {{FsVolumeList}} uses {{synchronized}} to protect updates to 
 {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
 {{getAvailable()}}) iterate {{volumes}} without protection.
 This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} and 
 provide better concurrent access.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern

2014-12-18 Thread Harsh J (JIRA)
Harsh J created HDFS-7546:
-

 Summary: Document, and set an accepting default for 
dfs.namenode.kerberos.principal.pattern
 Key: HDFS-7546
 URL: https://issues.apache.org/jira/browse/HDFS-7546
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: security
Reporter: Harsh J
Priority: Minor


This config is used in the SaslRpcClient, and the lack of a default breaks the 
use of cross-realm trust principals at clients.

Current location: 
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309

The config should be documented and the default should be set to * to preserve 
the prior-to-introduction behaviour.
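
The pattern is a glob matched against the server's Kerberos principal, so a default of * accepts every principal, including cross-realm ones. A stand-alone sketch of that kind of glob matching (illustrative only; the real SaslRpcClient check may differ):

```java
import java.util.regex.Pattern;

public class PrincipalPatternSketch {
    /** Convert a simple glob (* and ?) into an anchored regex and match it. */
    static boolean matchesPattern(String pattern, String principal) {
        StringBuilder re = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            switch (c) {
                case '*': re.append(".*"); break;          // any run of characters
                case '?': re.append('.');  break;          // any single character
                default:  re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(re.toString(), principal);
    }

    public static void main(String[] args) {
        // With the proposed default of "*", a cross-realm principal is accepted.
        boolean crossRealm =
            matchesPattern("*", "nn/nn.other.example.com@OTHER.REALM");
        // A stricter pattern still lets an admin scope accepted principals.
        boolean scoped =
            matchesPattern("nn/*@EXAMPLE.COM", "nn/host1@EXAMPLE.COM");
        System.out.println(crossRealm + " " + scoped);  // prints true true
    }
}
```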



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern

2014-12-18 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-7546:
--
Attachment: HDFS-7546.patch

 Document, and set an accepting default for 
 dfs.namenode.kerberos.principal.pattern
 --

 Key: HDFS-7546
 URL: https://issues.apache.org/jira/browse/HDFS-7546
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: security
Reporter: Harsh J
Priority: Minor
 Attachments: HDFS-7546.patch


 This config is used in the SaslRpcClient, and the lack of a default breaks 
 the use of cross-realm trust principals at clients.
 Current location: 
 https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309
 The config should be documented and the default should be set to * to 
 preserve the prior-to-introduction behaviour.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern

2014-12-18 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-7546:
--
 Assignee: Harsh J
 Target Version/s: 2.7.0
Affects Version/s: 2.1.1-beta
   Status: Patch Available  (was: Open)

 Document, and set an accepting default for 
 dfs.namenode.kerberos.principal.pattern
 --

 Key: HDFS-7546
 URL: https://issues.apache.org/jira/browse/HDFS-7546
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: security
Affects Versions: 2.1.1-beta
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Attachments: HDFS-7546.patch


 This config is used in the SaslRpcClient, and the lack of a default breaks 
 the use of cross-realm trust principals at clients.
 Current location: 
 https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309
 The config should be documented and the default should be set to * to 
 preserve the prior-to-introduction behaviour.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251567#comment-14251567
 ] 

Hadoop QA commented on HDFS-7542:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687988/HDFS-7542.002.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9071//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9071//console

This message is automatically generated.

 Add an option to DFSAdmin -safemode wait to ignore connection failures
 --

 Key: HDFS-7542
 URL: https://issues.apache.org/jira/browse/HDFS-7542
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.6.0
Reporter: Stephen Chu
Assignee: Stephen Chu
Priority: Minor
 Attachments: HDFS-7542.001.patch, HDFS-7542.002.patch


 Currently, the _dfsadmin -safemode wait_ command aborts when the connection to 
 the NN fails (a network glitch, a ConnectException when the NN is unreachable, 
 an EOFException if the network link is shut down). 
 In certain situations, users have asked for an option to make the command 
 resilient to connection failures. This is useful so that the admin can 
 initiate the wait command despite the NN not being fully up, or survive 
 intermittent network issues. With this option, the admin can rely on the wait 
 command continuing to poll instead of aborting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode

2014-12-18 Thread Shinichi Yamashita (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinichi Yamashita updated HDFS-6833:
-
Attachment: HDFS-6833-13.patch

 DirectoryScanner should not register a deleting block with memory of DataNode
 -

 Key: HDFS-6833
 URL: https://issues.apache.org/jira/browse/HDFS-6833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0, 2.5.0, 2.5.1
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
Priority: Critical
 Attachments: HDFS-6833-10.patch, HDFS-6833-11.patch, 
 HDFS-6833-12.patch, HDFS-6833-13.patch, HDFS-6833-6-2.patch, 
 HDFS-6833-6-3.patch, HDFS-6833-6.patch, HDFS-6833-7-2.patch, 
 HDFS-6833-7.patch, HDFS-6833.8.patch, HDFS-6833.9.patch, HDFS-6833.patch, 
 HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch


 When a block is deleted in DataNode, the following messages are usually 
 output.
 {code}
 2014-08-07 17:53:11,606 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:11,617 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 However, in the current implementation the DirectoryScanner may run while the 
 DataNode is deleting the block. Then the following messages are output.
 {code}
 2014-08-07 17:53:30,519 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:31,426 INFO 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
 BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata 
 files:0, missing block files:0, missing blocks in memory:1, mismatched 
 blocks:0
 2014-08-07 17:53:31,426 WARN 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
 missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
   getNumBytes() = 21230663
   getBytesOnDisk()  = 21230663
   getVisibleLength()= 21230663
   getVolume()   = /hadoop/data1/dfs/data/current
   getBlockFile()= 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
   unlinked  =false
 2014-08-07 17:53:31,531 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 As a result, the information of the block being deleted is registered in the 
 DataNode's memory.
 And when the DataNode sends a block report, the NameNode receives wrong block 
 information.
 For example, when we run recommissioning or change the replication factor, the 
 NameNode may mistakenly delete the right block as ExcessReplicate because of 
 this problem, and under-replicated blocks and missing blocks occur.
 When the DataNode runs the DirectoryScanner, it should not register a block 
 that is being deleted.
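
One way to express the requested behaviour is for the dataset to track blocks whose deletion has been scheduled, and for the scanner's reconciliation step to skip them. A minimal stand-alone sketch with Long block IDs (the names here are hypothetical; the real FsDatasetImpl bookkeeping is more involved):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ScannerSketch {
    // Blocks whose on-disk files are queued for async deletion.
    private final Set<Long> deletingBlocks = ConcurrentHashMap.newKeySet();

    void scheduleDeletion(long blockId) {
        deletingBlocks.add(blockId);
        // ... hand the block file off to the async disk service here ...
    }

    void deletionCompleted(long blockId) {
        deletingBlocks.remove(blockId);
    }

    /** Called by the scanner when a disk file has no in-memory replica. */
    boolean shouldRegisterMissingBlock(long blockId) {
        // Do not re-add a block that is merely waiting for its file delete;
        // that re-registration is what leaks the stale replica into the
        // next block report.
        return !deletingBlocks.contains(blockId);
    }

    public static void main(String[] args) {
        ScannerSketch ds = new ScannerSketch();
        ds.scheduleDeletion(1073741825L);
        // Scanner runs between "Scheduling ... for deletion" and "Deleted ...":
        System.out.println("register=" + ds.shouldRegisterMissingBlock(1073741825L));
    }
}
```

Prints register=false: the scanner leaves the in-flight block alone, so the next block report no longer resurrects it.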



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode

2014-12-18 Thread Shinichi Yamashita (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251581#comment-14251581
 ] 

Shinichi Yamashita commented on HDFS-6833:
--

Hi [~yzhangal],

Thank you for your review! My previous patch file was not sufficient.
I attach a patch file that fixes two things.


 DirectoryScanner should not register a deleting block with memory of DataNode
 -

 Key: HDFS-6833
 URL: https://issues.apache.org/jira/browse/HDFS-6833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0, 2.5.0, 2.5.1
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
Priority: Critical
 Attachments: HDFS-6833-10.patch, HDFS-6833-11.patch, 
 HDFS-6833-12.patch, HDFS-6833-13.patch, HDFS-6833-6-2.patch, 
 HDFS-6833-6-3.patch, HDFS-6833-6.patch, HDFS-6833-7-2.patch, 
 HDFS-6833-7.patch, HDFS-6833.8.patch, HDFS-6833.9.patch, HDFS-6833.patch, 
 HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch


 When a block is deleted in DataNode, the following messages are usually 
 output.
 {code}
 2014-08-07 17:53:11,606 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:11,617 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 However, in the current implementation DirectoryScanner may run while the 
 DataNode is deleting the block, and the following messages are output.
 {code}
 2014-08-07 17:53:30,519 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Scheduling blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  for deletion
 2014-08-07 17:53:31,426 INFO 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
 BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata 
 files:0, missing block files:0, missing blocks in memory:1, mismatched 
 blocks:0
 2014-08-07 17:53:31,426 WARN 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
 missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
   getNumBytes() = 21230663
   getBytesOnDisk()  = 21230663
   getVisibleLength()= 21230663
   getVolume()   = /hadoop/data1/dfs/data/current
   getBlockFile()= 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
   unlinked  =false
 2014-08-07 17:53:31,531 INFO 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
 /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 {code}
 The deleting block's information is re-registered in the DataNode's memory, 
 so when the DataNode sends a block report, the NameNode receives wrong block 
 information.
 For example, when we recommission a node or change the replication factor, 
 the NameNode may delete a valid block as an excess replica because of this 
 problem, and Under-Replicated Blocks and Missing Blocks occur.
 When the DataNode runs DirectoryScanner, it should not register a block that 
 is being deleted.
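The guard argued for above can be sketched as follows. This is a minimal illustrative model, not the actual DataNode code: the class and method names (ScanGuard, scheduleDelete, register) are invented, and the real FsDatasetImpl/DirectoryScanner interaction is far more involved.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: the scanner's reconcile step should not re-register a replica
// that the async deleter has already scheduled, otherwise the stale replica
// reappears in memory and in subsequent block reports.
public class ScanGuard {
    private final Set<Long> inMemory = new HashSet<>();   // replica map
    private final Set<Long> deleting = new HashSet<>();   // pending deletions

    public void scheduleDelete(long blockId) {
        inMemory.remove(blockId);
        deleting.add(blockId);    // file removal happens asynchronously
    }

    /** Called when a scan finds blockId on disk but not in memory. */
    public boolean register(long blockId) {
        if (deleting.contains(blockId)) {
            return false;         // deletion in flight: skip re-registration
        }
        return inMemory.add(blockId);
    }

    public static void main(String[] args) {
        ScanGuard g = new ScanGuard();
        g.scheduleDelete(1073741825L);
        System.out.println(g.register(1073741825L)); // prints false
    }
}
```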





[jira] [Commented] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever

2014-12-18 Thread Frantisek Vacek (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251637#comment-14251637
 ] 

Frantisek Vacek commented on HDFS-7392:
---

Please ignore my previous proposal; it will not work.

 org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
 -

 Key: HDFS-7392
 URL: https://issues.apache.org/jira/browse/HDFS-7392
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Frantisek Vacek
Assignee: Yi Liu
 Attachments: 1.png, 2.png


 In some specific circumstances, 
 org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times 
 out and lasts forever.
 The specific circumstances are:
 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) points 
 to a valid IP address, but no name node service is running on it.
 2) There are at least 2 IP addresses for such a URI. See output below:
 {quote}
 [~/proj/quickbox]$ nslookup share.example.com
 Server: 127.0.1.1
 Address:127.0.1.1#53
 share.example.com canonical name = 
 internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com.
 Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
 Address: 192.168.1.223
 Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
 Address: 192.168.1.65
 {quote}
 In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress() 
 sometimes returns true (even if the address didn't actually change; see img. 
 1) and the timeoutFailures counter is reset to 0 (see img. 2). The 
 maxRetriesOnSocketTimeouts limit (45) is therefore never reached and the 
 connection attempt is repeated forever.
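The failure mode above can be sketched as retry accounting that resets only for genuinely new addresses, so DNS round-robin between two known IPs cannot keep the counter at zero. This is an illustrative model, not the actual org.apache.hadoop.ipc.Client code; the class and method names are invented.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: with round-robin DNS the resolved address flips between two known
// IPs, so a naive "address changed => reset timeoutFailures" rule never hits
// maxRetriesOnSocketTimeouts. Resetting only for never-before-seen addresses
// keeps the total retry count bounded.
public class RetryTracker {
    private final Set<String> seen = new HashSet<>();
    private int timeoutFailures = 0;
    private final int maxRetries;

    public RetryTracker(int maxRetries) { this.maxRetries = maxRetries; }

    /** Record a failed connect to addr; returns true if we may retry. */
    public boolean recordFailure(String addr) {
        if (seen.add(addr)) {
            timeoutFailures = 0;   // truly new address: fresh retry budget
        }
        return ++timeoutFailures <= maxRetries;
    }

    public static void main(String[] args) {
        RetryTracker t = new RetryTracker(45);
        int attempts = 0;
        // Alternate between the two load-balancer IPs, as in the report.
        while (t.recordFailure(attempts % 2 == 0 ? "192.168.1.223"
                                                 : "192.168.1.65")) {
            attempts++;
        }
        System.out.println("gave up after " + attempts + " attempts");
    }
}
```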





[jira] [Commented] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern

2014-12-18 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251683#comment-14251683
 ] 

Allen Wittenauer commented on HDFS-7546:


Is it just namenode or is it any service that has Kerberos configured?  

 Document, and set an accepting default for 
 dfs.namenode.kerberos.principal.pattern
 --

 Key: HDFS-7546
 URL: https://issues.apache.org/jira/browse/HDFS-7546
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: security
Affects Versions: 2.1.1-beta
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Attachments: HDFS-7546.patch


 This config is used in the SaslRpcClient, and the lack of a default breaks 
 cross-realm trust principals being used at clients.
 Current location: 
 https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309
 The config should be documented and the default should be set to * to 
 preserve the prior-to-introduction behaviour.
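A documented default along the lines the issue proposes might look like the hdfs-site.xml entry below. This is a sketch: only the property name and the proposed * default come from the issue; the description wording is illustrative, not the committed text.

```xml
<property>
  <name>dfs.namenode.kerberos.principal.pattern</name>
  <value>*</value>
  <description>
    Client-side pattern that SaslRpcClient matches against the server
    principal. The default of "*" accepts any principal, preserving the
    behaviour before this check was introduced (needed for cross-realm
    trust setups).
  </description>
</property>
```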





[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251715#comment-14251715
 ] 

Hudson commented on HDFS-7531:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. 
McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Improve the concurrent access on FsVolumeList
 -

 Key: HDFS-7531
 URL: https://issues.apache.org/jira/browse/HDFS-7531
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Fix For: 2.7.0

 Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, 
 HDFS-7531.002.patch


 {{FsVolumeList}} uses {{synchronized}} to protect the update on 
 {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
 {{getAvailable()}}) iterate {{volumes}} without protection.
 This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to 
 provide better concurrent access.
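The proposal above amounts to a copy-on-write pattern: readers grab an immutable snapshot without locking, writers copy, modify, and compare-and-set. The sketch below is illustrative only; the names (CowList, snapshot, add) are invented and do not match the actual FsVolumeList code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of copy-on-write via AtomicReference: iterating a snapshot is
// always safe because a published list is never mutated afterwards.
public class CowList<T> {
    private final AtomicReference<List<T>> ref =
            new AtomicReference<>(Collections.emptyList());

    public List<T> snapshot() {     // readers, e.g. checkDirs()/getAvailable()
        return ref.get();
    }

    public void add(T item) {       // writers: copy, modify, CAS
        while (true) {
            List<T> cur = ref.get();
            List<T> next = new ArrayList<>(cur);
            next.add(item);
            if (ref.compareAndSet(cur, next)) {
                return;             // loop again only on a concurrent update
            }
        }
    }

    public static void main(String[] args) {
        CowList<String> volumes = new CowList<>();
        volumes.add("/hadoop/data1/dfs/data");
        volumes.add("/hadoop/data2/dfs/data");
        System.out.println(volumes.snapshot().size()); // prints 2
    }
}
```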





[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251711#comment-14251711
 ] 

Hudson commented on HDFS-7528:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
HDFS-7528. Consolidate symlink-related implementation into a single class. 
Contributed by Haohui Mai. (wheat9: rev 
0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java


 Consolidate symlink-related implementation into a single class
 --

 Key: HDFS-7528
 URL: https://issues.apache.org/jira/browse/HDFS-7528
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7528.000.patch


 The jira proposes to consolidate symlink-related implementation into a single 
 class.





[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251729#comment-14251729
 ] 

Hudson commented on HDFS-7531:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. 
McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java


 Improve the concurrent access on FsVolumeList
 -

 Key: HDFS-7531
 URL: https://issues.apache.org/jira/browse/HDFS-7531
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Fix For: 2.7.0

 Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, 
 HDFS-7531.002.patch


 {{FsVolumeList}} uses {{synchronized}} to protect the update on 
 {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
 {{getAvailable()}}) iterate {{volumes}} without protection.
 This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to 
 provide better concurrent access.





[jira] [Updated] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever

2014-12-18 Thread Frantisek Vacek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frantisek Vacek updated HDFS-7392:
--
Attachment: HDFS-7392.diff

 org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
 -

 Key: HDFS-7392
 URL: https://issues.apache.org/jira/browse/HDFS-7392
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Frantisek Vacek
Assignee: Yi Liu
 Attachments: 1.png, 2.png, HDFS-7392.diff


 In some specific circumstances, 
 org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times 
 out and lasts forever.
 The specific circumstances are:
 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) points 
 to a valid IP address, but no name node service is running on it.
 2) There are at least 2 IP addresses for such a URI. See output below:
 {quote}
 [~/proj/quickbox]$ nslookup share.example.com
 Server: 127.0.1.1
 Address:127.0.1.1#53
 share.example.com canonical name = 
 internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com.
 Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
 Address: 192.168.1.223
 Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
 Address: 192.168.1.65
 {quote}
 In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress() 
 sometimes returns true (even if the address didn't actually change; see img. 
 1) and the timeoutFailures counter is reset to 0 (see img. 2). The 
 maxRetriesOnSocketTimeouts limit (45) is therefore never reached and the 
 connection attempt is repeated forever.





[jira] [Commented] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever

2014-12-18 Thread Frantisek Vacek (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251739#comment-14251739
 ] 

Frantisek Vacek commented on HDFS-7392:
---

Finally, I've created the promised patch. It is attached as HDFS-7392.diff. It 
is not a final solution, of course, but it works and I hope it sheds some 
light on the problem we are facing. Yi, please contact me if you need more 
info or explanation.

regards

Fanda

 org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
 -

 Key: HDFS-7392
 URL: https://issues.apache.org/jira/browse/HDFS-7392
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Frantisek Vacek
Assignee: Yi Liu
 Attachments: 1.png, 2.png, HDFS-7392.diff


 In some specific circumstances, 
 org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times 
 out and lasts forever.
 The specific circumstances are:
 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) points 
 to a valid IP address, but no name node service is running on it.
 2) There are at least 2 IP addresses for such a URI. See output below:
 {quote}
 [~/proj/quickbox]$ nslookup share.example.com
 Server: 127.0.1.1
 Address:127.0.1.1#53
 share.example.com canonical name = 
 internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com.
 Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
 Address: 192.168.1.223
 Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
 Address: 192.168.1.65
 {quote}
 In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress() 
 sometimes returns true (even if the address didn't actually change; see img. 
 1) and the timeoutFailures counter is reset to 0 (see img. 2). The 
 maxRetriesOnSocketTimeouts limit (45) is therefore never reached and the 
 connection attempt is repeated forever.





[jira] [Commented] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever

2014-12-18 Thread Frantisek Vacek (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251741#comment-14251741
 ] 

Frantisek Vacek commented on HDFS-7392:
---

I'm also attaching a log from when the patch is applied.

{code}
DEBUG [main] (Client.java:697) - Connecting to share.merck.com/54.40.29.65:8020
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.65:8020. Already tried 1 time(s); maxRetries=45
 WARN [main] (Client.java:564) - Address change detected. Old: 
share.merck.com/54.40.29.65:8020 New: share.merck.com/54.40.29.223:8020
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.223:8020. Already tried 2 time(s); maxRetries=45
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.223:8020. Already tried 1 time(s); maxRetries=45
 WARN [main] (Client.java:564) - Address change detected. Old: 
share.merck.com/54.40.29.223:8020 New: share.merck.com/54.40.29.65:8020
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.65:8020. Already tried 2 time(s); maxRetries=45
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.65:8020. Already tried 3 time(s); maxRetries=45
 WARN [main] (Client.java:564) - Address change detected. Old: 
share.merck.com/54.40.29.65:8020 New: share.merck.com/54.40.29.223:8020
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.223:8020. Already tried 4 time(s); maxRetries=45
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.223:8020. Already tried 3 time(s); maxRetries=45
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.223:8020. Already tried 4 time(s); maxRetries=45
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.223:8020. Already tried 5 time(s); maxRetries=45
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.223:8020. Already tried 6 time(s); maxRetries=45
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.223:8020. Already tried 7 time(s); maxRetries=45
 WARN [main] (Client.java:564) - Address change detected. Old: 
share.merck.com/54.40.29.223:8020 New: share.merck.com/54.40.29.65:8020
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.65:8020. Already tried 8 time(s); maxRetries=45
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.65:8020. Already tried 5 time(s); maxRetries=45
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.65:8020. Already tried 6 time(s); maxRetries=45
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.65:8020. Already tried 7 time(s); maxRetries=45
 WARN [main] (Client.java:564) - Address change detected. Old: 
share.merck.com/54.40.29.65:8020 New: share.merck.com/54.40.29.223:8020
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.223:8020. Already tried 8 time(s); maxRetries=45
 INFO [main] (Client.java:816) - Retrying connect to server: 
share.merck.com/54.40.29.223:8020. Already tried 9 time(s); maxRetries=45
{code} 

 org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
 -

 Key: HDFS-7392
 URL: https://issues.apache.org/jira/browse/HDFS-7392
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Frantisek Vacek
Assignee: Yi Liu
 Attachments: 1.png, 2.png, HDFS-7392.diff


 In some specific circumstances, 
 org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times 
 out and lasts forever.
 The specific circumstances are:
 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) points 
 to a valid IP address, but no name node service is running on it.
 2) There are at least 2 IP addresses for such a URI. See output below:
 {quote}
 [~/proj/quickbox]$ nslookup share.example.com
 Server: 127.0.1.1
 Address:127.0.1.1#53
 share.example.com canonical name = 
 internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com.
 Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
 Address: 192.168.1.223
 Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
 Address: 192.168.1.65
 {quote}
 In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress() 
 sometimes returns true (even if the address didn't actually change; see img. 
 1) and the timeoutFailures counter is reset to 0 (see img. 2). The 
 maxRetriesOnSocketTimeouts limit (45) is therefore never reached and the 
 connection attempt is repeated forever.




[jira] [Updated] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern

2014-12-18 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7546:

Labels: supportability  (was: )

 Document, and set an accepting default for 
 dfs.namenode.kerberos.principal.pattern
 --

 Key: HDFS-7546
 URL: https://issues.apache.org/jira/browse/HDFS-7546
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: security
Affects Versions: 2.1.1-beta
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
  Labels: supportability
 Attachments: HDFS-7546.patch


 This config is used in the SaslRpcClient, and the lack of a default breaks 
 cross-realm trust principals being used at clients.
 Current location: 
 https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309
 The config should be documented and the default should be set to * to 
 preserve the prior-to-introduction behaviour.





[jira] [Created] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup

2014-12-18 Thread Binglin Chang (JIRA)
Binglin Chang created HDFS-7547:
---

 Summary: Fix 
TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
 Key: HDFS-7547
 URL: https://issues.apache.org/jira/browse/HDFS-7547
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang








[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251725#comment-14251725
 ] 

Hudson commented on HDFS-7528:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
HDFS-7528. Consolidate symlink-related implementation into a single class. 
Contributed by Haohui Mai. (wheat9: rev 
0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java


 Consolidate symlink-related implementation into a single class
 --

 Key: HDFS-7528
 URL: https://issues.apache.org/jira/browse/HDFS-7528
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7528.000.patch


 The jira proposes to consolidate symlink-related implementation into a single 
 class.





[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup

2014-12-18 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-7547:

Description: HDFS-7531 changes the implementation of FsVolumeList, but 
doesn't change its toString method to keep the old desc string format, test 
 Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
 -

 Key: HDFS-7547
 URL: https://issues.apache.org/jira/browse/HDFS-7547
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang

 HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its 
 toString method to keep the old desc string format, test 





[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup

2014-12-18 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-7547:

Description: HDFS-7531 changes the implementation of FsVolumeList, but 
doesn't change its toString method to keep the old desc string format; the 
test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on 
that format, so it always fails.  (was: HDFS-7531 changes the implementation 
of FsVolumeList, but doesn't change its toString method to keep the old desc 
string format, test )

 Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
 -

 Key: HDFS-7547
 URL: https://issues.apache.org/jira/browse/HDFS-7547
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang

 HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its 
 toString method to keep the old desc string format; the test 
 TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on 
 that format, so it always fails.





[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup

2014-12-18 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-7547:

Status: Patch Available  (was: Open)

 Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
 -

 Key: HDFS-7547
 URL: https://issues.apache.org/jira/browse/HDFS-7547
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang

 HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its 
 toString method to keep the old desc string format; the test 
 TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on 
 that format, so it always fails.





[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup

2014-12-18 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-7547:

Attachment: HDFS-7547.001.patch

 Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
 -

 Key: HDFS-7547
 URL: https://issues.apache.org/jira/browse/HDFS-7547
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HDFS-7547.001.patch


 HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its 
 toString method to keep the old desc string format; the test 
 TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on 
 that format, so it always fails.





[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251769#comment-14251769
 ] 

Hudson commented on HDFS-7528:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
HDFS-7528. Consolidate symlink-related implementation into a single class. 
Contributed by Haohui Mai. (wheat9: rev 
0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java


 Consolidate symlink-related implementation into a single class
 --

 Key: HDFS-7528
 URL: https://issues.apache.org/jira/browse/HDFS-7528
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7528.000.patch


 The jira proposes to consolidate symlink-related implementation into a single 
 class.





[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251773#comment-14251773
 ] 

Hudson commented on HDFS-7531:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. 
McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Improve the concurrent access on FsVolumeList
 -

 Key: HDFS-7531
 URL: https://issues.apache.org/jira/browse/HDFS-7531
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Fix For: 2.7.0

 Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, 
 HDFS-7531.002.patch


 {{FsVolumeList}} uses {{synchronized}} to protect the update on 
 {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
 {{getAvailable()}}) iterate {{volumes}} without protection.
 This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to 
 provide better concurrent access.





[jira] [Commented] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern

2014-12-18 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251781#comment-14251781
 ] 

Yongjun Zhang commented on HDFS-7546:
-

Hi [~qwertymaniac], 

Thanks for reporting the issue and providing the patch. I labeled it as 
supportability. I reviewed the change and have a few comments.
* The description of the property can be improved with more information. What 
about:
{code}
A client-side property that describes the permitted server principal pattern. 
It can be configured to control which realms are allowed to authenticate 
with, which is useful in cross-realm environments.
{code}
* what's the current default of this property prior to your change?
* wonder if there is any catch by changing the default pattern to *, which 
essentially accepts any pattern?
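For reference, the client-side setting under discussion would be declared in the client's hdfs-site.xml roughly as follows (a sketch only: the description text is illustrative, and the wildcard value reflects the default proposed in this JIRA, not necessarily what ships):

```xml
<property>
  <name>dfs.namenode.kerberos.principal.pattern</name>
  <value>*</value>
  <description>
    Client-side pattern restricting which server principals the client
    may authenticate with. The wildcard accepts principals from any
    realm, which is useful in cross-realm environments.
  </description>
</property>
```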




 Document, and set an accepting default for 
 dfs.namenode.kerberos.principal.pattern
 --

 Key: HDFS-7546
 URL: https://issues.apache.org/jira/browse/HDFS-7546
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: security
Affects Versions: 2.1.1-beta
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
  Labels: supportability
 Attachments: HDFS-7546.patch


 This config is used in the SaslRpcClient, and the no-default breaks 
 cross-realm trust principals being used at clients.
 Current location: 
 https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309
 The config should be documented and the default should be set to * to 
 preserve the prior-to-introduction behaviour.





[jira] [Commented] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251786#comment-14251786
 ] 

Hadoop QA commented on HDFS-7546:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688010/HDFS-7546.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9072//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9072//console

This message is automatically generated.

 Document, and set an accepting default for 
 dfs.namenode.kerberos.principal.pattern
 --

 Key: HDFS-7546
 URL: https://issues.apache.org/jira/browse/HDFS-7546
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: security
Affects Versions: 2.1.1-beta
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
  Labels: supportability
 Attachments: HDFS-7546.patch


 This config is used in the SaslRpcClient, and the no-default breaks 
 cross-realm trust principals being used at clients.
 Current location: 
 https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309
 The config should be documented and the default should be set to * to 
 preserve the prior-to-introduction behaviour.





[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251811#comment-14251811
 ] 

Hudson commented on HDFS-7531:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. 
McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java


 Improve the concurrent access on FsVolumeList
 -

 Key: HDFS-7531
 URL: https://issues.apache.org/jira/browse/HDFS-7531
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Fix For: 2.7.0

 Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, 
 HDFS-7531.002.patch


 {{FsVolumeList}} uses {{synchronized}} to protect the update on 
 {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
 {{getAvailable()}}) iterate {{volumes}} without protection.
 This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to 
 provide better concurrent access.





[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251807#comment-14251807
 ] 

Hudson commented on HDFS-7528:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
HDFS-7528. Consolidate symlink-related implementation into a single class. 
Contributed by Haohui Mai. (wheat9: rev 
0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java


 Consolidate symlink-related implementation into a single class
 --

 Key: HDFS-7528
 URL: https://issues.apache.org/jira/browse/HDFS-7528
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7528.000.patch


 The jira proposes to consolidate symlink-related implementation into a single 
 class.





[jira] [Created] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it

2014-12-18 Thread Rushabh S Shah (JIRA)
Rushabh S Shah created HDFS-7548:


 Summary: Corrupt block reporting delayed until datablock scanner 
thread detects it
 Key: HDFS-7548
 URL: https://issues.apache.org/jira/browse/HDFS-7548
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah


When there is only one datanode holding a block and that block happens to be
corrupt, the namenode keeps trying to replicate the block repeatedly, but it 
reports the block as corrupt only when the data block scanner thread of the 
datanode picks up this bad block.
Requesting an improvement in namenode reporting so that the corrupt replica is 
reported when there is only one replica and its replication keeps failing with 
a checksum error.





[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it

2014-12-18 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-7548:
-
Status: Patch Available  (was: Open)

 Corrupt block reporting delayed until datablock scanner thread detects it
 -

 Key: HDFS-7548
 URL: https://issues.apache.org/jira/browse/HDFS-7548
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: HDFS-7548.patch


 When there is only one datanode holding a block and that block happens to be
 corrupt, the namenode keeps trying to replicate the block repeatedly, but 
 it reports the block as corrupt only when the data block scanner 
 thread of the datanode picks up this bad block.
 Requesting an improvement in namenode reporting so that the corrupt replica 
 is reported when there is only one replica and its replication 
 keeps failing with a checksum error.





[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it

2014-12-18 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-7548:
-
Attachment: HDFS-7548.patch

Whenever the datanode in the write pipeline detects a checksum error while 
transferring the block to the target node, that block is moved to the first 
position in the blockInfoSet by setting its lastScanTime to 0.
This makes the BlockPoolSliceScanner pick this block first, since that 
data structure is sorted by lastScanTime.
This way the corrupt block is scanned first and reported to the 
namenode.
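The prioritization idea can be illustrated with a set sorted by last-scan time, where resetting a block's timestamp to 0 moves it to the head of the queue (a minimal sketch; BlockScanInfo here is a stand-in, not the actual BlockPoolSliceScanner types):

```java
import java.util.Comparator;
import java.util.TreeSet;

// Sketch: a scan queue ordered by lastScanTime, oldest first.
// On a checksum error the block's lastScanTime is reset to 0,
// so the scanner verifies the suspect block next.
public class ScanQueue {
    static final class BlockScanInfo {
        final long blockId;
        long lastScanTime;
        BlockScanInfo(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }
    }

    private final TreeSet<BlockScanInfo> blockInfoSet = new TreeSet<>(
            Comparator.comparingLong((BlockScanInfo b) -> b.lastScanTime)
                      .thenComparingLong(b -> b.blockId));

    public void add(BlockScanInfo info) {
        blockInfoSet.add(info);
    }

    // Remove before mutating the sort key, then re-insert so the
    // sorted order places the suspect block first.
    public void markSuspect(BlockScanInfo info) {
        blockInfoSet.remove(info);
        info.lastScanTime = 0;
        blockInfoSet.add(info);
    }

    public BlockScanInfo next() {
        return blockInfoSet.first();
    }

    public static void main(String[] args) {
        ScanQueue q = new ScanQueue();
        BlockScanInfo a = new BlockScanInfo(1, 100);
        BlockScanInfo b = new BlockScanInfo(2, 200);
        q.add(a);
        q.add(b);
        q.markSuspect(b);
        System.out.println("next block: " + q.next().blockId);
    }
}
```

Note the remove/mutate/re-insert order: mutating the sort key while the element is still inside the TreeSet would corrupt its ordering.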


 Corrupt block reporting delayed until datablock scanner thread detects it
 -

 Key: HDFS-7548
 URL: https://issues.apache.org/jira/browse/HDFS-7548
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: HDFS-7548.patch


 When there is only one datanode holding a block and that block happens to be
 corrupt, the namenode keeps trying to replicate the block repeatedly, but 
 it reports the block as corrupt only when the data block scanner 
 thread of the datanode picks up this bad block.
 Requesting an improvement in namenode reporting so that the corrupt replica 
 is reported when there is only one replica and its replication 
 keeps failing with a checksum error.





[jira] [Updated] (HDFS-7538) removedDst should be checked against null in the finally block of FSDirRenameOp#unprotectedRenameTo()

2014-12-18 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HDFS-7538:
-
Assignee: Ted Yu
  Status: Patch Available  (was: Open)

 removedDst should be checked against null in the finally block of 
 FSDirRenameOp#unprotectedRenameTo()
 -

 Key: HDFS-7538
 URL: https://issues.apache.org/jira/browse/HDFS-7538
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: hdfs-7538-001.patch


 {code}
 if (removedDst != null) {
   undoRemoveDst = false;
 ...
   if (undoRemoveDst) {
 // Rename failed - restore dst
 if (dstParent.isDirectory() &&
 dstParent.asDirectory().isWithSnapshot()) {
   dstParent.asDirectory().undoRename4DstParent(removedDst,
 {code}
 If the first if check doesn't pass, removedDst would be null and 
 undoRemoveDst may be true.
 This combination would lead to NullPointerException in the finally block.
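The shape of the fix is simply to guard the restore path on removedDst as well. A stand-alone sketch of that logic (names echo the snippet above, but this is an illustration, not the actual FSDirRenameOp code):

```java
// Sketch of the guard in the finally block: only attempt to restore
// dst when a removal actually happened.
public class RenameUndoSketch {
    // Returns what the finally block would do for a given state.
    static String undoRename(Object removedDst, boolean undoRemoveDst) {
        if (undoRemoveDst) {
            // Checking removedDst != null prevents the NPE when the
            // rename failed before dst was ever removed.
            if (removedDst != null) {
                return "restored " + removedDst;
            }
            return "nothing to restore";
        }
        return "no undo needed";
    }

    public static void main(String[] args) {
        // Failure before dst removal: removedDst is null, undo requested.
        System.out.println(undoRename(null, true));
    }
}
```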





[jira] [Updated] (HDFS-7538) removedDst should be checked against null in the finally block of FSDirRenameOp#unprotectedRenameTo()

2014-12-18 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HDFS-7538:
-
Attachment: hdfs-7538-001.patch

 removedDst should be checked against null in the finally block of 
 FSDirRenameOp#unprotectedRenameTo()
 -

 Key: HDFS-7538
 URL: https://issues.apache.org/jira/browse/HDFS-7538
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: hdfs-7538-001.patch


 {code}
 if (removedDst != null) {
   undoRemoveDst = false;
 ...
   if (undoRemoveDst) {
 // Rename failed - restore dst
 if (dstParent.isDirectory() &&
 dstParent.asDirectory().isWithSnapshot()) {
   dstParent.asDirectory().undoRename4DstParent(removedDst,
 {code}
 If the first if check doesn't pass, removedDst would be null and 
 undoRemoveDst may be true.
 This combination would lead to NullPointerException in the finally block.





[jira] [Updated] (HDFS-7464) TestDFSAdminWithHA#testRefreshSuperUserGroupsConfiguration fails against Java 8

2014-12-18 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated HDFS-7464:
--
Assignee: (was: Chen He)

 TestDFSAdminWithHA#testRefreshSuperUserGroupsConfiguration fails against Java 
 8
 ---

 Key: HDFS-7464
 URL: https://issues.apache.org/jira/browse/HDFS-7464
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor

 From https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/23/ :
 {code}
 REGRESSION:  
 org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testRefreshSuperUserGroupsConfiguration
 Error Message:
 refreshSuperUserGroupsConfiguration: End of File Exception between local host 
 is: asf908.gq1.ygridcore.net/67.195.81.152; destination host is: 
 localhost:12700; : java.io.EOFException; For more details see:  
 http://wiki.apache.org/hadoop/EOFException expected:0 but was:-1
 Stack Trace:
 java.lang.AssertionError: refreshSuperUserGroupsConfiguration: End of File 
 Exception between local host is: asf908.gq1.ygridcore.net/67.195.81.152; 
 destination host is: localhost:12700; : java.io.EOFException; For more 
 details see:  http://wiki.apache.org/hadoop/EOFException expected:0 but 
 was:-1
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:555)
 at 
 org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testRefreshSuperUserGroupsConfiguration(TestDFSAdminWithHA.java:228)
 {code}





[jira] [Commented] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252032#comment-14252032
 ] 

Hadoop QA commented on HDFS-7547:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688049/HDFS-7547.001.patch
  against trunk revision 389f881.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9074//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9074//console

This message is automatically generated.

 Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
 -

 Key: HDFS-7547
 URL: https://issues.apache.org/jira/browse/HDFS-7547
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HDFS-7547.001.patch


 HDFS-7531 changes the implementation of FsVolumeList but doesn't change its 
 toString method to keep the old description string format. The test 
 TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on that 
 format, so this test always fails. 





[jira] [Updated] (HDFS-7373) Clean up temporary files after fsimage transfer failures

2014-12-18 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-7373:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the review, Akira. I've committed this to trunk and branch-2.

 Clean up temporary files after fsimage transfer failures
 

 Key: HDFS-7373
 URL: https://issues.apache.org/jira/browse/HDFS-7373
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: HDFS-7373.patch


 When a fsimage (e.g. checkpoint) transfer fails, a temporary file is left in 
 each storage directory.  If the size of name space is large, these files can 
 take up quite a bit of space.





[jira] [Commented] (HDFS-7373) Clean up temporary files after fsimage transfer failures

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252050#comment-14252050
 ] 

Hudson commented on HDFS-7373:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6747 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6747/])
HDFS-7373. Clean up temporary files after fsimage transfer failures. 
Contributed by Kihwal Lee (kihwal: rev c0d666c74e9ea76564a2458c6c0a78ae7afa9fea)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java


 Clean up temporary files after fsimage transfer failures
 

 Key: HDFS-7373
 URL: https://issues.apache.org/jira/browse/HDFS-7373
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: HDFS-7373.patch


 When a fsimage (e.g. checkpoint) transfer fails, a temporary file is left in 
 each storage directory.  If the size of name space is large, these files can 
 take up quite a bit of space.





[jira] [Updated] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume

2014-12-18 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7443:
---
Summary: Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block 
files are present in the same volume  (was: Datanode upgrade to 
BLOCKID_BASED_LAYOUT sometimes fails)

 Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are 
 present in the same volume
 --

 Key: HDFS-7443
 URL: https://issues.apache.org/jira/browse/HDFS-7443
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Kihwal Lee
Priority: Blocker

 When we did an upgrade from 2.5 to 2.6 in a medium-sized cluster, about 4% of 
 the datanodes were not coming up.  They tried the data file layout upgrade 
 for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
 All failures were caused by {{NativeIO.link()}} throwing IOException saying 
 {{EEXIST}}.  The datanodes didn't die right away, but the upgrade was soon 
 retried when the block pool initialization was retried whenever 
 {{BPServiceActor}} was registering with the namenode.  After many retries, 
 the datanodes terminated.  This would leave {{previous.tmp}} and {{current}} 
 with no {{VERSION}} file in the block pool slice storage directory.  
 Although {{previous.tmp}} contained the old {{VERSION}} file, the content was 
 in the new layout and the subdirs were all newly created ones.  This 
 shouldn't have happened, because the upgrade-recovery logic in {{Storage}} 
 removes {{current}} and renames {{previous.tmp}} to {{current}} before 
 retrying.  All successfully upgraded volumes had the old state preserved in 
 their {{previous}} directory.
 In summary, there were two observed issues:
 - Upgrade failure with {{link()}} failing with {{EEXIST}}
 - {{previous.tmp}} contained not the content of the original {{current}}, but 
 a half-upgraded one.
 We did not see this in smaller-scale test clusters.





[jira] [Commented] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades

2014-12-18 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252063#comment-14252063
 ] 

Ming Ma commented on HDFS-5535:
---

Opened https://issues.apache.org/jira/browse/HDFS-7541 to explore ideas for 
more efficient DN rolling upgrades.

 Umbrella jira for improved HDFS rolling upgrades
 

 Key: HDFS-5535
 URL: https://issues.apache.org/jira/browse/HDFS-5535
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, ha, hdfs-client, namenode
Affects Versions: 3.0.0, 2.2.0
Reporter: Nathan Roberts
Assignee: Tsz Wo Nicholas Sze
 Fix For: 2.4.0

 Attachments: HDFSRollingUpgradesHighLevelDesign.pdf, 
 HDFSRollingUpgradesHighLevelDesign.v2.pdf, 
 HDFSRollingUpgradesHighLevelDesign.v3.pdf, h5535_20140219.patch, 
 h5535_20140220-1554.patch, h5535_20140220b.patch, h5535_20140221-2031.patch, 
 h5535_20140224-1931.patch, h5535_20140225-1225.patch, 
 h5535_20140226-1328.patch, h5535_20140226-1911.patch, 
 h5535_20140227-1239.patch, h5535_20140228-1714.patch, 
 h5535_20140304-1138.patch, h5535_20140304-branch-2.patch, 
 h5535_20140310-branch-2.patch, hdfs-5535-test-plan.pdf


 In order to roll a new HDFS release through a large cluster quickly and 
 safely, a few enhancements are needed in HDFS. An initial High level design 
 document will be attached to this jira, and sub-jiras will itemize the 
 individual tasks.





[jira] [Assigned] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume

2014-12-18 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe reassigned HDFS-7443:
--

Assignee: Colin Patrick McCabe

 Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are 
 present in the same volume
 --

 Key: HDFS-7443
 URL: https://issues.apache.org/jira/browse/HDFS-7443
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Kihwal Lee
Assignee: Colin Patrick McCabe
Priority: Blocker

 When we did an upgrade from 2.5 to 2.6 in a medium-sized cluster, about 4% of 
 the datanodes were not coming up.  They tried the data file layout upgrade 
 for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
 All failures were caused by {{NativeIO.link()}} throwing IOException saying 
 {{EEXIST}}.  The datanodes didn't die right away, but the upgrade was soon 
 retried when the block pool initialization was retried whenever 
 {{BPServiceActor}} was registering with the namenode.  After many retries, 
 the datanodes terminated.  This would leave {{previous.tmp}} and {{current}} 
 with no {{VERSION}} file in the block pool slice storage directory.  
 Although {{previous.tmp}} contained the old {{VERSION}} file, the content was 
 in the new layout and the subdirs were all newly created ones.  This 
 shouldn't have happened, because the upgrade-recovery logic in {{Storage}} 
 removes {{current}} and renames {{previous.tmp}} to {{current}} before 
 retrying.  All successfully upgraded volumes had the old state preserved in 
 their {{previous}} directory.
 In summary, there were two observed issues:
 - Upgrade failure with {{link()}} failing with {{EEXIST}}
 - {{previous.tmp}} contained not the content of the original {{current}}, but 
 a half-upgraded one.
 We did not see this in smaller-scale test clusters.





[jira] [Commented] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs

2014-12-18 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252079#comment-14252079
 ] 

Jing Zhao commented on HDFS-7543:
-

Thanks for working on this, Haohui! The patch looks good to me. The test 
failures should be unrelated. +1

 Avoid path resolution when getting FileStatus for audit logs
 

 Key: HDFS-7543
 URL: https://issues.apache.org/jira/browse/HDFS-7543
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-7543.000.patch


 The current API of {{getAuditFileInfo()}} forces parsing the paths again when 
  generating the {{HdfsFileStatus}} for audit logs. This jira proposes to 
 avoid the repeated parsing by passing the {{INodesInPath}} object instead of 
 the path.





[jira] [Updated] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs

2014-12-18 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7543:
-
   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk and branch-2. Thanks Jing for the reviews.

 Avoid path resolution when getting FileStatus for audit logs
 

 Key: HDFS-7543
 URL: https://issues.apache.org/jira/browse/HDFS-7543
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7543.000.patch


 The current API of {{getAuditFileInfo()}} forces parsing the paths again when 
  generating the {{HdfsFileStatus}} for audit logs. This jira proposes to 
 avoid the repeated parsing by passing the {{INodesInPath}} object instead of 
 the path.





[jira] [Commented] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252104#comment-14252104
 ] 

Hudson commented on HDFS-7543:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6749 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6749/])
HDFS-7543. Avoid path resolution when getting FileStatus for audit logs. 
Contributed by Haohui Mai. (wheat9: rev 
65f2a4ee600dfffa5203450261da3c1989de25a9)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirAttrOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirAclOp.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirStatAndListingOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirMkdirOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


 Avoid path resolution when getting FileStatus for audit logs
 

 Key: HDFS-7543
 URL: https://issues.apache.org/jira/browse/HDFS-7543
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7543.000.patch


 The current API of {{getAuditFileInfo()}} forces parsing the paths again when 
  generating the {{HdfsFileStatus}} for audit logs. This jira proposes to 
 avoid the repeated parsing by passing the {{INodesInPath}} object instead of 
 the path.





[jira] [Updated] (HDFS-3107) HDFS truncate

2014-12-18 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-3107:
--
Attachment: HDFS-3107.patch

Updating the patches once again.
This update is mostly related to [~jingzhao]'s refactoring of HDFS-7509.
There are no changes to the truncate logic itself.

Just to remind people here. The snapshot part of truncate is being maintained 
under HDFS-7056. And the combined patch for the two issues is also submitted 
there (per [~cmccabe]'s request).

 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Lei Chang
Assignee: Plamen Jeliazkov
 Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
 HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
 HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard POSIX operation), the reverse operation of 
 append, which forces upper-layer applications to use ugly workarounds (such 
 as keeping track of the discarded byte range per file in a separate metadata 
 store, and periodically running a vacuum process to rewrite compacted files) 
 to overcome this limitation of HDFS.





[jira] [Updated] (HDFS-7056) Snapshot support for truncate

2014-12-18 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-7056:
--
Status: Open  (was: Patch Available)

 Snapshot support for truncate
 -

 Key: HDFS-7056
 URL: https://issues.apache.org/jira/browse/HDFS-7056
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Attachments: HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx


 Implementation of truncate in HDFS-3107 does not allow truncating files which 
 are in a snapshot. It is desirable to be able to truncate and still keep the 
 old file state of the file in the snapshot.





[jira] [Updated] (HDFS-7056) Snapshot support for truncate

2014-12-18 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-7056:
--
Attachment: HDFS-7056.patch

Updating the patches once again. This is the snapshot part of truncate.
The update is mostly related to [~jingzhao]'s refactoring of HDFS-7509.
There are no changes to the truncate logic itself.

 Snapshot support for truncate
 -

 Key: HDFS-7056
 URL: https://issues.apache.org/jira/browse/HDFS-7056
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Attachments: HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx


 Implementation of truncate in HDFS-3107 does not allow truncating files which 
 are in a snapshot. It is desirable to be able to truncate and still keep the 
 old file state of the file in the snapshot.





[jira] [Updated] (HDFS-7056) Snapshot support for truncate

2014-12-18 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov updated HDFS-7056:
---
Status: Patch Available  (was: Open)

 Snapshot support for truncate
 -

 Key: HDFS-7056
 URL: https://issues.apache.org/jira/browse/HDFS-7056
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Attachments: HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFSSnapshotWithTruncateDesign.docx


 Implementation of truncate in HDFS-3107 does not allow truncating files which 
 are in a snapshot. It is desirable to be able to truncate and still keep the 
 old file state of the file in the snapshot.





[jira] [Updated] (HDFS-7056) Snapshot support for truncate

2014-12-18 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov updated HDFS-7056:
---
Attachment: HDFS-3107-HDFS-7056-combined.patch

Attaching combined patch.

 Snapshot support for truncate
 -

 Key: HDFS-7056
 URL: https://issues.apache.org/jira/browse/HDFS-7056
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Attachments: HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFSSnapshotWithTruncateDesign.docx


 Implementation of truncate in HDFS-3107 does not allow truncating files which 
 are in a snapshot. It is desirable to be able to truncate and still keep the 
 old file state of the file in the snapshot.





[jira] [Commented] (HDFS-7056) Snapshot support for truncate

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252156#comment-14252156
 ] 

Hadoop QA commented on HDFS-7056:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12688105/HDFS-3107-HDFS-7056-combined.patch
  against trunk revision ef1fc51.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9077//console

This message is automatically generated.

 Snapshot support for truncate
 -

 Key: HDFS-7056
 URL: https://issues.apache.org/jira/browse/HDFS-7056
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Attachments: HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFSSnapshotWithTruncateDesign.docx


 Implementation of truncate in HDFS-3107 does not allow truncating files which 
 are in a snapshot. It is desirable to be able to truncate and still keep the 
 old file state of the file in the snapshot.





[jira] [Commented] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252184#comment-14252184
 ] 

Hadoop QA commented on HDFS-7548:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688062/HDFS-7548.patch
  against trunk revision 389f881.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration
  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9075//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9075//console

This message is automatically generated.

 Corrupt block reporting delayed until datablock scanner thread detects it
 -

 Key: HDFS-7548
 URL: https://issues.apache.org/jira/browse/HDFS-7548
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: HDFS-7548.patch


 When there is one datanode holding the block and that block happens to be 
 corrupt, the namenode keeps trying to replicate the block repeatedly, but the 
 block is reported as corrupt only when the data block scanner thread of the 
 datanode picks up this bad block.
 Requesting an improvement in namenode reporting so that the corrupt replica is 
 reported when there is only 1 replica and the replication of that replica 
 keeps failing with a checksum error.





[jira] [Commented] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs

2014-12-18 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252209#comment-14252209
 ] 

Charles Lamb commented on HDFS-7543:


Sorry I'm late to the party. Here are a few comments:

FSDirMkdirOp.java

In #mkdirs, you removed the {{final String srcArg = src}} line. This should be 
left in: many IDEs warn about assignments to formal arguments, which is why it 
was put in in the first place.

FSDirRenameOp.java

In #renameToInt, dstIIP (and resultingStat) could benefit from being declared final.

FSDirXAttrOp.java

I'm not sure why you've moved the call to getINodesInPath4Write and 
checkXAttrChangeAccess inside the writeLock.

FSDirStatAndListing.java

The javadoc for the @param src needs to be changed to reflect that it's an 
INodesInPath, not a String. Nit: it might be better to rename the INodesInPath 
arg from src to iip.

#getFileInfo4DotSnapshot is now unused since you inlined it into #getFileInfo.
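The first comment above can be sketched minimally as follows (hypothetical names, not the actual FSDirMkdirOp code): capture the formal argument in a final local before the parameter is reassigned, so later code such as audit logging still sees the caller's original value.

```java
// Hypothetical sketch of the "final String srcArg = src" pattern the review
// asks to keep. Not HDFS code; names are invented for illustration.
class FinalArgDemo {
    static String mkdirsLikeResolve(String src) {
        final String srcArg = src;         // preserve the original argument
        src = src.replace("/./", "/");     // parameter is reassigned during resolution
        // audit logging would use srcArg, the path the caller actually passed
        return srcArg + " resolved to " + src;
    }
}
```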

 Avoid path resolution when getting FileStatus for audit logs
 

 Key: HDFS-7543
 URL: https://issues.apache.org/jira/browse/HDFS-7543
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7543.000.patch


 The current API of {{getAuditFileInfo()}} forces parsing the paths again when 
  generating the {{HdfsFileStatus}} for audit logs. This jira proposes to 
 avoid the repeated parsing by passing the {{INodesInPath}} object instead of 
 the path.





[jira] [Commented] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it

2014-12-18 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252344#comment-14252344
 ] 

Rushabh S Shah commented on HDFS-7548:
--

The following tests pass on my local setup on both branch-2 and trunk:
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

There is already a JIRA filed for the 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration test 
failure: HDFS-7547





 Corrupt block reporting delayed until datablock scanner thread detects it
 -

 Key: HDFS-7548
 URL: https://issues.apache.org/jira/browse/HDFS-7548
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: HDFS-7548.patch


 When there is one datanode holding the block and that block happens to be 
 corrupt, the namenode keeps trying to replicate the block repeatedly, but the 
 block is reported as corrupt only when the data block scanner thread of the 
 datanode picks up this bad block.
 Requesting an improvement in namenode reporting so that the corrupt replica is 
 reported when there is only 1 replica and the replication of that replica 
 keeps failing with a checksum error.





[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252354#comment-14252354
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Looks like HDFS-7506 broke this again.

 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Lei Chang
Assignee: Plamen Jeliazkov
 Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
 HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
 HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard Posix operation), the reverse of append, so 
 upper-layer applications must resort to ugly workarounds (such as keeping 
 track of the discarded byte range per file in a separate metadata store, and 
 periodically running a vacuum process to rewrite compacted files) to overcome 
 this limitation of HDFS.





[jira] [Updated] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume

2014-12-18 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7443:
---
Attachment: HDFS-7443.001.patch

This patch changes the upgrade path from non-blockid-based-layout to 
blockid-based layout so that it uses the jdk7 {{Files#createLink}} function, 
instead of our hand-rolled hardlink code.

This avoids the dilemma of detecting EEXIST on all the various platforms that 
{{HardLink.java}} hacked in support for, such as Linux shell-based (the no 
libhadoop.so case), cygwin, Windows native, and Linux JNI-based.  It might be 
possible to distinguish regular errors from EEXIST on all those platforms, but 
writing all that code would be a very big job.

I did not remove or alter any other code in {{HardLink.java}} in this patch.  I 
think clearly we should think about refactoring that code to use jdk7 later, 
but that is a bigger change that is not as critical as this fix.  We also can't 
get rid of {{HardLink.java}} completely because we are unfortunately depending 
on reading the hard link count of files in a few places-- something jdk7 does 
not support.

Another weird thing about the {{HardLink}} class is that all it actually 
contains is statistics information; every important method is {{static}}. That 
is why we continue to use a {{HardLink}} instance in the upgrade code.  In the 
future, we should simply use the {{HardLink#Statistics}} class directly, since 
the outer class provides no value (it has only static methods).
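The JDK 7 approach described above can be sketched minimally as follows (hypothetical names, not the actual patch): {{Files#createLink}} surfaces a collision uniformly as {{FileAlreadyExistsException}} on every platform, which is what removes the need to decode platform-specific EEXIST errors by hand.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: create a hard link, treating "already exists" as a
// recoverable condition (e.g. a duplicate block file) rather than a failure.
class LinkDemo {
    /** Returns true if the link was created, false if it already existed. */
    static boolean tryLink(Path link, Path existing) throws IOException {
        try {
            Files.createLink(link, existing);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;   // collision: skip instead of failing the upgrade
        }
    }
}
```

Any other IOException still propagates, so genuine errors are not silently swallowed; only the EEXIST-equivalent case is treated as a duplicate.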

 Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are 
 present in the same volume
 --

 Key: HDFS-7443
 URL: https://issues.apache.org/jira/browse/HDFS-7443
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Kihwal Lee
Assignee: Colin Patrick McCabe
Priority: Blocker
 Attachments: HDFS-7443.001.patch


 When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of 
 datanodes were not coming up.  They tried the data file layout upgrade for 
 BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
 All failures were caused by {{NativeIO.link()}} throwing IOException saying 
 {{EEXIST}}.  The data nodes didn't die right away, but the upgrade was soon 
 retried when the block pool initialization was retried whenever 
 {{BPServiceActor}} was registering with the namenode.  After many retries, 
 datanodes terminated.  This would leave {{previous.tmp}} and {{current}} with 
 no {{VERSION}} file in the block pool slice storage directory.  
 Although {{previous.tmp}} contained the old {{VERSION}} file, the content was 
 in the new layout and the subdirs were all newly created ones.  This 
 shouldn't have happened because the upgrade-recovery logic in {{Storage}} 
 removes {{current}} and renames {{previous.tmp}} to {{current}} before 
 retrying.  All successfully upgraded volumes had old state preserved in their 
 {{previous}} directory.
 In summary there were two observed issues.
 - Upgrade failure with {{link()}} failing with {{EEXIST}}
 - {{previous.tmp}} contained not the content of original {{current}}, but 
 half-upgraded one.
 We did not see this in smaller scale test clusters.





[jira] [Comment Edited] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume

2014-12-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252373#comment-14252373
 ] 

Colin Patrick McCabe edited comment on HDFS-7443 at 12/18/14 10:02 PM:
---

Oh yeah, and this patch changes {{hadoop-24-datanode-dir.tgz}} to have a 
duplicate block (present in multiple subdirectories in a volume), so that we 
are exercising the collision-handling pathway in {{TestDatanodeLayoutUpgrade}}.


was (Author: cmccabe):
Oh yeah, and this patch changes {{hadoop-24-datanode-dir.tgz}} to have a 
duplicate block (present in multiple subdirectories in a volume), so that we 
are exercising the collision-handling pathway.

 Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are 
 present in the same volume
 --

 Key: HDFS-7443
 URL: https://issues.apache.org/jira/browse/HDFS-7443
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Kihwal Lee
Assignee: Colin Patrick McCabe
Priority: Blocker
 Attachments: HDFS-7443.001.patch


 When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of 
 datanodes were not coming up.  They tried the data file layout upgrade for 
 BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
 All failures were caused by {{NativeIO.link()}} throwing IOException saying 
 {{EEXIST}}.  The data nodes didn't die right away, but the upgrade was soon 
 retried when the block pool initialization was retried whenever 
 {{BPServiceActor}} was registering with the namenode.  After many retries, 
 datanodes terminated.  This would leave {{previous.tmp}} and {{current}} with 
 no {{VERSION}} file in the block pool slice storage directory.  
 Although {{previous.tmp}} contained the old {{VERSION}} file, the content was 
 in the new layout and the subdirs were all newly created ones.  This 
 shouldn't have happened because the upgrade-recovery logic in {{Storage}} 
 removes {{current}} and renames {{previous.tmp}} to {{current}} before 
 retrying.  All successfully upgraded volumes had old state preserved in their 
 {{previous}} directory.
 In summary there were two observed issues.
 - Upgrade failure with {{link()}} failing with {{EEXIST}}
 - {{previous.tmp}} contained not the content of original {{current}}, but 
 half-upgraded one.
 We did not see this in smaller scale test clusters.





[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume

2014-12-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252373#comment-14252373
 ] 

Colin Patrick McCabe commented on HDFS-7443:


Oh yeah, and this patch changes {{hadoop-24-datanode-dir.tgz}} to have a 
duplicate block (present in multiple subdirectories in a volume), so that we 
are exercising the collision-handling pathway.

 Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are 
 present in the same volume
 --

 Key: HDFS-7443
 URL: https://issues.apache.org/jira/browse/HDFS-7443
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Kihwal Lee
Assignee: Colin Patrick McCabe
Priority: Blocker
 Attachments: HDFS-7443.001.patch


 When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of 
 datanodes were not coming up.  They tried the data file layout upgrade for 
 BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
 All failures were caused by {{NativeIO.link()}} throwing IOException saying 
 {{EEXIST}}.  The data nodes didn't die right away, but the upgrade was soon 
 retried when the block pool initialization was retried whenever 
 {{BPServiceActor}} was registering with the namenode.  After many retries, 
 datanodes terminated.  This would leave {{previous.tmp}} and {{current}} with 
 no {{VERSION}} file in the block pool slice storage directory.  
 Although {{previous.tmp}} contained the old {{VERSION}} file, the content was 
 in the new layout and the subdirs were all newly created ones.  This 
 shouldn't have happened because the upgrade-recovery logic in {{Storage}} 
 removes {{current}} and renames {{previous.tmp}} to {{current}} before 
 retrying.  All successfully upgraded volumes had old state preserved in their 
 {{previous}} directory.
 In summary there were two observed issues.
 - Upgrade failure with {{link()}} failing with {{EEXIST}}
 - {{previous.tmp}} contained not the content of original {{current}}, but 
 half-upgraded one.
 We did not see this in smaller scale test clusters.





[jira] [Updated] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume

2014-12-18 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7443:
---
Target Version/s: 2.7.0  (was: 2.6.1)
  Status: Patch Available  (was: Open)

 Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are 
 present in the same volume
 --

 Key: HDFS-7443
 URL: https://issues.apache.org/jira/browse/HDFS-7443
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Kihwal Lee
Assignee: Colin Patrick McCabe
Priority: Blocker
 Attachments: HDFS-7443.001.patch


 When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of 
 datanodes were not coming up.  They tried the data file layout upgrade for 
 BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
 All failures were caused by {{NativeIO.link()}} throwing IOException saying 
 {{EEXIST}}.  The data nodes didn't die right away, but the upgrade was soon 
 retried when the block pool initialization was retried whenever 
 {{BPServiceActor}} was registering with the namenode.  After many retries, 
 datanodes terminated.  This would leave {{previous.tmp}} and {{current}} with 
 no {{VERSION}} file in the block pool slice storage directory.  
 Although {{previous.tmp}} contained the old {{VERSION}} file, the content was 
 in the new layout and the subdirs were all newly created ones.  This 
 shouldn't have happened because the upgrade-recovery logic in {{Storage}} 
 removes {{current}} and renames {{previous.tmp}} to {{current}} before 
 retrying.  All successfully upgraded volumes had old state preserved in their 
 {{previous}} directory.
 In summary there were two observed issues.
 - Upgrade failure with {{link()}} failing with {{EEXIST}}
 - {{previous.tmp}} contained not the content of original {{current}}, but 
 half-upgraded one.
 We did not see this in smaller scale test clusters.





[jira] [Updated] (HDFS-7537) dfs.namenode.replication.min > 1 && missing replicas && NN restart is confusing

2014-12-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-7537:
---
Attachment: dfs-min-2.png

 dfs.namenode.replication.min > 1 && missing replicas && NN restart is 
 confusing
 ---

 Key: HDFS-7537
 URL: https://issues.apache.org/jira/browse/HDFS-7537
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Allen Wittenauer
 Attachments: dfs-min-2.png


 If minimum replication is set to 2 or higher and some of those replicas are 
 missing and the namenode restarts, it isn't always obvious that the missing 
 replicas are the reason why the namenode isn't leaving safemode.  We should 
 improve the output of fsck and the web UI to make it obvious that the missing 
 blocks are due to unmet minimum replication vs. being completely missing.





[jira] [Updated] (HDFS-7530) Allow renaming encryption zone roots

2014-12-18 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7530:
--
Summary: Allow renaming encryption zone roots  (was: Allow renaming an 
Encryption Zone root)

 Allow renaming encryption zone roots
 

 Key: HDFS-7530
 URL: https://issues.apache.org/jira/browse/HDFS-7530
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, 
 HDFS-7530.003.patch


 It should be possible to do
 hdfs dfs -mv /ezroot /newnameforezroot





[jira] [Commented] (HDFS-7537) dfs.namenode.replication.min > 1 && missing replicas && NN restart is confusing

2014-12-18 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252377#comment-14252377
 ] 

Allen Wittenauer commented on HDFS-7537:


Attached two screenshots that shows the confusion.

 dfs.namenode.replication.min > 1 && missing replicas && NN restart is 
 confusing
 ---

 Key: HDFS-7537
 URL: https://issues.apache.org/jira/browse/HDFS-7537
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Allen Wittenauer
 Attachments: dfs-min-2-fsck.png, dfs-min-2.png


 If minimum replication is set to 2 or higher and some of those replicas are 
 missing and the namenode restarts, it isn't always obvious that the missing 
 replicas are the reason why the namenode isn't leaving safemode.  We should 
 improve the output of fsck and the web UI to make it obvious that the missing 
 blocks are due to unmet minimum replication vs. being completely missing.





[jira] [Updated] (HDFS-7530) Allow renaming of encryption zone roots

2014-12-18 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7530:
--
Summary: Allow renaming of encryption zone roots  (was: Allow renaming 
encryption zone roots)

 Allow renaming of encryption zone roots
 ---

 Key: HDFS-7530
 URL: https://issues.apache.org/jira/browse/HDFS-7530
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, 
 HDFS-7530.003.patch


 It should be possible to do
 hdfs dfs -mv /ezroot /newnameforezroot





[jira] [Updated] (HDFS-7537) dfs.namenode.replication.min > 1 && missing replicas && NN restart is confusing

2014-12-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-7537:
---
Attachment: dfs-min-2-fsck.png

 dfs.namenode.replication.min > 1 && missing replicas && NN restart is 
 confusing
 ---

 Key: HDFS-7537
 URL: https://issues.apache.org/jira/browse/HDFS-7537
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Allen Wittenauer
 Attachments: dfs-min-2-fsck.png, dfs-min-2.png


 If minimum replication is set to 2 or higher and some of those replicas are 
 missing and the namenode restarts, it isn't always obvious that the missing 
 replicas are the reason why the namenode isn't leaving safemode.  We should 
 improve the output of fsck and the web UI to make it obvious that the missing 
 blocks are due to unmet minimum replication vs. being completely missing.





[jira] [Commented] (HDFS-7537) dfs.namenode.replication.min > 1 && missing replicas && NN restart is confusing

2014-12-18 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252383#comment-14252383
 ] 

Allen Wittenauer commented on HDFS-7537:


Mock-up of an fsck that alerts when min rep hasn't actually been met:

{code}
Status: HEALTHY
 Total size:236 B
 Total dirs:1
 Total files:   1
 Total symlinks:0
 Total blocks (validated):  1 (avg. block size 236 B)
  
  UNDER MIN REPL'D BLOCKS:  1 (100.0 %)
  
 Minimally replicated blocks:   0 (0.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   1 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:3
 Average block replication: 1.0
 Corrupt blocks:0
 Missing replicas:  2 (66.64 %)
 Number of data-nodes:  1
 Number of racks:   1
{code}

With all datanodes down (and therefore triggering corrupt/missing blocks):

{code}
Status: CORRUPT
 Total size:236 B
 Total dirs:1
 Total files:   1
 Total symlinks:0
 Total blocks (validated):  1 (avg. block size 236 B)
  
  UNDER MIN REPL'D BLOCKS:  1 (100.0 %)
  CORRUPT FILES:1
  MISSING BLOCKS:   1
  MISSING SIZE: 236 B
  CORRUPT BLOCKS:   1
  
 Minimally replicated blocks:   0 (0.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:3
 Average block replication: 0.0
 Corrupt blocks:1
 Missing replicas:  0
 Number of data-nodes:  0
 Number of racks:   0
FSCK ended at Thu Dec 18 14:08:25 PST 2014 in 13 milliseconds
{code}

 dfs.namenode.replication.min > 1 && missing replicas && NN restart is 
 confusing
 ---

 Key: HDFS-7537
 URL: https://issues.apache.org/jira/browse/HDFS-7537
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Allen Wittenauer
 Attachments: dfs-min-2-fsck.png, dfs-min-2.png


 If minimum replication is set to 2 or higher and some of those replicas are 
 missing and the namenode restarts, it isn't always obvious that the missing 
 replicas are the reason why the namenode isn't leaving safemode.  We should 
 improve the output of fsck and the web UI to make it obvious that the missing 
 blocks are due to unmet minimum replication vs. being completely missing.





[jira] [Commented] (HDFS-7538) removedDst should be checked against null in the finally block of FSDirRenameOp#unprotectedRenameTo()

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252395#comment-14252395
 ] 

Hadoop QA commented on HDFS-7538:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688080/hdfs-7538-001.patch
  against trunk revision 389f881.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDecommission
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration
  
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9076//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9076//console

This message is automatically generated.

 removedDst should be checked against null in the finally block of 
 FSDirRenameOp#unprotectedRenameTo()
 -

 Key: HDFS-7538
 URL: https://issues.apache.org/jira/browse/HDFS-7538
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: hdfs-7538-001.patch


 {code}
 if (removedDst != null) {
   undoRemoveDst = false;
 ...
   if (undoRemoveDst) {
 // Rename failed - restore dst
  if (dstParent.isDirectory() &&
  dstParent.asDirectory().isWithSnapshot()) {
   dstParent.asDirectory().undoRename4DstParent(removedDst,
 {code}
 If the first if check doesn't pass, removedDst would be null and 
 undoRemoveDst may be true.
 This combination would lead to NullPointerException in the finally block.
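As a sketch of the failure mode described above (all names here are illustrative stand-ins, not the actual FSDirRenameOp code), the fix is simply to guard the undo path on removedDst being non-null:

```java
// Hypothetical reduction of the rename/undo pattern from the description.
// outcome(false) models the case where the first if check does not pass:
// removedDst stays null while undoRemoveDst stays true, so the finally
// block must null-check removedDst before dereferencing it.
public class RenameSketch {
    static String outcome(boolean dstExisted) {
        Object removedDst = dstExisted ? new Object() : null; // null when the check fails
        boolean undoRemoveDst = true;                          // may still be true
        try {
            if (removedDst != null) {
                undoRemoveDst = false; // only cleared when dst was actually removed
            }
            return "renamed";
        } finally {
            // The proposed fix: also check removedDst against null, so the
            // undo path (standing in for undoRename4DstParent) cannot NPE.
            if (undoRemoveDst && removedDst != null) {
                removedDst.toString();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(false)); // completes without NPE
        System.out.println(outcome(true));
    }
}
```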



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7530) Allow renaming of encryption zone roots

2014-12-18 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7530:
--
   Resolution: Fixed
Fix Version/s: 2.7.0
   Status: Resolved  (was: Patch Available)

Ran tests locally. TestDecom still failed, but that failure is unrelated to this 
patch. Committed to trunk and branch-2, thanks for the contribution Charles.

 Allow renaming of encryption zone roots
 ---

 Key: HDFS-7530
 URL: https://issues.apache.org/jira/browse/HDFS-7530
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Fix For: 2.7.0

 Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, 
 HDFS-7530.003.patch


 It should be possible to do
 hdfs dfs -mv /ezroot /newnameforezroot



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7530) Allow renaming of encryption zone roots

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252407#comment-14252407
 ] 

Hudson commented on HDFS-7530:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6753 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6753/])
HDFS-7530. Allow renaming of encryption zone roots. Contributed by Charles 
Lamb. (wang: rev b0b9084433d5e80131429e6e76858b099deb2dda)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testCryptoConf.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Allow renaming of encryption zone roots
 ---

 Key: HDFS-7530
 URL: https://issues.apache.org/jira/browse/HDFS-7530
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Fix For: 2.7.0

 Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, 
 HDFS-7530.003.patch


 It should be possible to do
 hdfs dfs -mv /ezroot /newnameforezroot



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7523) Setting a socket receive buffer size in DFSClient

2014-12-18 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-7523:

Attachment: HDFS-7523-001.txt

Retry. The failures look unrelated. Just making sure.

Change makes sense to me.  I see that the 'hint' DEFAULT_DATA_SOCKET_SIZE is 
passed elsewhere in the code base as receive size in datanode xceiver and 
domain peer service. It is also the send size in DFSOutputStream. That it is not 
set here in DFSClient looks like an oversight.  Nice one [~xieliang007]

I'll commit in next day or so unless objection.

 Setting a socket receive buffer size in DFSClient
 -

 Key: HDFS-7523
 URL: https://issues.apache.org/jira/browse/HDFS-7523
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: dfsclient
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-7523-001.txt, HDFS-7523-001.txt


 It would be nice if we have a socket receive buffer size while creating 
 socket from client(HBase) view, in old version it should be in 
 DFSInputStream, in trunk it seems should be at:
 {code}
   @Override // RemotePeerFactory
   public Peer newConnectedPeer(InetSocketAddress addr,
   Token<BlockTokenIdentifier> blockToken, DatanodeID datanodeId)
   throws IOException {
 Peer peer = null;
 boolean success = false;
 Socket sock = null;
 try {
   sock = socketFactory.createSocket();
   NetUtils.connect(sock, addr,
 getRandomLocalInterfaceAddr(),
 dfsClientConf.socketTimeout);
   peer = TcpPeerServer.peerFromSocketAndKey(saslClient, sock, this,
   blockToken, datanodeId);
   peer.setReadTimeout(dfsClientConf.socketTimeout);
 {code}
 e.g: sock.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
 the default socket buffer size on Linux+JDK7 seems to be 8k if I am not wrong; 
 this value is sometimes too small for HBase reading 64k blocks in a 10G 
 network (at minimum, it costs more system calls)
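A minimal local sketch of the point being made, assuming nothing HDFS-specific: the receive buffer must be requested on the socket before connecting for the hint to reliably take effect, and the kernel may round or clamp the requested value (128k below is an illustrative number, not the HDFS default).

```java
import java.net.Socket;
import java.net.SocketException;

// Sketch: request a larger TCP receive buffer before connect(). The value
// passed is only a hint; the effective size comes back from the kernel.
public class ReceiveBufferSketch {
    static int configure(Socket sock, int requestedBytes) throws SocketException {
        sock.setReceiveBufferSize(requestedBytes); // must precede connect()
        return sock.getReceiveBufferSize();        // kernel may round/clamp
    }

    public static void main(String[] args) throws Exception {
        try (Socket sock = new Socket()) { // unconnected socket
            int effective = configure(sock, 128 * 1024);
            System.out.println("effective receive buffer: " + effective);
        }
    }
}
```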



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume

2014-12-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252437#comment-14252437
 ] 

Arpit Agarwal commented on HDFS-7443:
-

+1 pending Jenkins. Thanks for ensuring it's covered in testing.

 Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are 
 present in the same volume
 --

 Key: HDFS-7443
 URL: https://issues.apache.org/jira/browse/HDFS-7443
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Kihwal Lee
Assignee: Colin Patrick McCabe
Priority: Blocker
 Attachments: HDFS-7443.001.patch


 When we did an upgrade from 2.5 to 2.6 in a medium size cluster, about 4% of 
 datanodes were not coming up.  They tried the data file layout upgrade for 
 BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
 All failures were caused by {{NativeIO.link()}} throwing IOException saying 
 {{EEXIST}}.  The data nodes didn't die right away, but the upgrade was soon 
 retried when the block pool initialization was retried whenever 
 {{BPServiceActor}} was registering with the namenode.  After many retries, 
 datanodes terminated.  This would leave {{previous.tmp}} and {{current}} with 
 no {{VERSION}} file in the block pool slice storage directory.  
 Although {{previous.tmp}} contained the old {{VERSION}} file, the content was 
 in the new layout and the subdirs were all newly created ones.  This 
 shouldn't have happened because the upgrade-recovery logic in {{Storage}} 
 removes {{current}} and renames {{previous.tmp}} to {{current}} before 
 retrying.  All successfully upgraded volumes had old state preserved in their 
 {{previous}} directory.
 In summary there were two observed issues.
 - Upgrade failure with {{link()}} failing with {{EEXIST}}
 - {{previous.tmp}} contained not the content of the original {{current}}, but a 
 half-upgraded one.
 We did not see this in smaller scale test clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()

2014-12-18 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7484:

Attachment: HDFS-7484.005.patch

Update the patch to fix bugs. In general the current patch also contains 
changes related to {{INodesInPath}} and {{addINode}}. I will separate them out 
into another jira.

 Simplify the workflow of calculating permission in mkdirs()
 ---

 Key: HDFS-7484
 URL: https://issues.apache.org/jira/browse/HDFS-7484
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Jing Zhao
 Attachments: HDFS-7484.000.patch, HDFS-7484.001.patch, 
 HDFS-7484.002.patch, HDFS-7484.003.patch, HDFS-7484.004.patch, 
 HDFS-7484.005.patch


 {{FSDirMkdirsOp#mkdirsRecursively()}} currently calculates the permissions 
 based on whether {{inheritPermission}} is true. This jira proposes to 
 simplify the workflow and make it explicit for the caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume

2014-12-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252452#comment-14252452
 ] 

Arpit Agarwal commented on HDFS-7443:
-

Actually I just had a thought. I assumed that the excess copies would be hard 
links to the same physical file, perhaps due to a bug in the earlier LDir code. 
If these are distinct physical files, then should we retain the one with the 
largest on-disk size?

 Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are 
 present in the same volume
 --

 Key: HDFS-7443
 URL: https://issues.apache.org/jira/browse/HDFS-7443
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Kihwal Lee
Assignee: Colin Patrick McCabe
Priority: Blocker
 Attachments: HDFS-7443.001.patch


 When we did an upgrade from 2.5 to 2.6 in a medium size cluster, about 4% of 
 datanodes were not coming up.  They tried the data file layout upgrade for 
 BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
 All failures were caused by {{NativeIO.link()}} throwing IOException saying 
 {{EEXIST}}.  The data nodes didn't die right away, but the upgrade was soon 
 retried when the block pool initialization was retried whenever 
 {{BPServiceActor}} was registering with the namenode.  After many retries, 
 datanodes terminated.  This would leave {{previous.tmp}} and {{current}} with 
 no {{VERSION}} file in the block pool slice storage directory.  
 Although {{previous.tmp}} contained the old {{VERSION}} file, the content was 
 in the new layout and the subdirs were all newly created ones.  This 
 shouldn't have happened because the upgrade-recovery logic in {{Storage}} 
 removes {{current}} and renames {{previous.tmp}} to {{current}} before 
 retrying.  All successfully upgraded volumes had old state preserved in their 
 {{previous}} directory.
 In summary there were two observed issues.
 - Upgrade failure with {{link()}} failing with {{EEXIST}}
 - {{previous.tmp}} contained not the content of the original {{current}}, but a 
 half-upgraded one.
 We did not see this in smaller scale test clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252451#comment-14252451
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

I actually quite like it. I think over the last few iterations the patch was 
polished enough and the test coverage is quite decent. 

+1

 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Lei Chang
Assignee: Plamen Jeliazkov
 Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
 HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
 HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard Posix operation), which is the reverse of append. 
 This forces upper layer applications into ugly workarounds (such as keeping 
 track of the discarded byte range per file in a separate metadata store, and 
 periodically running a vacuum process to rewrite compacted files) to overcome 
 this limitation of HDFS.
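The Posix semantics the jira is asking for can be sketched on a local file (purely as an analogy; HDFS-3107 adds this at the FileSystem/NameNode level, not via FileChannel):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Local-file analogy for truncate: cut the tail of a file in place,
// e.g. to undo an aborted append, instead of rewriting the whole file.
public class TruncateSketch {
    static long truncateTo(Path p, long newLength) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.truncate(newLength); // discard bytes past newLength
        }
        return Files.size(p);
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("txn_", ".log");
        Files.write(p, new byte[1024]);        // simulate an append of 1024 bytes
        System.out.println(truncateTo(p, 100)); // 100
        Files.delete(p);
    }
}
```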



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7549) Add GenericTestUtils#disableLog, GenericTestUtils#setLogLevel

2014-12-18 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-7549:
--

 Summary: Add GenericTestUtils#disableLog, 
GenericTestUtils#setLogLevel
 Key: HDFS-7549
 URL: https://issues.apache.org/jira/browse/HDFS-7549
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


Now that we are using both commons-logging and slf4j, we can no longer rely on 
just casting the Log object to a {{Log4JLogger}} and calling {{setLevel}} on 
that.  With {{org.slf4j.Logger}} objects, we need to look up the underlying 
{{Log4JLogger}} using {{LogManager#getLogger}}.

This patch adds {{GenericTestUtils#disableLog}} and 
{{GenericTestUtils#setLogLevel}} functions which hide this complexity from unit 
tests, just allowing the tests to call {{disableLog}} or {{setLogLevel}}, and 
have {{GenericTestUtils}} figure out the right thing to do based on the log / 
logger type.
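The shape of such a helper can be sketched with the JDK's own java.util.logging API (an analogy only: the actual patch dispatches between commons-logging and slf4j wrappers around log4j, none of which are used here):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of a GenericTestUtils-style helper: tests call one entry point,
// and the helper owns the mechanics of applying the level.
public class LogLevelSketch {
    static void setLogLevel(Logger logger, Level level) {
        logger.setLevel(level); // in the real patch, this would dispatch on logger type
    }

    static void disableLog(Logger logger) {
        setLogLevel(logger, Level.OFF);
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("example");
        setLogLevel(log, Level.FINE);
        System.out.println(log.getLevel()); // FINE
        disableLog(log);
        System.out.println(log.getLevel()); // OFF
    }
}
```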



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3107) HDFS truncate

2014-12-18 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-3107:
--
Attachment: HDFS-3107.patch

Incorporated latest trunk.

 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Lei Chang
Assignee: Plamen Jeliazkov
 Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
 HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
 HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard Posix operation), which is the reverse of append. 
 This forces upper layer applications into ugly workarounds (such as keeping 
 track of the discarded byte range per file in a separate metadata store, and 
 periodically running a vacuum process to rewrite compacted files) to overcome 
 this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252476#comment-14252476
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

The diff between the two seems to be quite small, yet I guess it requires a 
formal review again. Hence +1 again.

 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Lei Chang
Assignee: Plamen Jeliazkov
 Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
 HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
 HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard Posix operation), which is the reverse of append. 
 This forces upper layer applications into ugly workarounds (such as keeping 
 track of the discarded byte range per file in a separate metadata store, and 
 periodically running a vacuum process to rewrite compacted files) to overcome 
 this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume

2014-12-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252480#comment-14252480
 ] 

Colin Patrick McCabe commented on HDFS-7443:


When we observed this, they were not hard links, but separate copies.  They 
were identical (we ran a command-line checksum on them).  If possible, I would 
rather not start trying to pick the best one because I feel like 3x 
replication should ensure that we have redundancy in the system, and because 
the code would get a lot more complex.  Because we do the hardlinks in 
parallel, we would have to somehow accumulate the duplicates and deal with them 
at the end, once all worker threads had been joined.
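On POSIX file systems the hard-link-vs-copy question above can be answered from file metadata: two hard links share one file key (device + inode). A small sketch, not part of the HDFS-7443 patch, showing how an upgrade tool could make that distinction:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Distinguish a hard link from a distinct physical copy: hard links to the
// same file share one fileKey; separate copies (even byte-identical) do not.
public class LinkCheckSketch {
    static boolean sameInode(Path a, Path b) throws IOException {
        Object keyA = Files.readAttributes(a, BasicFileAttributes.class).fileKey();
        Object keyB = Files.readAttributes(b, BasicFileAttributes.class).fileKey();
        return keyA != null && keyA.equals(keyB); // fileKey may be null on some file systems
    }

    public static void main(String[] args) throws IOException {
        Path original = Files.createTempFile("blk_", ".data");
        Path link = original.resolveSibling(original.getFileName() + ".link");
        Files.createLink(link, original);                  // hard link: same inode
        Path copy = Files.createTempFile("blk_", ".copy"); // distinct file: new inode
        System.out.println(sameInode(original, link));
        System.out.println(sameInode(original, copy));
        Files.delete(link);
        Files.delete(copy);
        Files.delete(original);
    }
}
```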

 Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are 
 present in the same volume
 --

 Key: HDFS-7443
 URL: https://issues.apache.org/jira/browse/HDFS-7443
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Kihwal Lee
Assignee: Colin Patrick McCabe
Priority: Blocker
 Attachments: HDFS-7443.001.patch


 When we did an upgrade from 2.5 to 2.6 in a medium size cluster, about 4% of 
 datanodes were not coming up.  They tried the data file layout upgrade for 
 BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
 All failures were caused by {{NativeIO.link()}} throwing IOException saying 
 {{EEXIST}}.  The data nodes didn't die right away, but the upgrade was soon 
 retried when the block pool initialization was retried whenever 
 {{BPServiceActor}} was registering with the namenode.  After many retries, 
 datanodes terminated.  This would leave {{previous.tmp}} and {{current}} with 
 no {{VERSION}} file in the block pool slice storage directory.  
 Although {{previous.tmp}} contained the old {{VERSION}} file, the content was 
 in the new layout and the subdirs were all newly created ones.  This 
 shouldn't have happened because the upgrade-recovery logic in {{Storage}} 
 removes {{current}} and renames {{previous.tmp}} to {{current}} before 
 retrying.  All successfully upgraded volumes had old state preserved in their 
 {{previous}} directory.
 In summary there were two observed issues.
 - Upgrade failure with {{link()}} failing with {{EEXIST}}
 - {{previous.tmp}} contained not the content of the original {{current}}, but a 
 half-upgraded one.
 We did not see this in smaller scale test clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs

2014-12-18 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252478#comment-14252478
 ] 

Haohui Mai commented on HDFS-7543:
--

Thanks for the catch. Let's file a follow-up jira to clean it up.

 Avoid path resolution when getting FileStatus for audit logs
 

 Key: HDFS-7543
 URL: https://issues.apache.org/jira/browse/HDFS-7543
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7543.000.patch


 The current API of {{getAuditFileInfo()}} forces parsing the paths again when 
  generating the {{HdfsFileStatus}} for audit logs. This jira proposes to 
 avoid the repeated parsing by passing the {{INodesInPath}} object instead of 
 the path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7056) Snapshot support for truncate

2014-12-18 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov updated HDFS-7056:
---
Attachment: HDFS-3107-HDFS-7056-combined.patch

Attaching newly combined patch.

 Snapshot support for truncate
 -

 Key: HDFS-7056
 URL: https://issues.apache.org/jira/browse/HDFS-7056
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Attachments: HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx


 Implementation of truncate in HDFS-3107 does not allow truncating files which 
 are in a snapshot. It is desirable to be able to truncate and still keep the 
 old file state of the file in the snapshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7431) log message for InvalidMagicNumberException may be incorrect

2014-12-18 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7431:

   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

+1 for the patch.  I committed it to trunk and branch-2.  Yi, thank you for the 
contribution.

 log message for InvalidMagicNumberException may be incorrect
 

 Key: HDFS-7431
 URL: https://issues.apache.org/jira/browse/HDFS-7431
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: 2.7.0

 Attachments: HDFS-7431.001.patch, HDFS-7431.002.patch, 
 HDFS-7431.003.patch


 For secure mode, HDFS now supports Datanodes that don't require root or 
 jsvc if {{dfs.data.transfer.protection}} is configured.
 In the log message for {{InvalidMagicNumberException}}, we miss one case: 
 when the datanodes run on an unprivileged port, 
 {{dfs.data.transfer.protection}} is configured to {{authentication}}, but 
 {{dfs.encrypt.data.transfer}} is not configured. The SASL handshake is required 
 and a low-version dfs client is used, so {{InvalidMagicNumberException}} is 
 thrown and we write the log:
 {quote}
 Failed to read expected encryption handshake from client at  Perhaps the 
 client is running an older version of Hadoop which does not support encryption
 {quote}
 Recently I ran HDFS built from trunk with security enabled, but the client was 
 version 2.5.1. Then I got the above log message, but actually I have not 
 configured encryption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk

2014-12-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252501#comment-14252501
 ] 

Colin Patrick McCabe commented on HDFS-7527:


Thanks for looking at this, [~decster] and [~wheat9].  It's a difficult and 
frustrating area of the code, in my opinion.

Unfortunately, I don't think this latest patch is exactly what we need.  Last 
time we proposed adding more DNS lookups in the {{DatanodeManager}}, the Yahoo 
guys said this was unacceptable from a performance point of view.  Caching DNS 
lookups, so that we didn't have to do them all the time, is a big part of what 
the {{HostFileManager}} was created to do.  [~daryn], [~eli], do you have any 
ideas here?

 TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
 ---

 Key: HDFS-7527
 URL: https://issues.apache.org/jira/browse/HDFS-7527
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Reporter: Yongjun Zhang
Assignee: Binglin Chang
 Attachments: HDFS-7527.001.patch, HDFS-7527.002.patch


 https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/
 {quote}
 Error Message
 test timed out after 36 milliseconds
 Stacktrace
 java.lang.Exception: test timed out after 36 milliseconds
   at java.lang.Thread.sleep(Native Method)
   at 
 org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957)
 2014-12-15 12:00:19,958 ERROR datanode.DataNode 
 (BPServiceActor.java:run(836)) - Initialization failed for Block pool 
 BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
 localhost/127.0.0.1:40565 Datanode denied communication with namenode because 
 the host is not in the include-list: DatanodeRegistration(127.0.0.1, 
 datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, 
 infoSecurePort=0, ipcPort=43726, 
 storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0)
   at 
 org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196)
   at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
   at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)
 2014-12-15 12:00:29,087 FATAL datanode.DataNode 
 (BPServiceActor.java:run(841)) - Initialization failed for Block pool 
 BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
 localhost/127.0.0.1:40565. Exiting. 
 java.io.IOException: DN shut down before block pool connected
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
   at java.lang.Thread.run(Thread.java:745)
 {quote}
 Found by tool proposed in HADOOP-11045:
 {quote}
 [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
 Hadoop-Hdfs-trunk -n 5 | tee bt.log
 Recently FAILED builds in url: 
 https://builds.apache.org//job/Hadoop-Hdfs-trunk
 THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, 
 as listed below:
 ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport 
 (2014-12-15 03:30:01)
 Failed test: 
 org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
 Failed test: 
 org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
 Failed test: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
 ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport 
 (2014-12-13 10:32:27)
 Failed test: 
 

[jira] [Commented] (HDFS-7056) Snapshot support for truncate

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252512#comment-14252512
 ] 

Hadoop QA commented on HDFS-7056:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12688173/HDFS-3107-HDFS-7056-combined.patch
  against trunk revision 5df7ecb.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9082//console

This message is automatically generated.

 Snapshot support for truncate
 -

 Key: HDFS-7056
 URL: https://issues.apache.org/jira/browse/HDFS-7056
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Attachments: HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx


 Implementation of truncate in HDFS-3107 does not allow truncating files which 
 are in a snapshot. It is desirable to be able to truncate and still keep the 
 old file state of the file in the snapshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7431) log message for InvalidMagicNumberException may be incorrect

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252513#comment-14252513
 ] 

Hudson commented on HDFS-7431:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6754 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6754/])
HDFS-7431. log message for InvalidMagicNumberException may be incorrect. 
Contributed by Yi Liu. (cnauroth: rev 5df7ecb33ab24de903f0fd98e2a055164874def5)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/TestSaslDataTransfer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/InvalidMagicNumberException.java


 log message for InvalidMagicNumberException may be incorrect
 

 Key: HDFS-7431
 URL: https://issues.apache.org/jira/browse/HDFS-7431
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: 2.7.0

 Attachments: HDFS-7431.001.patch, HDFS-7431.002.patch, 
 HDFS-7431.003.patch


 In secure mode, HDFS now supports datanodes running without root or 
 jsvc if {{dfs.data.transfer.protection}} is configured.
 In the log message for {{InvalidMagicNumberException}}, we miss one case: 
 when the datanodes run on an unprivileged port, 
 {{dfs.data.transfer.protection}} is configured to {{authentication}}, and 
 {{dfs.encrypt.data.transfer}} is not configured. A SASL handshake is required, 
 and if an older-version dfs client is used, {{InvalidMagicNumberException}} is 
 thrown and we write this log:
 {quote}
 Failed to read expected encryption handshake from client at  Perhaps the 
 client is running an older version of Hadoop which does not support encryption
 {quote}
 Recently I ran HDFS built from trunk with security enabled, but the client 
 was version 2.5.1. I got the above log message, even though I had not 
 configured encryption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7550) Minor followon cleanups from HDFS-7543

2014-12-18 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-7550:
--

 Summary: Minor followon cleanups from HDFS-7543
 Key: HDFS-7550
 URL: https://issues.apache.org/jira/browse/HDFS-7550
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Priority: Minor


The commit of HDFS-7543 crossed paths with these comments:

FSDirMkdirOp.java

in #mkdirs, you removed the {{final String srcArg = src}}. This should be left in: 
many IDEs will whine about making assignments to formal args, and that's why it 
was put in in the first place.

FSDirRenameOp.java

#renameToInt: dstIIP (and resultingStat) could benefit from being declared final.

FSDirXAttrOp.java

I'm not sure why you've moved the call to getINodesInPath4Write and 
checkXAttrChangeAccess inside the writeLock.

FSDirStatAndListing.java

The javadoc for the @param src needs to be changed to reflect that it's an 
INodesInPath, not a String. Nit: it might be better to rename the INodesInPath 
arg from src to iip.

#getFileInfo4DotSnapshot is now unused since you in-lined it into #getFileInfo.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs

2014-12-18 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252532#comment-14252532
 ] 

Charles Lamb commented on HDFS-7543:


Thanks @wheat9. HDFS-7550.


 Avoid path resolution when getting FileStatus for audit logs
 

 Key: HDFS-7543
 URL: https://issues.apache.org/jira/browse/HDFS-7543
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7543.000.patch


 The current API of {{getAuditFileInfo()}} forces parsing the paths again when 
  generating the {{HdfsFileStatus}} for audit logs. This jira proposes to 
 avoid the repeated parsing by passing the {{INodesInPath}} object instead of 
 the path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7018) Implement C interface for libhdfs3

2014-12-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252541#comment-14252541
 ] 

Colin Patrick McCabe commented on HDFS-7018:


Please don't make another copy of {{hdfs.h}} in the source tree.  This will 
lead to the various copies getting out of sync over time, which would be very 
bad.  Instead, just reference the existing copy via a relative path.

You can add your {{hdfsGetLastError}} function to this file, and just have a 
dummy implementation that returns "unknown error" in all cases for 
{{libwebhdfs}} and {{libhdfs}}.  We can improve this in a follow-on JIRA.

{code}
39  #ifdef __cplusplus
40  extern "C" {
41  #endif
{code}
While this is needed in {{hdfs.h}}, it is not needed in your {{Hdfs.cc}} code.  
The C\+\+ linker is smart enough to figure out that the prototypes it is seeing 
correspond to the prototypes in the {{hdfs.h}} file you included.

{code}
45  static THREAD_LOCAL const char *ErrorMessage = NULL;
46  static THREAD_LOCAL std::string *ErrorMessageBuffer = NULL;
47  static THREAD_LOCAL hdfs::internal::once_flag once;
48  
49  static void CreateMessageBuffer() {
50  ErrorMessageBuffer = new std::string;
51  }
{code}
I don't think we need all this.  Making the thread-local buffer a pointer to a 
{{std::string}} means that we have to check {{once_flag}} before we access it, 
which is inefficient.  It also means that if the thread exits, this memory will 
be leaked (unless you set up a POSIX thread destructor, which is complicated 
and platform-specific).

Instead, let's just have a char\[128\] buffer for each thread.  As an added 
bonus, because this utilizes pre-allocation, it handles the case where you 
can't allocate memory for the error string itself, which you have said in the 
past that you care about.
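The fixed-buffer alternative might look like the following minimal sketch (the function names are illustrative, not the actual libhdfs3 API):

```cpp
#include <cstdio>
#include <cstring>

// Sketch of the suggested fix: a fixed-size thread-local array needs no
// lazy initialization (no once_flag check on every access), cannot leak
// on thread exit, and still works when heap allocation fails.
static thread_local char errorMessage[128] = "";

static void setErrorMessage(const char *msg) {
    // snprintf truncates safely if msg exceeds the buffer.
    snprintf(errorMessage, sizeof(errorMessage), "%s", msg);
}

extern "C" const char *hdfsGetLastErrorSketch(void) {
    return errorMessage;
}
```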

{code}
158 private:
159 bool input;
160 void *stream;
161 };
{code}
Please don't use {{void*}} here.  It is not typesafe.  

You can clearly see that FS objects and file objects have a concrete type, 
spelled out in {{hdfs.h}}:
{code}
struct hdfs_internal;
typedef struct hdfs_internal* hdfsFS;

struct hdfsFile_internal;
typedef struct hdfsFile_internal* hdfsFile;
{code}

All of these functions need to have a {{catch (...)}} which sets the error 
message to unknown and returns {{EINTERNAL}}.  The reason is that if you 
attempt to throw a C\+\+ exception through a C API, the program will abort 
(technically, {{std::terminate}} will be called).  I realize you probably think 
you have caught all possible exceptions, but since this is C\+\+, we can never 
really be sure without the {{catch (...)}}.
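That boundary pattern can be sketched as follows (the function name is hypothetical, and {{EINTERNAL}} is defined locally here only so the sketch compiles standalone — in the real code it comes from {{hdfs.h}}):

```cpp
#include <cerrno>
#include <cstdio>
#include <stdexcept>

// EINTERNAL normally comes from hdfs.h; defined here only so the
// sketch is self-contained (the exact value is an assumption).
#ifndef EINTERNAL
#define EINTERNAL 255
#endif

// Internal C++ implementation that may throw.
static int openFileImpl(bool shouldFail) {
    if (shouldFail) {
        throw std::runtime_error("simulated internal failure");
    }
    return 0;
}

// C-visible entry point: the trailing catch (...) guarantees that no
// C++ exception crosses the C ABI, which would otherwise end in
// std::terminate.
extern "C" int hdfsOpenSketch(int shouldFail) {
    try {
        return openFileImpl(shouldFail != 0);
    } catch (const std::exception &e) {
        // Known exception type: record its message.
        std::fprintf(stderr, "hdfsOpenSketch: %s\n", e.what());
        errno = EINTERNAL;
        return -1;
    } catch (...) {
        // Anything else: report "unknown error" instead of aborting.
        std::fprintf(stderr, "hdfsOpenSketch: unknown error\n");
        errno = EINTERNAL;
        return -1;
    }
}
```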

 Implement C interface for libhdfs3
 --

 Key: HDFS-7018
 URL: https://issues.apache.org/jira/browse/HDFS-7018
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Zhanwei Wang
Assignee: Zhanwei Wang
 Attachments: HDFS-7018-pnative.002.patch, HDFS-7018.patch


 Implement C interface for libhdfs3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-7018) Implement C interface for libhdfs3

2014-12-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252541#comment-14252541
 ] 

Colin Patrick McCabe edited comment on HDFS-7018 at 12/18/14 11:31 PM:
---

Please don't make another copy of {{hdfs.h}} in the source tree.  This will 
lead to the various copies getting out of sync over time, which would be very 
bad.  Instead, just reference the existing copy via a relative path.

You can add your {{hdfsGetLastError}} function to this file, and just have a 
dummy implementation that returns "unknown error" in all cases for 
{{libwebhdfs}} and {{libhdfs}}.  We can improve this in a follow-on JIRA.

{code}
39  #ifdef __cplusplus
40  extern "C" {
41  #endif
{code}
While this is needed in {{hdfs.h}}, it is not needed in your {{Hdfs.cc}} code.  
The C\+\+ linker is smart enough to figure out that the prototypes it is seeing 
correspond to the prototypes in the {{hdfs.h}} file you included.

{code}
45  static THREAD_LOCAL const char *ErrorMessage = NULL;
46  static THREAD_LOCAL std::string *ErrorMessageBuffer = NULL;
47  static THREAD_LOCAL hdfs::internal::once_flag once;
48  
49  static void CreateMessageBuffer() {
50  ErrorMessageBuffer = new std::string;
51  }
{code}
I don't think we need all this.  Making the thread-local buffer a pointer to a 
{{std::string}} means that we have to check {{once_flag}} before we access it, 
which is inefficient.  It also means that if the thread exits, this memory will 
be leaked (unless you set up a POSIX thread destructor, which is complicated 
and platform-specific).

Instead, let's just have a char\[128\] buffer for each thread.  As an added 
bonus, because this utilizes pre-allocation, it handles the case where you 
can't allocate memory for the error string itself, which you have said in the 
past that you care about.

{code}
158 private:
159 bool input;
160 void *stream;
161 };
{code}
Please don't use {{void*}} here.  It is not typesafe.  

You can clearly see that FS objects and file objects have a concrete type, 
spelled out in {{hdfs.h}}:
{code}
struct hdfs_internal;
typedef struct hdfs_internal* hdfsFS;

struct hdfsFile_internal;
typedef struct hdfsFile_internal* hdfsFile;
{code}

All of these functions need to have a {{catch (...)}} which sets the error 
message to unknown and returns {{EINTERNAL}}.  The reason is that if you 
attempt to throw a C\+\+ exception through a C API, the program will abort 
(technically, {{std::terminate}} will be called).  I realize you probably think 
you have caught all possible exceptions, but since this is C\+\+, we can never 
really be sure without the {{catch (...)}}.

P.S. thanks for working on this!



[jira] [Commented] (HDFS-7182) JMX metrics aren't accessible when NN is busy

2014-12-18 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252553#comment-14252553
 ] 

Ming Ma commented on HDFS-7182:
---

Does anyone else have suggestions on this? The patch has been running fine in 
one of our production clusters.

 JMX metrics aren't accessible when NN is busy
 -

 Key: HDFS-7182
 URL: https://issues.apache.org/jira/browse/HDFS-7182
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-7182.patch


 HDFS-5693 has addressed all NN JMX metrics in hadoop 2.0.5. Since then, a 
 couple of new metrics have been added. It turns out RollingUpgradeStatus 
 requires the FSNamesystem read lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7182) JMX metrics aren't accessible when NN is busy

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252561#comment-14252561
 ] 

Hadoop QA commented on HDFS-7182:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672609/HDFS-7182.patch
  against trunk revision 0402bad.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9083//console

This message is automatically generated.

 JMX metrics aren't accessible when NN is busy
 -

 Key: HDFS-7182
 URL: https://issues.apache.org/jira/browse/HDFS-7182
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-7182.patch


 HDFS-5693 has addressed all NN JMX metrics in hadoop 2.0.5. Since then, a 
 couple of new metrics have been added. It turns out RollingUpgradeStatus 
 requires the FSNamesystem read lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume

2014-12-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252565#comment-14252565
 ] 

Arpit Agarwal commented on HDFS-7443:
-

bq. because the code would get a lot more complex. Because we do the hardlinks 
in parallel, we would have to somehow accumulate the duplicates and deal with 
them at the end, once all worker threads had been joined.
We wouldn't need all that. A length check on src and dst when we hit an 
exception should suffice, right? Depending on the result, either discard src or 
overwrite dst. Anyway, I think your patch is fine to go as it is.
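As a rough illustration of that length-check idea (sketched in C++ with std::filesystem for brevity; the actual DataNode upgrade code is Java using {{NativeIO.link()}}, and the helper name here is made up):

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper, not the real DataNode code: attempt the hard
// link; on "file exists", compare lengths to decide whether the
// existing destination already covers the source replica.
bool linkWithDuplicateCheck(const fs::path &src, const fs::path &dst) {
    std::error_code ec;
    fs::create_hard_link(src, dst, ec);
    if (!ec) return true;                           // normal case
    if (ec != std::errc::file_exists) return false; // a real failure
    // Duplicate block file: keep whichever copy is longer.
    if (fs::file_size(dst) >= fs::file_size(src)) {
        return true;  // dst is at least as complete; discard src
    }
    fs::remove(dst);  // dst is shorter; overwrite it with src
    fs::create_hard_link(src, dst, ec);
    return !ec;
}
```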

 Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are 
 present in the same volume
 --

 Key: HDFS-7443
 URL: https://issues.apache.org/jira/browse/HDFS-7443
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Kihwal Lee
Assignee: Colin Patrick McCabe
Priority: Blocker
 Attachments: HDFS-7443.001.patch


 When we did an upgrade from 2.5 to 2.6 in a medium size cluster, about 4% of 
 datanodes were not coming up.  They tried the data file layout upgrade for 
 BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
 All failures were caused by {{NativeIO.link()}} throwing IOException saying 
 {{EEXIST}}.  The data nodes didn't die right away, but the upgrade was soon 
 retried when the block pool initialization was retried whenever 
 {{BPServiceActor}} was registering with the namenode.  After many retries, 
 datanodes terminated.  This would leave {{previous.tmp}} and {{current}} with 
 no {{VERSION}} file in the block pool slice storage directory.  
 Although {{previous.tmp}} contained the old {{VERSION}} file, the content was 
 in the new layout and the subdirs were all newly created ones.  This 
 shouldn't have happened because the upgrade-recovery logic in {{Storage}} 
 removes {{current}} and renames {{previous.tmp}} to {{current}} before 
 retrying.  All successfully upgraded volumes had old state preserved in their 
 {{previous}} directory.
 In summary there were two observed issues.
 - Upgrade failure with {{link()}} failing with {{EEXIST}}
 - {{previous.tmp}} contained not the content of original {{current}}, but 
 half-upgraded one.
 We did not see this in smaller scale test clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

