[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2

2014-02-18 Thread dan dan zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903896#comment-13903896
 ] 

dan dan zheng commented on HDFS-5892:
-

Here's a patch that addresses the issue. The cause of the intermittent failure 
is that the test sets its own name services in the configuration when starting 
the federation, but MiniDFSTopology generates the service ids without 
consulting the name services set in the configuration. As a result, the 
BPOfferServices actually started are for ns1 and ns2, not for the ids set by 
the test ("namesServerId1,namesServerId2"). Later, the test refreshes using 
the id namesServerId2, which starts that service for the first time; since 
ns1 and ns2 are no longer in the refresh list, they are stopped. The test 
fails when it tries to create file /gamma before namesServerId2 has completely 
started, and this race condition is why the failure is intermittent.
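For context, a federated HDFS deployment lists its nameservice ids under the 
standard dfs.nameservices key; a minimal sketch of the setting the test 
intends (the ids are the ones quoted above):

```xml
<!-- Sketch of the federation setting the test intends; dfs.nameservices is
     the standard HDFS key listing the configured nameservice ids. -->
<property>
  <name>dfs.nameservices</name>
  <value>namesServerId1,namesServerId2</value>
</property>
```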

The current log showing the issue:
2014-02-13 22:14:02,489 INFO  datanode.DataNode 
(BlockPoolManager.java:refreshNamenodes(148)) - Refresh request received for 
nameservices: ns1,ns2
2014-02-13 22:14:02,491 INFO  datanode.DataNode 
(BlockPoolManager.java:doRefreshNamenodes(193)) - Starting BPOfferServices for 
nameservices: ns1,ns2 
2014-02-13 22:51:40,326 INFO  datanode.DataNode 
(BlockPoolManager.java:doRefreshNamenodes(193)) - Starting BPOfferServices for 
nameservices: namesServerId2
2014-02-13 22:51:40,327 INFO  datanode.DataNode 
(BlockPoolManager.java:doRefreshNamenodes(211)) - Stopping BPOfferServices for 
nameservices: ns1,ns2

After applying the patch, MiniDFSTopology reads the name services from the 
configuration correctly, so the BPOfferServices are started for the correct 
nameservices.

The correct log should read:
2014-02-13 22:14:02,489 INFO  datanode.DataNode 
(BlockPoolManager.java:refreshNamenodes(148)) - Refresh request received for 
nameservices: namesServerId1,namesServerId2
2014-02-13 22:14:02,491 INFO  datanode.DataNode 
(BlockPoolManager.java:doRefreshNamenodes(193)) - Starting BPOfferServices for 
nameservices: namesServerId1,namesServerId2
2014-02-13 22:51:40,327 INFO  datanode.DataNode 
(BlockPoolManager.java:doRefreshNamenodes(211)) - Stopping BPOfferServices for 
nameservices: namesServerId1
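The refresh semantics the logs reflect can be modeled abstractly: requested 
nameservices that are not yet running are started, and running ones absent 
from the request are stopped. The sketch below is an illustrative model, not 
the real BlockPoolManager code; the class and method names are invented for 
illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Abstract model (a sketch, not BlockPoolManager itself) of the DataNode
// nameservice refresh: services in the request that are not running start,
// and running services missing from the request stop.
public class RefreshSketch {

    static Set<String> toStart(Set<String> running, Set<String> requested) {
        Set<String> start = new TreeSet<>(requested);
        start.removeAll(running);   // only services not already running start
        return start;
    }

    static Set<String> toStop(Set<String> running, Set<String> requested) {
        Set<String> stop = new TreeSet<>(running);
        stop.removeAll(requested);  // running services dropped from the request stop
        return stop;
    }

    public static void main(String[] args) {
        // Failing scenario from the logs: ns1/ns2 were started first, then
        // the test refreshes with namesServerId2 only.
        Set<String> running = Set.of("ns1", "ns2");
        Set<String> requested = Set.of("namesServerId2");
        System.out.println(toStart(running, requested)); // [namesServerId2]
        System.out.println(toStop(running, requested));  // [ns1, ns2]
    }
}
```

This makes the race visible: namesServerId2 starts only at refresh time, while 
ns1 and ns2 stop, so a write issued immediately afterwards can hit a 
not-yet-started service.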



> TestDeleteBlockPool fails in branch-2
> -
>
> Key: HDFS-5892
> URL: https://issues.apache.org/jira/browse/HDFS-5892
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
> Attachments: 
> org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt
>
>
> Running test suite on Linux, I got:
> {code}
> testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool)
>   Time elapsed: 8.143 sec  <<< ERROR!
> java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting...
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5892) TestDeleteBlockPool fails in branch-2

2014-02-18 Thread dan dan zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dan dan zheng updated HDFS-5892:


Attachment: HDFS-5892.patch

> TestDeleteBlockPool fails in branch-2
> -
>
> Key: HDFS-5892
> URL: https://issues.apache.org/jira/browse/HDFS-5892
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
> Attachments: HDFS-5892.patch, 
> org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt
>
>
> Running test suite on Linux, I got:
> {code}
> testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool)
>   Time elapsed: 8.143 sec  <<< ERROR!
> java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting...
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
> {code}





[jira] [Commented] (HDFS-5959) Fix typo at section name in FSImageFormatProtobuf.java

2014-02-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903950#comment-13903950
 ] 

Hudson commented on HDFS-5959:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #485 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/485/])
HDFS-5959. Fix typo at section name in FSImageFormatProtobuf.java. Contributed 
by Akira Ajisaka. (suresh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569156)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/LsrPBImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageXmlWriter.java


> Fix typo at section name in FSImageFormatProtobuf.java
> --
>
> Key: HDFS-5959
> URL: https://issues.apache.org/jira/browse/HDFS-5959
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>Priority: Minor
>  Labels: newbie
> Fix For: 2.4.0
>
> Attachments: HDFS-5959.patch
>
>
> There's a typo "REFRENCE"
> {code}
>   public enum SectionName {
> NS_INFO("NS_INFO"),
> STRING_TABLE("STRING_TABLE"),
> INODE("INODE"),
> INODE_REFRENCE("INODE_REFRENCE"),
> SNAPSHOT("SNAPSHOT"),
> {code}
> should be "REFERENCE".
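The fix is mechanical; a minimal, self-contained sketch (not the actual 
Hadoop enum, which carries more members and logic) showing the corrected 
spelling in both the constant and its string value:

```java
// Sketch of the corrected enum entry: both the constant and its string
// value read INODE_REFERENCE instead of the misspelled INODE_REFRENCE.
public class SectionNameSketch {
    enum SectionName {
        NS_INFO("NS_INFO"),
        STRING_TABLE("STRING_TABLE"),
        INODE("INODE"),
        INODE_REFERENCE("INODE_REFERENCE"), // fixed: was INODE_REFRENCE
        SNAPSHOT("SNAPSHOT");

        private final String name;

        SectionName(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    public static void main(String[] args) {
        System.out.println(SectionName.INODE_REFERENCE.getName());
    }
}
```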





[jira] [Commented] (HDFS-5959) Fix typo at section name in FSImageFormatProtobuf.java

2014-02-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903989#comment-13903989
 ] 

Hudson commented on HDFS-5959:
--

ABORTED: Integrated in Hadoop-Hdfs-trunk #1677 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1677/])
HDFS-5959. Fix typo at section name in FSImageFormatProtobuf.java. Contributed 
by Akira Ajisaka. (suresh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569156)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/LsrPBImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageXmlWriter.java


> Fix typo at section name in FSImageFormatProtobuf.java
> --
>
> Key: HDFS-5959
> URL: https://issues.apache.org/jira/browse/HDFS-5959
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>Priority: Minor
>  Labels: newbie
> Fix For: 2.4.0
>
> Attachments: HDFS-5959.patch
>
>
> There's a typo "REFRENCE"
> {code}
>   public enum SectionName {
> NS_INFO("NS_INFO"),
> STRING_TABLE("STRING_TABLE"),
> INODE("INODE"),
> INODE_REFRENCE("INODE_REFRENCE"),
> SNAPSHOT("SNAPSHOT"),
> {code}
> should be "REFERENCE".





[jira] [Resolved] (HDFS-5529) { Disk Fail } Can we shutdown the DN when it meet's disk failed condition

2014-02-18 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula resolved HDFS-5529.


Resolution: Duplicate


Closing, since this will be handled as part of HDFS-2882.


> { Disk Fail } Can we shutdown the DN when it meet's disk failed condition
> -
>
> Key: HDFS-5529
> URL: https://issues.apache.org/jira/browse/HDFS-5529
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>
> Scenario :
> 
> Two dirs were configured for the DataNode.
> One dir does not have the required permissions, so the following exception 
> is thrown and an NPE occurs while sending the heartbeat:
> {noformat}
> 2013-11-19 17:35:26,599 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool BP-994471486-10.18.40.21-1384754500555 (storage id 
> DS-1184111760-10.18.40.38-50010-1384862726499) service to 
> HOST-10-18-91-26/10.18.40.21:8020
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed 
> volumes - current valid volumes: 1, volumes configured: 2, volumes failed: 1, 
> volume failures tolerated: 0
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.(FsDatasetImpl.java:202)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:966)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:928)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:285)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
> at java.lang.Thread.run(Thread.java:662)
> 2013-11-19 17:35:26,602 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool 
> BP-994471486-10.18.40.21-1384754500555 (storage id 
> DS-1184111760-10.18.40.38-50010-1384862726499) service to 
> HOST-10-18-91-26/10.18.40.21:8020
> 2013-11-19 17:35:26,602 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool BP-994471486-10.18.40.21-1384754500555 (storage id 
> DS-1184111760-10.18.40.38-50010-1384862726499) service to 
> linux-hadoop/10.18.40.14:8020 beginning handshake with NN
> 2013-11-19 17:35:26,648 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool Block pool BP-994471486-10.18.40.21-1384754500555 (storage id 
> DS-1184111760-10.18.40.38-50010-1384862726499) service to 
> linux-hadoop/10.18.40.14:8020 successfully registered with NN
> 2013-11-19 17:35:26,648 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> For namenode linux-hadoop/10.18.40.14:8020 using DELETEREPORT_INTERVAL of 
> 30 msec  BLOCKREPORT_INTERVAL of 2160msec Initial delay: 0msec; 
> heartBeatInterval=3000
> 2013-11-19 17:35:26,649 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService 
> for Block pool BP-994471486-10.18.40.21-1384754500555 (storage id 
> DS-1184111760-10.18.40.38-50010-1384862726499) service to 
> linux-hadoop/10.18.40.14:8020
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:439)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> at java.lang.Thread.run(Thread.java:662)
> {noformat}
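For reference, the "volume failures tolerated: 0" in the trace is governed by 
the standard dfs.datanode.failed.volumes.tolerated property; with the default 
of 0, any volume failure is fatal for the affected block pool service. A 
sketch of relaxing it:

```xml
<!-- Standard HDFS DataNode setting; 0 (the default) makes any volume
     failure fatal. A value of 1 tolerates one failed volume. -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```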





[jira] [Updated] (HDFS-5892) TestDeleteBlockPool fails in branch-2

2014-02-18 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HDFS-5892:
-

Status: Patch Available  (was: Open)

> TestDeleteBlockPool fails in branch-2
> -
>
> Key: HDFS-5892
> URL: https://issues.apache.org/jira/browse/HDFS-5892
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
> Attachments: HDFS-5892.patch, 
> org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt
>
>
> Running test suite on Linux, I got:
> {code}
> testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool)
>   Time elapsed: 8.143 sec  <<< ERROR!
> java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting...
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
> {code}





[jira] [Commented] (HDFS-5780) TestRBWBlockInvalidation times out intemittently on branch-2

2014-02-18 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904104#comment-13904104
 ] 

Kihwal Lee commented on HDFS-5780:
--

+1 

> TestRBWBlockInvalidation times out intemittently on branch-2
> 
>
> Key: HDFS-5780
> URL: https://issues.apache.org/jira/browse/HDFS-5780
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch
>
>
> I recently found out that the test 
> TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times 
> out intermittently.
> I am using Fedora, JDK7.





[jira] [Commented] (HDFS-5959) Fix typo at section name in FSImageFormatProtobuf.java

2014-02-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904109#comment-13904109
 ] 

Hudson commented on HDFS-5959:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1702 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1702/])
HDFS-5959. Fix typo at section name in FSImageFormatProtobuf.java. Contributed 
by Akira Ajisaka. (suresh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569156)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/LsrPBImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageXmlWriter.java


> Fix typo at section name in FSImageFormatProtobuf.java
> --
>
> Key: HDFS-5959
> URL: https://issues.apache.org/jira/browse/HDFS-5959
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>Priority: Minor
>  Labels: newbie
> Fix For: 2.4.0
>
> Attachments: HDFS-5959.patch
>
>
> There's a typo "REFRENCE"
> {code}
>   public enum SectionName {
> NS_INFO("NS_INFO"),
> STRING_TABLE("STRING_TABLE"),
> INODE("INODE"),
> INODE_REFRENCE("INODE_REFRENCE"),
> SNAPSHOT("SNAPSHOT"),
> {code}
> should be "REFERENCE".





[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intemittently on branch-2

2014-02-18 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5780:
-

   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2. Thanks for working on the issue, Mit. Thanks 
for the review, Arpit.

> TestRBWBlockInvalidation times out intemittently on branch-2
> 
>
> Key: HDFS-5780
> URL: https://issues.apache.org/jira/browse/HDFS-5780
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch
>
>
> I recently found out that the test 
> TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times 
> out intermittently.
> I am using Fedora, JDK7.





[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2014-02-18 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904121#comment-13904121
 ] 

Kihwal Lee commented on HDFS-5225:
--

[~lars_francke], what is the version of Hadoop you are using?

> datanode keeps logging the same 'is no longer in the dataset' message over 
> and over again
> -
>
> Key: HDFS-5225
> URL: https://issues.apache.org/jira/browse/HDFS-5225
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.1.1-beta
>Reporter: Roman Shaposhnik
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
> Attachments: HDFS-5225-reproduce.1.txt, HDFS-5225.1.patch, 
> HDFS-5225.2.patch
>
>
> I was running the usual Bigtop testing on 2.1.1-beta RC1 with the following 
> configuration: 4 nodes fully distributed cluster with security on.
> All of a sudden my DN ate up all of the space in /var/log logging the 
> following message repeatedly:
> {noformat}
> 2013-09-18 20:51:12,046 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
> BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369 is no longer 
> in the dataset
> {noformat}
> It wouldn't answer to a jstack and jstack -F ended up being useless.
> Here's what I was able to find in the NameNode logs regarding this block ID:
> {noformat}
> fgrep -rI 'blk_1073742189' hadoop-hdfs-namenode-ip-10-224-158-152.log
> 2013-09-18 18:03:16,972 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> allocateBlock: 
> /user/jenkins/testAppendInputWedSep18180222UTC2013/test4.fileWedSep18180222UTC2013._COPYING_.
>  BP-1884637155-10.224.158.152-1379524544853 
> blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
> ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
> ReplicaUnderConstruction[10.224.158.152:1004|RBW]]}
> 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.224.158.152:1004 is added to 
> blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
> ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
> ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
> 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.34.74.206:1004 is added to 
> blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
> ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
> ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
> 2013-09-18 18:03:17,224 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.83.107.80:1004 is added to 
> blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
> ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
> ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
> 2013-09-18 18:03:17,899 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369,
>  newGenerationStamp=1370, newLength=1048576, newNodes=[10.83.107.80:1004, 
> 10.34.74.206:1004, 10.224.158.152:1004], 
> clientName=DFSClient_NONMAPREDUCE_-450304083_1)
> 2013-09-18 18:03:17,904 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369)
>  successfully to 
> BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370
> 2013-09-18 18:03:18,540 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370,
>  newGenerationStamp=1371, newLength=2097152, newNodes=[10.83.107.80:1004, 
> 10.34.74.206:1004, 10.224.158.152:1004], 
> clientName=DFSClient_NONMAPREDUCE_-450304083_1)
> 2013-09-18 18:03:18,548 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370)
>  successfully to 
> BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1371
> 2013-09-18 18:03:26,150 INFO BlockStateChange: BLOCK* addToInvalidates: 
> blk_1073742189_1371 10.83.107.80:1004 10.34.74.206:1004 10.224.158.152:1004 
> 2013-09-18 18:03:26,847 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> InvalidateBlocks: ask 10.34.74.206:1004 to delete [blk_1073742178_1359, 
> blk_1073742183_1362, blk_1073742184_1363, blk_1073742186_1366, 
> blk_1073742188_1368, blk_1073742189_1371]
> 2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> InvalidateBlocks: ask 10.224.158.152:1004 to delete [blk_1

[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2

2014-02-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904124#comment-13904124
 ] 

Hadoop QA commented on HDFS-5892:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12629497/HDFS-5892.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6167//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6167//console

This message is automatically generated.

> TestDeleteBlockPool fails in branch-2
> -
>
> Key: HDFS-5892
> URL: https://issues.apache.org/jira/browse/HDFS-5892
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
> Attachments: HDFS-5892.patch, 
> org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt
>
>
> Running test suite on Linux, I got:
> {code}
> testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool)
>   Time elapsed: 8.143 sec  <<< ERROR!
> java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting...
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
> {code}





[jira] [Commented] (HDFS-5780) TestRBWBlockInvalidation times out intemittently on branch-2

2014-02-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904127#comment-13904127
 ] 

Hudson commented on HDFS-5780:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5180 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5180/])
HDFS-5780. TestRBWBlockInvalidation times out intemittently.  Contributed by 
Mit Desai. (kihwal: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569368)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java


> TestRBWBlockInvalidation times out intemittently on branch-2
> 
>
> Key: HDFS-5780
> URL: https://issues.apache.org/jira/browse/HDFS-5780
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch
>
>
> I recently found out that the test 
> TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times 
> out intermittently.
> I am using Fedora, JDK7.





[jira] [Commented] (HDFS-5225) datanode keeps logging the same 'is no longer in the dataset' message over and over again

2014-02-18 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904144#comment-13904144
 ] 

Lars Francke commented on HDFS-5225:


We're running CDH 4.5.0, which is based on Hadoop 2.0. I see that a fix for 
this issue is in CDH 4.6, but that hasn't been released yet.

> datanode keeps logging the same 'is no longer in the dataset' message over 
> and over again
> -
>
> Key: HDFS-5225
> URL: https://issues.apache.org/jira/browse/HDFS-5225
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.1.1-beta
>Reporter: Roman Shaposhnik
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
> Attachments: HDFS-5225-reproduce.1.txt, HDFS-5225.1.patch, 
> HDFS-5225.2.patch
>
>
> I was running the usual Bigtop testing on 2.1.1-beta RC1 with the following 
> configuration: 4 nodes fully distributed cluster with security on.
> All of a sudden my DN ate up all of the space in /var/log logging the 
> following message repeatedly:
> {noformat}
> 2013-09-18 20:51:12,046 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
> BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369 is no longer 
> in the dataset
> {noformat}
> It wouldn't answer to a jstack and jstack -F ended up being useless.
> Here's what I was able to find in the NameNode logs regarding this block ID:
> {noformat}
> fgrep -rI 'blk_1073742189' hadoop-hdfs-namenode-ip-10-224-158-152.log
> 2013-09-18 18:03:16,972 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> allocateBlock: 
> /user/jenkins/testAppendInputWedSep18180222UTC2013/test4.fileWedSep18180222UTC2013._COPYING_.
>  BP-1884637155-10.224.158.152-1379524544853 
> blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
> ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
> ReplicaUnderConstruction[10.224.158.152:1004|RBW]]}
> 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.224.158.152:1004 is added to 
> blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
> ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
> ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
> 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.34.74.206:1004 is added to 
> blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
> ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
> ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
> 2013-09-18 18:03:17,224 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.83.107.80:1004 is added to 
> blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
> ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
> ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
> 2013-09-18 18:03:17,899 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369,
>  newGenerationStamp=1370, newLength=1048576, newNodes=[10.83.107.80:1004, 
> 10.34.74.206:1004, 10.224.158.152:1004], 
> clientName=DFSClient_NONMAPREDUCE_-450304083_1)
> 2013-09-18 18:03:17,904 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369)
>  successfully to 
> BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370
> 2013-09-18 18:03:18,540 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370,
>  newGenerationStamp=1371, newLength=2097152, newNodes=[10.83.107.80:1004, 
> 10.34.74.206:1004, 10.224.158.152:1004], 
> clientName=DFSClient_NONMAPREDUCE_-450304083_1)
> 2013-09-18 18:03:18,548 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370)
>  successfully to 
> BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1371
> 2013-09-18 18:03:26,150 INFO BlockStateChange: BLOCK* addToInvalidates: 
> blk_1073742189_1371 10.83.107.80:1004 10.34.74.206:1004 10.224.158.152:1004 
> 2013-09-18 18:03:26,847 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> InvalidateBlocks: ask 10.34.74.206:1004 to delete [blk_1073742178_1359, 
> blk_1073742183_1362, blk_1073742184_1363, blk_1073742186_1366, 
> blk_1073742188_1368, blk_1073742189_1371]
> 2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange:

[jira] [Commented] (HDFS-5803) TestBalancer.testBalancer0 fails

2014-02-18 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904157#comment-13904157
 ] 

Kihwal Lee commented on HDFS-5803:
--

I just wanted to make sure that the test timeout was not due to a regression 
in the core code. It looks like the trunk version has 3 extra test cases, and 
{{testExitZeroOnSuccess}} accounts for most of the extra execution time. I did 
not see any sign of a performance regression in the three common test cases.

+1 for the patch.

> TestBalancer.testBalancer0 fails
> 
>
> Key: HDFS-5803
> URL: https://issues.apache.org/jira/browse/HDFS-5803
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mit Desai
>Assignee: Chen He
> Attachments: HDFS-5803.patch
>
>
> The test testBalancer0 fails on branch-2. Below is the stack trace:
> {noformat}
> java.util.concurrent.TimeoutException: Cluster failed to reached expected 
> values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: 
> 280, expected: 300), in more than 2 msec.
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:245)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:375)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:359)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.twoNodeTest(TestBalancer.java:404)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0Internal(TestBalancer.java:448)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0(TestBalancer.java:442)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5803) TestBalancer.testBalancer0 fails

2014-02-18 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5803:
-

   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed this to trunk and branch-2. Thanks for working on this, Chen.

> TestBalancer.testBalancer0 fails
> 
>
> Key: HDFS-5803
> URL: https://issues.apache.org/jira/browse/HDFS-5803
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mit Desai
>Assignee: Chen He
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HDFS-5803.patch
>
>
> The test testBalancer0 fails on branch 2. Below is the stack trace
> {noformat}
> java.util.concurrent.TimeoutException: Cluster failed to reached expected 
> values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: 
> 280, expected: 300), in more than 2 msec.
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:245)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:375)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:359)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.twoNodeTest(TestBalancer.java:404)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0Internal(TestBalancer.java:448)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0(TestBalancer.java:442)
> {noformat}





[jira] [Commented] (HDFS-5803) TestBalancer.testBalancer0 fails

2014-02-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904167#comment-13904167
 ] 

Hudson commented on HDFS-5803:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5182 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5182/])
HDFS-5803. TestBalancer.testBalancer0 fails. Contributed by Chen He. (kihwal: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569391)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java


> TestBalancer.testBalancer0 fails
> 
>
> Key: HDFS-5803
> URL: https://issues.apache.org/jira/browse/HDFS-5803
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mit Desai
>Assignee: Chen He
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HDFS-5803.patch
>
>
> The test testBalancer0 fails on branch 2. Below is the stack trace
> {noformat}
> java.util.concurrent.TimeoutException: Cluster failed to reached expected 
> values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: 
> 280, expected: 300), in more than 2 msec.
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:245)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:375)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:359)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.twoNodeTest(TestBalancer.java:404)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0Internal(TestBalancer.java:448)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0(TestBalancer.java:442)
> {noformat}





[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas

2014-02-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904244#comment-13904244
 ] 

Arpit Agarwal commented on HDFS-5318:
-

@Eric Sirianni could you post a rebased patch? I reviewed this today and the 
changes look mostly fine.

A couple of questions:
# It looks like read-only storages don't get returned to clients for read. Is 
this intentional?
# It would be nice to have an additional test verifying that corrupt replicas on 
read-only storages are not counted towards the corrupt block count.

> Support read-only and read-write paths to shared replicas
> -
>
> Key: HDFS-5318
> URL: https://issues.apache.org/jira/browse/HDFS-5318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Eric Sirianni
> Attachments: HDFS-5318-trunk.patch, HDFS-5318.patch, 
> HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, 
> HDFS-5318c-branch-2.patch, hdfs-5318.pdf
>
>
> There are several use cases for using shared-storage for datanode block 
> storage in an HDFS environment (storing cold blocks on a NAS device, Amazon 
> S3, etc.).
> With shared-storage, there is a distinction between:
> # a distinct physical copy of a block
> # an access-path to that block via a datanode.  
> A single 'replication count' metric cannot accurately capture both aspects.  
> However, for most of the current uses of 'replication count' in the Namenode, 
> the "number of physical copies" aspect seems to be the appropriate semantic.
> I propose altering the replication counting algorithm in the Namenode to 
> accurately infer distinct physical copies in a shared storage environment.  
> With HDFS-5115, a {{StorageID}} is a UUID.  I propose associating some minor 
> additional semantics to the {{StorageID}} - namely that multiple datanodes 
> attaching to the same physical shared storage pool should report the same 
> {{StorageID}} for that pool.  A minor modification would be required in the 
> DataNode to enable the generation of {{StorageID}} s to be pluggable behind 
> the {{FsDatasetSpi}} interface.  
> With those semantics in place, the number of physical copies of a block in a 
> shared storage environment can be calculated as the number of _distinct_ 
> {{StorageID}} s associated with that block.
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} 
> pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
> * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* 
> physical replicas (i.e. the traditional HDFS case with local disks)
> ** → Block B has {{ReplicationCount == 2}}
> * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* 
> physical replica (e.g. HDFS datanodes mounting the same NAS share)
> ** → Block B has {{ReplicationCount == 1}}
> For example, if block B has the following location tuples:
> * {{DN_1, STORAGE_A}}
> * {{DN_2, STORAGE_A}}
> * {{DN_3, STORAGE_B}}
> * {{DN_4, STORAGE_B}},
> the effect of this proposed change would be to calculate the replication 
> factor in the namenode as *2* instead of *4*.





[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas

2014-02-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904247#comment-13904247
 ] 

Arpit Agarwal commented on HDFS-5318:
-

Tag [~sirianni]

> Support read-only and read-write paths to shared replicas
> -
>
> Key: HDFS-5318
> URL: https://issues.apache.org/jira/browse/HDFS-5318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Eric Sirianni
> Attachments: HDFS-5318-trunk.patch, HDFS-5318.patch, 
> HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, 
> HDFS-5318c-branch-2.patch, hdfs-5318.pdf
>
>
> There are several use cases for using shared-storage for datanode block 
> storage in an HDFS environment (storing cold blocks on a NAS device, Amazon 
> S3, etc.).
> With shared-storage, there is a distinction between:
> # a distinct physical copy of a block
> # an access-path to that block via a datanode.  
> A single 'replication count' metric cannot accurately capture both aspects.  
> However, for most of the current uses of 'replication count' in the Namenode, 
> the "number of physical copies" aspect seems to be the appropriate semantic.
> I propose altering the replication counting algorithm in the Namenode to 
> accurately infer distinct physical copies in a shared storage environment.  
> With HDFS-5115, a {{StorageID}} is a UUID.  I propose associating some minor 
> additional semantics to the {{StorageID}} - namely that multiple datanodes 
> attaching to the same physical shared storage pool should report the same 
> {{StorageID}} for that pool.  A minor modification would be required in the 
> DataNode to enable the generation of {{StorageID}} s to be pluggable behind 
> the {{FsDatasetSpi}} interface.  
> With those semantics in place, the number of physical copies of a block in a 
> shared storage environment can be calculated as the number of _distinct_ 
> {{StorageID}} s associated with that block.
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} 
> pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
> * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* 
> physical replicas (i.e. the traditional HDFS case with local disks)
> ** → Block B has {{ReplicationCount == 2}}
> * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* 
> physical replica (e.g. HDFS datanodes mounting the same NAS share)
> ** → Block B has {{ReplicationCount == 1}}
> For example, if block B has the following location tuples:
> * {{DN_1, STORAGE_A}}
> * {{DN_2, STORAGE_A}}
> * {{DN_3, STORAGE_B}}
> * {{DN_4, STORAGE_B}},
> the effect of this proposed change would be to calculate the replication 
> factor in the namenode as *2* instead of *4*.





[jira] [Commented] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause SecondaryNameNode failed do checkpoint

2014-02-18 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904249#comment-13904249
 ] 

Benoy Antony commented on HDFS-5944:


Good job finding and fixing this bug, [~zhaoyunjiong]. 
Could there be multiple trailing "/" characters? If so, removing only the last 
character may not be enough.

> LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right 
> cause SecondaryNameNode failed do checkpoint
> -
>
> Key: HDFS-5944
> URL: https://issues.apache.org/jira/browse/HDFS-5944
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 1.2.0, 2.2.0
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, 
> HDFS-5944.test.txt
>
>
> In our cluster, we encountered error like this:
> java.io.IOException: saveLeases found path 
> /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949)
> What happened:
> Client A open file /XXX/20140206/04_30/_SUCCESS.slc.log for write.
> And Client A continue refresh it's lease.
> Client B deleted /XXX/20140206/04_30/
> Client C open file /XXX/20140206/04_30/_SUCCESS.slc.log for write
> Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log
> Then the SecondaryNameNode tried to do a checkpoint and failed, because the 
> lease held by Client A was not removed when Client B deleted /XXX/20140206/04_30/.
> The reason is a bug in findLeaseWithPrefixPath:
> {code}
> int srclen = prefix.length();
> if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) {
>   entries.put(entry.getKey(), entry.getValue());
> }
> {code}
> Here, when prefix is /XXX/20140206/04_30/ and p is 
> /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'.
> The fix is simple, I'll upload patch later.
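The prefix check described above, plus a normalization that also covers the multiple-trailing-slash concern raised in the comment, can be sketched as follows. This is a hypothetical stand-alone helper, not the actual {{LeaseManager}} code:

```java
public class PrefixMatch {
    // Sketch of the prefix check from the issue description: a path p is
    // "under" prefix when p equals the prefix, or the character right after
    // the prefix is the path separator '/'.
    static boolean isUnderPrefix(String p, String prefix) {
        // Normalize: strip trailing separators so "/a/b/" behaves like "/a/b".
        // A loop (rather than a single strip) handles "/a/b//" as well.
        while (prefix.length() > 1 && prefix.endsWith("/")) {
            prefix = prefix.substring(0, prefix.length() - 1);
        }
        if (!p.startsWith(prefix)) {
            return false;
        }
        int srclen = prefix.length();
        return p.length() == srclen || p.charAt(srclen) == '/';
    }

    public static void main(String[] args) {
        // Buggy case from the report: with the unstripped prefix
        // "/XXX/20140206/04_30/", charAt(srclen) was '_', so the open file
        // under the deleted directory was missed. After normalizing:
        System.out.println(isUnderPrefix(
            "/XXX/20140206/04_30/_SUCCESS.slc.log",
            "/XXX/20140206/04_30/"));  // true
        // A sibling path that merely shares the prefix string is excluded:
        System.out.println(isUnderPrefix(
            "/XXX/20140206/04_30x", "/XXX/20140206/04_30/"));  // false
    }
}
```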





[jira] [Commented] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data

2014-02-18 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904364#comment-13904364
 ] 

Suresh Srinivas commented on HDFS-5958:
---

[~kovyrin], can you please attach any logs you may have for this issue?

> One very large node in a cluster prevents balancer from balancing data
> --
>
> Key: HDFS-5958
> URL: https://issues.apache.org/jira/browse/HDFS-5958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.2.0
> Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one 
> with 4Tb drive.
>Reporter: Alexey Kovyrin
>
> In a cluster with a set of small nodes and one much larger node, the balancer 
> always selects the large node as the target, even though it already has a copy 
> of each block in the cluster.
> This causes the balancer to enter an infinite loop and stop balancing other 
> nodes, because each balancing iteration selects the same target and then cannot 
> find a single block to move.





[jira] [Commented] (HDFS-5893) HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates

2014-02-18 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904403#comment-13904403
 ] 

Jing Zhao commented on HDFS-5893:
-

+1. I will commit the patch shortly.

> HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory 
> which does not import SSL certificates
> 
>
> Key: HDFS-5893
> URL: https://issues.apache.org/jira/browse/HDFS-5893
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Haohui Mai
> Attachments: HDFS-5893.000.patch
>
>
> When {{HftpFileSystem}} tries to get the data, it creates a 
> {{RangeHeaderUrlOpener}} object to open an HTTP / HTTPS connection to the NN. 
> However, {{HftpFileSystem.RangeHeaderUrlOpener}} uses the default 
> URLConnectionFactory, which does not import the SSL certificates from 
> ssl-client.xml. Therefore {{HsftpFileSystem}} fails.
> To fix this bug, {{HftpFileSystem.RangeHeaderUrlOpener}} needs to use the 
> same {{URLConnectionFactory}} as the one used by {{HftpFileSystem}}.





[jira] [Commented] (HDFS-5956) A file size is multiplied by the replication factor in 'hdfs oiv -p FileDistribution' option

2014-02-18 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904416#comment-13904416
 ] 

Haohui Mai commented on HDFS-5956:
--

The patch mostly looks good. Some minor comments:

{code}
+long maxFileSize = 0;
+for (FileStatus fs : writtenFiles.values()) {
+  maxFileSize = Math.max(maxFileSize, fs.getLen());
+}
{code}

You can use {{Collections.max}} instead.

nit: can you change the name of the test (i.e., 
{{testFileDistributionVisitor}}) in this patch as well?
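The {{Collections.max}} suggestion can be sketched stand-alone. The {{FileStatus}} class below is a minimal stand-in for Hadoop's, keeping only {{getLen()}}; the test data is invented for illustration:

```java
import java.util.*;

public class MaxLen {
    // Minimal stand-in for Hadoop's FileStatus; only the length matters here.
    static class FileStatus {
        private final long len;
        FileStatus(long len) { this.len = len; }
        long getLen() { return len; }
    }

    // The reviewer's suggestion: Collections.max with a length comparator
    // replaces the explicit loop from the patch.
    static long maxLen(Collection<FileStatus> files) {
        return Collections.max(files, new Comparator<FileStatus>() {
            public int compare(FileStatus a, FileStatus b) {
                return Long.compare(a.getLen(), b.getLen());
            }
        }).getLen();
    }

    public static void main(String[] args) {
        List<FileStatus> files = Arrays.asList(
            new FileStatus(100), new FileStatus(4096), new FileStatus(512));

        // Loop version from the patch under review:
        long maxFileSize = 0;
        for (FileStatus fs : files) {
            maxFileSize = Math.max(maxFileSize, fs.getLen());
        }

        System.out.println(maxFileSize == maxLen(files));  // true
        System.out.println(maxLen(files));                 // 4096
    }
}
```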

> A file size is multiplied by the replication factor in 'hdfs oiv -p 
> FileDistribution' option
> 
>
> Key: HDFS-5956
> URL: https://issues.apache.org/jira/browse/HDFS-5956
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: tools
>Affects Versions: 3.0.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>  Labels: newbie
> Attachments: HDFS-5956.patch
>
>
> In FileDistributionCalculator.java, 
> {code}
> long fileSize = 0;
> for (BlockProto b : f.getBlocksList()) {
>   fileSize += b.getNumBytes() * f.getReplication();
> }
> maxFileSize = Math.max(fileSize, maxFileSize);
> totalSpace += fileSize;
> {code}
> should be
> {code}
> long fileSize = 0;
> for (BlockProto b : f.getBlocksList()) {
>   fileSize += b.getNumBytes();
> }
> maxFileSize = Math.max(fileSize, maxFileSize);
> totalSpace += fileSize * f.getReplication();
> {code}





[jira] [Updated] (HDFS-5893) HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates

2014-02-18 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5893:


   Resolution: Fixed
Fix Version/s: 2.4.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed this to trunk, branch-2 and branch-2.4.

> HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory 
> which does not import SSL certificates
> 
>
> Key: HDFS-5893
> URL: https://issues.apache.org/jira/browse/HDFS-5893
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Haohui Mai
> Fix For: 2.4.0
>
> Attachments: HDFS-5893.000.patch
>
>
> When {{HftpFileSystem}} tries to get the data, it creates a 
> {{RangeHeaderUrlOpener}} object to open an HTTP / HTTPS connection to the NN. 
> However, {{HftpFileSystem.RangeHeaderUrlOpener}} uses the default 
> URLConnectionFactory, which does not import the SSL certificates from 
> ssl-client.xml. Therefore {{HsftpFileSystem}} fails.
> To fix this bug, {{HftpFileSystem.RangeHeaderUrlOpener}} needs to use the 
> same {{URLConnectionFactory}} as the one used by {{HftpFileSystem}}.





[jira] [Commented] (HDFS-5893) HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates

2014-02-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904429#comment-13904429
 ] 

Hudson commented on HDFS-5893:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5184 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5184/])
HDFS-5893. HftpFileSystem.RangeHeaderUrlOpener uses the default 
URLConnectionFactory which does not import SSL certificates. Contributed by 
Haohui Mai. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569477)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileDataServlet.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/HftpFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestByteRangeInputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestHttpsFileSystem.java


> HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory 
> which does not import SSL certificates
> 
>
> Key: HDFS-5893
> URL: https://issues.apache.org/jira/browse/HDFS-5893
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Haohui Mai
> Fix For: 2.4.0
>
> Attachments: HDFS-5893.000.patch
>
>
> When {{HftpFileSystem}} tries to get the data, it creates a 
> {{RangeHeaderUrlOpener}} object to open an HTTP / HTTPS connection to the NN. 
> However, {{HftpFileSystem.RangeHeaderUrlOpener}} uses the default 
> URLConnectionFactory, which does not import the SSL certificates from 
> ssl-client.xml. Therefore {{HsftpFileSystem}} fails.
> To fix this bug, {{HftpFileSystem.RangeHeaderUrlOpener}} needs to use the 
> same {{URLConnectionFactory}} as the one used by {{HftpFileSystem}}.





[jira] [Commented] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data

2014-02-18 Thread Alexey Kovyrin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904430#comment-13904430
 ] 

Alexey Kovyrin commented on HDFS-5958:
--

[~sureshms], here is a piece of my log from the balancer: 
https://gist.github.com/kovyrin/9077741/raw/a30429b213fc4a5faca40f96c54f01d52c60706e/gistfile1.txt

Here is a screenshot with all the nodes in the cluster: 
http://snap.kovyrin.net/Hadoop_NameNode%C2%A0ops01.dal05.swiftype.net_8020-20140218-141308.jpg

name to address map:
{code}
10.84.56.2    work01
10.60.120.8   work02
10.84.56.10   work03
10.84.56.12   logs01
10.80.72.204  backup01
{code}


> One very large node in a cluster prevents balancer from balancing data
> --
>
> Key: HDFS-5958
> URL: https://issues.apache.org/jira/browse/HDFS-5958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.2.0
> Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one 
> with 4Tb drive.
>Reporter: Alexey Kovyrin
>
> In a cluster with a set of small nodes and one much larger node, the balancer 
> always selects the large node as the target, even though it already has a copy 
> of each block in the cluster.
> This causes the balancer to enter an infinite loop and stop balancing other 
> nodes, because each balancing iteration selects the same target and then cannot 
> find a single block to move.





[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2014-02-18 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904434#comment-13904434
 ] 

Haohui Mai commented on HDFS-5952:
--

Is it okay to use the XML-based tool for debugging? Otherwise you'll end up 
duplicating the code in {{PBImageXmlWriter}} to parse the fsimage.

Note that the XML / delimited formats are intended to capture all internal 
details of the fsimage. I understand that the delimited format is more compact 
than the XML one, but the delimited format does not include a schema, so it 
could be problematic when the format of the fsimage changes. Unfortunately we 
change the fsimage format quite often. :-(

If you really want to output in delimited format, I think it might be easier to 
take the output of {{PBImageXmlWriter}} and to use SAX to convert the XML into 
the delimited format. It should work fairly efficiently.
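The suggested SAX conversion can be sketched as follows. The element names ({{inode}}, {{path}}, {{replication}}) are invented for illustration and may not match {{PBImageXmlWriter}}'s actual output:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class XmlToDelimited {
    // Convert a flat <inode> XML fragment to tab-delimited rows via SAX,
    // streaming instead of building a DOM for the whole fsimage dump.
    static String toDelimited(String xml) throws Exception {
        final StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private final StringBuilder row = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes attrs) {
                text.setLength(0);  // reset text buffer for each element
            }

            @Override
            public void characters(char[] ch, int start, int len) {
                text.append(ch, start, len);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("path") || qName.equals("replication")) {
                    if (row.length() > 0) {
                        row.append('\t');
                    }
                    row.append(text);
                } else if (qName.equals("inode")) {
                    out.append(row).append('\n');  // one row per inode
                    row.setLength(0);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<fsimage>"
            + "<inode><path>/a</path><replication>3</replication></inode>"
            + "<inode><path>/b</path><replication>2</replication></inode>"
            + "</fsimage>";
        System.out.print(toDelimited(xml));  // "/a\t3" and "/b\t2", one per line
    }
}
```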

> Create a tool to run data analysis on the PB format fsimage
> ---
>
> Key: HDFS-5952
> URL: https://issues.apache.org/jira/browse/HDFS-5952
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 3.0.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>
> The Delimited processor in OfflineImageViewer is no longer supported after 
> HDFS-5698 was merged.
> The motivation for the Delimited processor is to run data analysis on the 
> fsimage; therefore, there might be more value in creating a tool for Hive or 
> Pig that reads the PB format fsimage directly.





[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster

2014-02-18 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904460#comment-13904460
 ] 

Haohui Mai commented on HDFS-5939:
--

{code}
@@ -712,6 +712,9 @@ private Node chooseRandom(String scope, String 
excludedScope){
 numOfDatanodes -= ((InnerNode)node).getNumOfLeaves();
   }
 }
+if (numOfDatanodes == 0) {
+  return null;
+}
 int leaveIndex = r.nextInt(numOfDatanodes);
 return innerNode.getLeaf(leaveIndex, node);
   }
{code}

This change affects a couple of downstream callers, for example 
{{BlockPlacementByDefault}}. I think we need to file a separate jira for this 
change so that the callers are aware of the fact that the function can return 
{{null}}.
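The effect of the guard can be seen in isolation; the sketch below is a hypothetical stand-in, not the actual {{NetworkTopology.chooseRandom}} code:

```java
import java.util.Random;

public class ChooseRandomGuard {
    private static final Random r = new Random();

    // Without the guard, r.nextInt(0) throws IllegalArgumentException --
    // the "n must be positive" message that surfaced through WebHDFS here.
    static Integer chooseLeafIndex(int numOfDatanodes) {
        if (numOfDatanodes == 0) {
            return null;  // the patch's early return: nothing to choose from
        }
        return r.nextInt(numOfDatanodes);
    }

    public static void main(String[] args) {
        System.out.println(chooseLeafIndex(0));  // null instead of an exception
        Integer idx = chooseLeafIndex(3);
        System.out.println(idx != null && idx >= 0 && idx < 3);  // true
    }
}
```

As the comment notes, every caller must now handle the {{null}} return, which is why a separate jira makes sense.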

> WebHdfs returns misleading error code and logs nothing if trying to create a 
> file with no DNs in cluster
> 
>
> Key: HDFS-5939
> URL: https://issues.apache.org/jira/browse/HDFS-5939
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-5939.001.patch
>
>
> When trying to access hdfs via webhdfs while the datanodes are dead, the user 
> will see the exception below without any clue that it is caused by dead 
> datanodes:
> $ curl -i -X PUT 
> ".../webhdfs/v1/t1?op=CREATE&user.name=&overwrite=false"
> ...
> {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n
>  must be positive"}}
> The error report needs to be fixed to give the user a hint about the dead 
> datanodes.





[jira] [Updated] (HDFS-5945) Add rolling upgrade information to fsimage

2014-02-18 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5945:


Attachment: HDFS-5945.protobuf.patch

Made a small change to the latest patch to use the protobuf-based fsimage.

> Add rolling upgrade information to fsimage
> --
>
> Key: HDFS-5945
> URL: https://issues.apache.org/jira/browse/HDFS-5945
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: HDFS-5945.protobuf.patch, h5945_20140213.patch, 
> h5945_20140214.patch, h5945_20140216.patch
>
>
> When a rolling upgrade is in progress, the standby namenode may create a 
> checkpoint.  The rolling upgrade information should be added to the fsimage in 
> order to support namenode restart and continuing the rolling upgrade.





[jira] [Updated] (HDFS-5956) A file size is multiplied by the replication factor in 'hdfs oiv -p FileDistribution' option

2014-02-18 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-5956:


Attachment: HDFS-5956.2.patch

Thanks for your review, [~wheat9].
Attaching a patch to reflect your comments.

> A file size is multiplied by the replication factor in 'hdfs oiv -p 
> FileDistribution' option
> 
>
> Key: HDFS-5956
> URL: https://issues.apache.org/jira/browse/HDFS-5956
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: tools
>Affects Versions: 3.0.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>  Labels: newbie
> Attachments: HDFS-5956.2.patch, HDFS-5956.patch
>
>
> In FileDistributionCalculator.java, 
> {code}
> long fileSize = 0;
> for (BlockProto b : f.getBlocksList()) {
>   fileSize += b.getNumBytes() * f.getReplication();
> }
> maxFileSize = Math.max(fileSize, maxFileSize);
> totalSpace += fileSize;
> {code}
> should be
> {code}
> long fileSize = 0;
> for (BlockProto b : f.getBlocksList()) {
>   fileSize += b.getNumBytes();
> }
> maxFileSize = Math.max(fileSize, maxFileSize);
> totalSpace += fileSize * f.getReplication();
> {code}





[jira] [Updated] (HDFS-5956) A file size is multiplied by the replication factor in 'hdfs oiv -p FileDistribution' option

2014-02-18 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-5956:


 Target Version/s: 2.4.0
Affects Version/s: 2.4.0

> A file size is multiplied by the replication factor in 'hdfs oiv -p 
> FileDistribution' option
> 
>
> Key: HDFS-5956
> URL: https://issues.apache.org/jira/browse/HDFS-5956
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: tools
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>  Labels: newbie
> Attachments: HDFS-5956.2.patch, HDFS-5956.patch
>
>
> In FileDistributionCalculator.java, 
> {code}
> long fileSize = 0;
> for (BlockProto b : f.getBlocksList()) {
>   fileSize += b.getNumBytes() * f.getReplication();
> }
> maxFileSize = Math.max(fileSize, maxFileSize);
> totalSpace += fileSize;
> {code}
> should be
> {code}
> long fileSize = 0;
> for (BlockProto b : f.getBlocksList()) {
>   fileSize += b.getNumBytes();
> }
> maxFileSize = Math.max(fileSize, maxFileSize);
> totalSpace += fileSize * f.getReplication();
> {code}





[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas

2014-02-18 Thread Eric Sirianni (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904492#comment-13904492
 ] 

Eric Sirianni commented on HDFS-5318:
-

bq. It looks like read-only storages don't get returned to clients for read. Is 
this intentional?
Can you elaborate?  As far as I can see, read-only storages _are_ returned to 
clients for read.  Also, the {{TestReadOnlySharedStorage}} JUnit test validates 
that {{client.getLocatedBlocks()}} returns the read-only locations in addition 
to the normal ones.

bq. It would be nice to have an additional test to verify corrupt blocks on 
read-only storages don't get counted towards corrupt blocks.
I will look into adding this test case to {{TestReadOnlySharedStorage}}.

> Support read-only and read-write paths to shared replicas
> -
>
> Key: HDFS-5318
> URL: https://issues.apache.org/jira/browse/HDFS-5318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Eric Sirianni
> Attachments: HDFS-5318-trunk.patch, HDFS-5318.patch, 
> HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, 
> HDFS-5318c-branch-2.patch, hdfs-5318.pdf
>
>
> There are several use cases for using shared-storage for datanode block 
> storage in an HDFS environment (storing cold blocks on a NAS device, Amazon 
> S3, etc.).
> With shared-storage, there is a distinction between:
> # a distinct physical copy of a block
> # an access-path to that block via a datanode.  
> A single 'replication count' metric cannot accurately capture both aspects.  
> However, for most of the current uses of 'replication count' in the Namenode, 
> the "number of physical copies" aspect seems to be the appropriate semantic.
> I propose altering the replication counting algorithm in the Namenode to 
> accurately infer distinct physical copies in a shared storage environment.  
> With HDFS-5115, a {{StorageID}} is a UUID.  I propose associating some minor 
> additional semantics to the {{StorageID}} - namely that multiple datanodes 
> attaching to the same physical shared storage pool should report the same 
> {{StorageID}} for that pool.  A minor modification would be required in the 
> DataNode to enable the generation of {{StorageID}} s to be pluggable behind 
> the {{FsDatasetSpi}} interface.  
> With those semantics in place, the number of physical copies of a block in a 
> shared storage environment can be calculated as the number of _distinct_ 
> {{StorageID}} s associated with that block.
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} 
> pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
> * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* 
> physical replicas (i.e. the traditional HDFS case with local disks)
> ** → Block B has {{ReplicationCount == 2}}
> * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* 
> physical replica (e.g. HDFS datanodes mounting the same NAS share)
> ** → Block B has {{ReplicationCount == 1}}
> For example, if block B has the following location tuples:
> * {{DN_1, STORAGE_A}}
> * {{DN_2, STORAGE_A}}
> * {{DN_3, STORAGE_B}}
> * {{DN_4, STORAGE_B}},
> the effect of this proposed change would be to calculate the replication 
> factor in the namenode as *2* instead of *4*.
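The distinct-{{StorageID}} counting proposed above can be sketched in a few lines of Java. This is an illustrative toy only; the {{physicalCopies}} helper and the string-pair location tuples are not HDFS APIs:

```java
import java.util.*;

public class ReplicaCountSketch {
    // Each location tuple is (datanodeId, storageId); physical copies are
    // counted as the number of distinct storage IDs, not the number of tuples.
    static int physicalCopies(List<String[]> locations) {
        Set<String> storageIds = new HashSet<>();
        for (String[] loc : locations) {
            storageIds.add(loc[1]);  // loc[0] = DN id, loc[1] = storage id
        }
        return storageIds.size();
    }

    public static void main(String[] args) {
        // The four location tuples from the example above.
        List<String[]> locs = Arrays.asList(
            new String[]{"DN_1", "STORAGE_A"},
            new String[]{"DN_2", "STORAGE_A"},
            new String[]{"DN_3", "STORAGE_B"},
            new String[]{"DN_4", "STORAGE_B"});
        // Four access paths, but only two physical copies.
        System.out.println(physicalCopies(locs));  // prints 2
    }
}
```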



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5945) Add rolling upgrade information to fsimage

2014-02-18 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5945:
-

Hadoop Flags: Reviewed

+1 the protobuf change looks good.

> Add rolling upgrade information to fsimage
> --
>
> Key: HDFS-5945
> URL: https://issues.apache.org/jira/browse/HDFS-5945
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: HDFS-5945.protobuf.patch, h5945_20140213.patch, 
> h5945_20140214.patch, h5945_20140216.patch
>
>
> When rolling upgrade is in progress, the standby namenode may create 
> checkpoint.  The rolling upgrade information should be added to fsimage in 
> order to support namenode restart and continue rolling upgrade.





[jira] [Updated] (HDFS-5945) Add rolling upgrade information to fsimage

2014-02-18 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5945:
-


I have committed this.

> Add rolling upgrade information to fsimage
> --
>
> Key: HDFS-5945
> URL: https://issues.apache.org/jira/browse/HDFS-5945
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5945.protobuf.patch, h5945_20140213.patch, 
> h5945_20140214.patch, h5945_20140216.patch
>
>
> When rolling upgrade is in progress, the standby namenode may create 
> checkpoint.  The rolling upgrade information should be added to fsimage in 
> order to support namenode restart and continue rolling upgrade.





[jira] [Resolved] (HDFS-5945) Add rolling upgrade information to fsimage

2014-02-18 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HDFS-5945.
--

   Resolution: Fixed
Fix Version/s: HDFS-5535 (Rolling upgrades)

> Add rolling upgrade information to fsimage
> --
>
> Key: HDFS-5945
> URL: https://issues.apache.org/jira/browse/HDFS-5945
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5945.protobuf.patch, h5945_20140213.patch, 
> h5945_20140214.patch, h5945_20140216.patch
>
>
> When rolling upgrade is in progress, the standby namenode may create 
> checkpoint.  The rolling upgrade information should be added to fsimage in 
> order to support namenode restart and continue rolling upgrade.





[jira] [Resolved] (HDFS-5905) Upgrade and rolling upgrade should not be allowed simultaneously

2014-02-18 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HDFS-5905.
--

Resolution: Duplicate

This was fixed by HDFS-5945.

> Upgrade and rolling upgrade should not be allowed simultaneously
> 
>
> Key: HDFS-5905
> URL: https://issues.apache.org/jira/browse/HDFS-5905
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>
> The existing upgrade/finalize mechanism and the new rolling upgrade mechanism 
> are two distinct features for upgrading the HDFS software.  They cannot be 
> executed simultaneously.





[jira] [Assigned] (HDFS-5778) Document new commands and parameters for improved rolling upgrades

2014-02-18 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE reassigned HDFS-5778:


Assignee: Tsz Wo (Nicholas), SZE

> Document new commands and parameters for improved rolling upgrades
> --
>
> Key: HDFS-5778
> URL: https://issues.apache.org/jira/browse/HDFS-5778
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: HDFS-5535 (Rolling upgrades)
>Reporter: Akira AJISAKA
>Assignee: Tsz Wo (Nicholas), SZE
>
> "hdfs dfsadmin -rollingUpgrade" command was newly added in HDFS-5752, and 
> some other commands and parameters will be added in the future. This issue 
> exists to flag undocumented commands and parameters when the HDFS-5535 branch 
> is merged to trunk.





[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904568#comment-13904568
 ] 

Arpit Agarwal commented on HDFS-5889:
-

This patch seems to have broken the rolling upgrade tests.

The new edit log ops {{OP_ROLLING_UPGRADE_START}} and 
{{OP_ROLLING_UPGRADE_FINALIZE}} trigger a {{RollingUpgradeException}} during NN 
restart. I think the fix should be to invoke 
{{startRollingUpgrade/finalizeRollingUpgrade}} when these ops are replayed, and 
to write to the editLog only when invoked via RPC.

I filed HDFS-5960.

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: h5889_20140211.patch, h5889_20140212b.patch, 
> h5889_20140212c.patch, h5889_20140213.patch
>
>
> After rolling upgrade is started and checkpoint is disabled, the edit log may 
> grow to a huge size.  It is not a problem if rolling upgrade is finalized 
> normally since NN keeps the current state in memory and it writes a new 
> checkpoint during finalize.  However, it is a problem if admin decides to 
> downgrade.  It could take a long time to apply edit log.  Rollback does not 
> have such problem.





[jira] [Created] (HDFS-5960) Fix TestRollingUpgrade

2014-02-18 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-5960:
---

 Summary: Fix TestRollingUpgrade
 Key: HDFS-5960
 URL: https://issues.apache.org/jira/browse/HDFS-5960
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Fix For: HDFS-5535 (Rolling upgrades)


{{TestRollingUpgrade}} fails when restarting the NN because 
{{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not 
expected.

The fix is to start/finalize rolling upgrade when the corresponding edit log op 
is seen.





[jira] [Work started] (HDFS-5960) Fix TestRollingUpgrade

2014-02-18 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-5960 started by Arpit Agarwal.

> Fix TestRollingUpgrade
> --
>
> Key: HDFS-5960
> URL: https://issues.apache.org/jira/browse/HDFS-5960
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-5535 (Rolling upgrades)
>
>
> {{TestRollingUpgrade}} fails when restarting the NN because 
> {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not 
> expected.
> The fix is to start/finalize rolling upgrade when the corresponding edit log 
> op is seen.





[jira] [Updated] (HDFS-5318) Support read-only and read-write paths to shared replicas

2014-02-18 Thread Eric Sirianni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Sirianni updated HDFS-5318:


Attachment: HDFS-5318-trunkb.patch

Updated patch based on Arpit's feedback.

> Support read-only and read-write paths to shared replicas
> -
>
> Key: HDFS-5318
> URL: https://issues.apache.org/jira/browse/HDFS-5318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Eric Sirianni
> Attachments: HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, 
> HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, 
> HDFS-5318c-branch-2.patch, hdfs-5318.pdf
>
>
> There are several use cases for using shared-storage for datanode block 
> storage in an HDFS environment (storing cold blocks on a NAS device, Amazon 
> S3, etc.).
> With shared-storage, there is a distinction between:
> # a distinct physical copy of a block
> # an access-path to that block via a datanode.  
> A single 'replication count' metric cannot accurately capture both aspects.  
> However, for most of the current uses of 'replication count' in the Namenode, 
> the "number of physical copies" aspect seems to be the appropriate semantic.
> I propose altering the replication counting algorithm in the Namenode to 
> accurately infer distinct physical copies in a shared storage environment.  
> With HDFS-5115, a {{StorageID}} is a UUID.  I propose associating some minor 
> additional semantics to the {{StorageID}} - namely that multiple datanodes 
> attaching to the same physical shared storage pool should report the same 
> {{StorageID}} for that pool.  A minor modification would be required in the 
> DataNode to enable the generation of {{StorageID}} s to be pluggable behind 
> the {{FsDatasetSpi}} interface.  
> With those semantics in place, the number of physical copies of a block in a 
> shared storage environment can be calculated as the number of _distinct_ 
> {{StorageID}} s associated with that block.
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} 
> pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
> * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* 
> physical replicas (i.e. the traditional HDFS case with local disks)
> ** → Block B has {{ReplicationCount == 2}}
> * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* 
> physical replica (e.g. HDFS datanodes mounting the same NAS share)
> ** → Block B has {{ReplicationCount == 1}}
> For example, if block B has the following location tuples:
> * {{DN_1, STORAGE_A}}
> * {{DN_2, STORAGE_A}}
> * {{DN_3, STORAGE_B}}
> * {{DN_4, STORAGE_B}},
> the effect of this proposed change would be to calculate the replication 
> factor in the namenode as *2* instead of *4*.





[jira] [Updated] (HDFS-5960) Fix TestRollingUpgrade

2014-02-18 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5960:


Attachment: HDFS-5960.01.patch

Patch to process {{start/finalizeRollingUpgrade}} from the corresponding edit 
log operations. Also update edit log only when the operations are initiated via 
RPC.

Verified this fixes {{TestRollingUpgrade}}. {{TestEditLogUpgradeMarker}} still 
needs to be updated.

> Fix TestRollingUpgrade
> --
>
> Key: HDFS-5960
> URL: https://issues.apache.org/jira/browse/HDFS-5960
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5960.01.patch
>
>
> {{TestRollingUpgrade}} fails when restarting the NN because 
> {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not 
> expected.
> The fix is to start/finalize rolling upgrade when the corresponding edit log 
> op is seen.





[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-18 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904631#comment-13904631
 ] 

Jing Zhao commented on HDFS-5920:
-

I've committed this.

> Support rollback of rolling upgrade in NameNode and JournalNodes
> 
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch, 
> HDFS-5920.001.patch, HDFS-5920.002.patch, HDFS-5920.003.patch
>
>
> This jira provides rollback functionality for NameNode and JournalNode in 
> rolling upgrade.
> Currently the proposed rollback for rolling upgrade is:
> 1. Shutdown both NN
> 2. Start one of the NN using "-rollingUpgrade rollback" option
> 3. This NN will load the special fsimage right before the upgrade marker, 
> then discard all the editlog segments after the txid of the fsimage
> 4. The NN will also send RPC requests to all the JNs to discard editlog 
> segments. This call expects response from all the JNs. The NN will keep 
> running if the call succeeds.
> 5. We start the other NN using bootstrapstandby rather than "-rollingUpgrade 
> rollback"





[jira] [Resolved] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-18 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao resolved HDFS-5920.
-

  Resolution: Fixed
Hadoop Flags: Reviewed

> Support rollback of rolling upgrade in NameNode and JournalNodes
> 
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch, 
> HDFS-5920.001.patch, HDFS-5920.002.patch, HDFS-5920.003.patch
>
>
> This jira provides rollback functionality for NameNode and JournalNode in 
> rolling upgrade.
> Currently the proposed rollback for rolling upgrade is:
> 1. Shutdown both NN
> 2. Start one of the NN using "-rollingUpgrade rollback" option
> 3. This NN will load the special fsimage right before the upgrade marker, 
> then discard all the editlog segments after the txid of the fsimage
> 4. The NN will also send RPC requests to all the JNs to discard editlog 
> segments. This call expects response from all the JNs. The NN will keep 
> running if the call succeeds.
> 5. We start the other NN using bootstrapstandby rather than "-rollingUpgrade 
> rollback"





[jira] [Commented] (HDFS-5960) Fix TestRollingUpgrade

2014-02-18 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904644#comment-13904644
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5960:
--

Arpit, thanks for fixing the test!  For the patch, let's refactor 
startRollingUpgrade into startRollingUpgrade and startRollingUpgradeInternal 
(like startFile and startFileInternal) to separate RPC handling from edit log 
processing.
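The split suggested here (the RPC entry point records the edit-log op; a shared internal method applies the state change so that edit-log replay does not re-log the op) might look like the sketch below. Class, field, and method names are illustrative, not the actual FSNamesystem code:

```java
// Hedged sketch of the startRollingUpgrade / startRollingUpgradeInternal
// refactor. The StringBuilder stands in for the real edit log.
public class RollingUpgradeSketch {
    private final StringBuilder editLog = new StringBuilder();
    private boolean upgradeInProgress = false;

    // RPC path: apply the state change AND record an edit-log op.
    public void startRollingUpgrade() {
        startRollingUpgradeInternal();
        editLog.append("OP_ROLLING_UPGRADE_START;");
    }

    // Shared by the RPC path and edit-log replay on restart. Replay must
    // NOT write a new op, or ops would be duplicated on every restart.
    void startRollingUpgradeInternal() {
        upgradeInProgress = true;
    }

    public static void main(String[] args) {
        RollingUpgradeSketch ns = new RollingUpgradeSketch();
        ns.startRollingUpgrade();          // RPC path: logs one op
        System.out.println(ns.editLog);
        ns.startRollingUpgradeInternal();  // replay path: no new op appended
        System.out.println(ns.editLog);
    }
}
```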

> Fix TestRollingUpgrade
> --
>
> Key: HDFS-5960
> URL: https://issues.apache.org/jira/browse/HDFS-5960
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5960.01.patch
>
>
> {{TestRollingUpgrade}} fails when restarting the NN because 
> {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not 
> expected.
> The fix is to start/finalize rolling upgrade when the corresponding edit log 
> op is seen.





[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas

2014-02-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904679#comment-13904679
 ] 

Arpit Agarwal commented on HDFS-5318:
-

You're right.

+1 pending Jenkins.

> Support read-only and read-write paths to shared replicas
> -
>
> Key: HDFS-5318
> URL: https://issues.apache.org/jira/browse/HDFS-5318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Eric Sirianni
> Attachments: HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, 
> HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, 
> HDFS-5318c-branch-2.patch, hdfs-5318.pdf
>
>
> There are several use cases for using shared-storage for datanode block 
> storage in an HDFS environment (storing cold blocks on a NAS device, Amazon 
> S3, etc.).
> With shared-storage, there is a distinction between:
> # a distinct physical copy of a block
> # an access-path to that block via a datanode.  
> A single 'replication count' metric cannot accurately capture both aspects.  
> However, for most of the current uses of 'replication count' in the Namenode, 
> the "number of physical copies" aspect seems to be the appropriate semantic.
> I propose altering the replication counting algorithm in the Namenode to 
> accurately infer distinct physical copies in a shared storage environment.  
> With HDFS-5115, a {{StorageID}} is a UUID.  I propose associating some minor 
> additional semantics to the {{StorageID}} - namely that multiple datanodes 
> attaching to the same physical shared storage pool should report the same 
> {{StorageID}} for that pool.  A minor modification would be required in the 
> DataNode to enable the generation of {{StorageID}} s to be pluggable behind 
> the {{FsDatasetSpi}} interface.  
> With those semantics in place, the number of physical copies of a block in a 
> shared storage environment can be calculated as the number of _distinct_ 
> {{StorageID}} s associated with that block.
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} 
> pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
> * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* 
> physical replicas (i.e. the traditional HDFS case with local disks)
> ** → Block B has {{ReplicationCount == 2}}
> * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* 
> physical replica (e.g. HDFS datanodes mounting the same NAS share)
> ** → Block B has {{ReplicationCount == 1}}
> For example, if block B has the following location tuples:
> * {{DN_1, STORAGE_A}}
> * {{DN_2, STORAGE_A}}
> * {{DN_3, STORAGE_B}}
> * {{DN_4, STORAGE_B}},
> the effect of this proposed change would be to calculate the replication 
> factor in the namenode as *2* instead of *4*.





[jira] [Commented] (HDFS-5956) A file size is multiplied by the replication factor in 'hdfs oiv -p FileDistribution' option

2014-02-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904686#comment-13904686
 ] 

Hadoop QA commented on HDFS-5956:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12629608/HDFS-5956.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6168//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6168//console

This message is automatically generated.

> A file size is multiplied by the replication factor in 'hdfs oiv -p 
> FileDistribution' option
> 
>
> Key: HDFS-5956
> URL: https://issues.apache.org/jira/browse/HDFS-5956
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: tools
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>  Labels: newbie
> Attachments: HDFS-5956.2.patch, HDFS-5956.patch
>
>
> In FileDistributionCalculator.java, 
> {code}
> long fileSize = 0;
> for (BlockProto b : f.getBlocksList()) {
>   fileSize += b.getNumBytes() * f.getReplication();
> }
> maxFileSize = Math.max(fileSize, maxFileSize);
> totalSpace += fileSize;
> {code}
> should be
> {code}
> long fileSize = 0;
> for (BlockProto b : f.getBlocksList()) {
>   fileSize += b.getNumBytes();
> }
> maxFileSize = Math.max(fileSize, maxFileSize);
> totalSpace += fileSize * f.getReplication();
> {code}
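As a quick sanity check on the quoted fix, here is a small worked example with illustrative numbers (not HDFS code): with two 10-byte blocks and replication 3, the buggy loop reports a 60-byte file, while the corrected version keeps the file size at 20 and applies replication only to totalSpace.

```java
public class FileDistSketch {
    public static void main(String[] args) {
        long[] blocks = {10, 10};   // two 10-byte blocks
        int replication = 3;

        long buggySize = 0, fixedSize = 0;
        for (long b : blocks) {
            buggySize += b * replication;  // wrong: inflates the file size
            fixedSize += b;                // right: raw file length
        }
        // Replication only affects consumed space, not the file size.
        long totalSpace = fixedSize * replication;
        System.out.println(buggySize + " " + fixedSize + " " + totalSpace);
    }
}
```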





[jira] [Commented] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting

2014-02-18 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904698#comment-13904698
 ] 

Brandon Li commented on HDFS-5583:
--

Sure. I will review it.

> Make DN send an OOB Ack on shutdown before restarting
> 
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-5583.patch, HDFS-5583.patch, HDFS-5583.patch
>
>
> Add an ability for data nodes to send an OOB response in order to indicate an 
> upcoming upgrade-restart. Client should ignore the pipeline error from the 
> node for a configured amount of time and try to reconstruct the pipeline 
> without 
> excluding the restarted node.  If the node does not come back in time, 
> regular pipeline recovery should happen.
> This feature is useful for the applications with a need to keep blocks local. 
> If the upgrade-restart is fast, the wait is preferable to losing locality.  
> It could also be used in general instead of the draining-writer strategy.





[jira] [Commented] (HDFS-5961) OIV cannot load fsimages containing a symbolic link

2014-02-18 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904701#comment-13904701
 ] 

Kihwal Lee commented on HDFS-5961:
--

I have verified that adding {{processPermission()}} to the symlink INode 
loading fixes the issue.
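The failure mode described here is a classic stream-misalignment bug: skipping one fixed-size field for one record type shifts every subsequent read. The toy layout below is illustrative (not the real fsimage format); the commented read plays the role of the missing {{processPermission()}} call:

```java
import java.io.*;

public class SymlinkLoadSketch {
    public static void main(String[] args) throws IOException {
        // Write two toy inode records: [isSymlink flag][permission][length].
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeBoolean(true);  out.writeShort(0755); out.writeLong(3);
        out.writeBoolean(false); out.writeShort(0644); out.writeLong(9);

        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(buf.toByteArray()));
        boolean symlink1 = in.readBoolean();
        in.readShort();  // the fix: read the permission even for symlinks,
                         // otherwise every later field is misaligned
        long len1 = in.readLong();
        boolean symlink2 = in.readBoolean();
        in.readShort();
        long len2 = in.readLong();
        // With the permission reads in place, both records decode cleanly.
        System.out.println(symlink1 + " " + len1 + " " + symlink2 + " " + len2);
    }
}
```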

> OIV cannot load fsimages containing a symbolic link
> ---
>
> Key: HDFS-5961
> URL: https://issues.apache.org/jira/browse/HDFS-5961
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Critical
>
> In {{ImageLoaderCurrent#processINode}}, the permission is not read for 
> symlink INodes. So after incorrectly reading in the first symbolic link, the 
> next INode can't be read.
> HDFS-4850 broke this while fixing other issues.





[jira] [Created] (HDFS-5961) OIV cannot load fsimages containing a symbolic link

2014-02-18 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-5961:


 Summary: OIV cannot load fsimages containing a symbolic link
 Key: HDFS-5961
 URL: https://issues.apache.org/jira/browse/HDFS-5961
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Priority: Critical


In {{ImageLoaderCurrent#processINode}}, the permission is not read for symlink 
INodes. So after incorrectly reading in the first symbolic link, the next 
INode can't be read.

HDFS-4850 broke this while fixing other issues.





[jira] [Updated] (HDFS-5961) OIV cannot load fsimages containing a symbolic link

2014-02-18 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5961:
-

Attachment: HDFS-5961.patch

> OIV cannot load fsimages containing a symbolic link
> ---
>
> Key: HDFS-5961
> URL: https://issues.apache.org/jira/browse/HDFS-5961
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5961.patch
>
>
> In {{ImageLoaderCurrent#processINode}}, the permission is not read for 
> symlink INodes. So after incorrectly reading in the first symbolic link, the 
> next INode can't be read.
> HDFS-4850 broke this while fixing other issues.





[jira] [Updated] (HDFS-5961) OIV cannot load fsimages containing a symbolic link

2014-02-18 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5961:
-

Status: Patch Available  (was: Open)

> OIV cannot load fsimages containing a symbolic link
> ---
>
> Key: HDFS-5961
> URL: https://issues.apache.org/jira/browse/HDFS-5961
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5961.patch
>
>
> In {{ImageLoaderCurrent#processINode}}, the permission is not read for 
> symlink INodes. So after incorrectly reading in the first symbolic link, the 
> next INode can't be read.
> HDFS-4850 broke this while fixing other issues.





[jira] [Created] (HDFS-5962) Mtime and atime are not persisted for symbolic links

2014-02-18 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-5962:


 Summary: Mtime and atime are not persisted for symbolic links
 Key: HDFS-5962
 URL: https://issues.apache.org/jira/browse/HDFS-5962
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Priority: Critical


In {{FSImageSerialization}}, the mtime and atime of symbolic links are 
hardcoded to be 0 when saving to fsimage, even though they are recorded in 
memory and shown in the listing until restarting namenode.
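A minimal sketch of this bug class, with illustrative names (not the real {{FSImageSerialization}} code): the serializer writes a constant 0 instead of the in-memory mtime, so the fix is simply to persist the real value:

```java
import java.io.*;

public class SymlinkMtimeSketch {
    // The 'buggy' flag selects between the broken behavior (hardcoded 0)
    // and the fix (persist the actual in-memory mtime).
    static void writeSymlinkMtime(DataOutput out, long mtime, boolean buggy)
            throws IOException {
        out.writeLong(buggy ? 0L : mtime);
    }

    public static void main(String[] args) throws IOException {
        long mtime = 1392768000000L;  // arbitrary in-memory mtime
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeSymlinkMtime(new DataOutputStream(buf), mtime, false);
        long reloaded = new DataInputStream(
            new ByteArrayInputStream(buf.toByteArray())).readLong();
        // With the fix, the mtime survives a save/reload round trip.
        System.out.println(reloaded == mtime);
    }
}
```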





[jira] [Updated] (HDFS-5962) Mtime is not persisted for symbolic links

2014-02-18 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5962:
-

Summary: Mtime is not persisted for symbolic links  (was: Mtime and atime 
are not persisted for symbolic links)

> Mtime is not persisted for symbolic links
> -
>
> Key: HDFS-5962
> URL: https://issues.apache.org/jira/browse/HDFS-5962
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Critical
>
> In {{FSImageSerialization}}, the mtime and atime of symbolic links are 
> hardcoded to be 0 when saving to fsimage, even though they are recorded in 
> memory and shown in the listing until restarting namenode.





[jira] [Updated] (HDFS-5962) Mtime is not persisted for symbolic links

2014-02-18 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5962:
-

Description: In {{FSImageSerialization}}, the mtime of symbolic links is 
hardcoded to be 0 when saving to fsimage, even though it is recorded in memory 
and shown in the listing until restarting namenode.  (was: In 
{{FSImageSerialization}}, the mtime and atime of symbolic links are hardcoded 
to be 0 when saving to fsimage, even though they are recorded in memory and 
shown in the listing until restarting namenode.)

> Mtime is not persisted for symbolic links
> -
>
> Key: HDFS-5962
> URL: https://issues.apache.org/jira/browse/HDFS-5962
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Critical
>
> In {{FSImageSerialization}}, the mtime of symbolic links is hardcoded to be 0 
> when saving to fsimage, even though it is recorded in memory and shown in the 
> listing until restarting namenode.





[jira] [Commented] (HDFS-5960) Fix TestRollingUpgrade

2014-02-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904759#comment-13904759
 ] 

Arpit Agarwal commented on HDFS-5960:
-

Unfortunately, the branch is being actively changed while broken, so the nature 
of the failure seems to have changed since the last patch.

I think we need to hold off on check-ins until the branch is fixed.

> Fix TestRollingUpgrade
> --
>
> Key: HDFS-5960
> URL: https://issues.apache.org/jira/browse/HDFS-5960
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5960.01.patch
>
>
> {{TestRollingUpgrade}} fails when restarting the NN because 
> {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not 
> expected.
> The fix is to start/finalize rolling upgrade when the corresponding edit log 
> op is seen.





[jira] [Updated] (HDFS-5898) Allow NFS gateway to login/relogin from its kerberos keytab

2014-02-18 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HDFS-5898:
--

Attachment: HDFS-5898-with-documentation.patch

Added documentation, so this no longer requires a separate doc patch.

> Allow NFS gateway to login/relogin from its kerberos keytab
> ---
>
> Key: HDFS-5898
> URL: https://issues.apache.org/jira/browse/HDFS-5898
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Jing Zhao
>Assignee: Abin Shahab
> Attachments: HDFS-5898-documentation.patch, 
> HDFS-5898-documentation.patch, HDFS-5898-with-documentation.patch, 
> HDFS-5898.patch, HDFS-5898.patch, HDFS-5898.patch
>
>
> According to the discussion in HDFS-5804:
> 1. The NFS gateway should be able to get its own TGTs, and renew them.
> 2. We should update the HdfsNfsGateway.apt.vm
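Item 1 above boils down to a relogin policy. The real gateway would use Hadoop's UserGroupInformation (e.g. loginUserFromKeytab and checkTGTAndReloginFromKeytab); the self-contained stand-in below only models the refresh decision, and the 80% threshold and 10-hour ticket lifetime are assumptions.

```java
// Stand-in for a keytab relogin policy: refresh the TGT once a fraction
// of its lifetime has elapsed, rather than letting it expire.
public class ReloginSketch {
  // Rule of thumb (assumed): relogin after 80% of the ticket lifetime.
  static boolean shouldRelogin(long loginTimeMs, long nowMs, long ticketLifetimeMs) {
    return nowMs - loginTimeMs >= (long) (ticketLifetimeMs * 0.8);
  }

  public static void main(String[] args) {
    long lifetime = 10L * 60 * 60 * 1000;  // 10-hour tickets (assumed)
    System.out.println(shouldRelogin(0, 9L * 60 * 60 * 1000, lifetime));  // true
    System.out.println(shouldRelogin(0, 1L * 60 * 60 * 1000, lifetime));  // false
  }
}
```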





[jira] [Updated] (HDFS-5960) Fix TestRollingUpgrade

2014-02-18 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5960:


Attachment: HDFS-5960.02.patch

Thanks for taking a look Nicholas. Updated patch with your feedback.

> Fix TestRollingUpgrade
> --
>
> Key: HDFS-5960
> URL: https://issues.apache.org/jira/browse/HDFS-5960
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch
>
>
> {{TestRollingUpgrade}} fails when restarting the NN because 
> {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not 
> expected.
> The fix is to start/finalize rolling upgrade when the corresponding edit log 
> op is seen.





[jira] [Commented] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.

2014-02-18 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904784#comment-13904784
 ] 

Haohui Mai commented on HDFS-5796:
--

HDFS-5716 allows a pluggable authentication mechanism in WebHDFS, which 
provides a solution to this problem. Is it okay to mark this bug as a duplicate 
of HDFS-5716?

> The file system browser in the namenode UI requires SPNEGO.
> ---
>
> Key: HDFS-5796
> URL: https://issues.apache.org/jira/browse/HDFS-5796
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Kihwal Lee
>Assignee: Haohui Mai
>Priority: Blocker
>
> After HDFS-5382, the browser makes webhdfs REST calls directly, requiring 
> SPNEGO to work between user's browser and namenode.  This won't work if the 
> cluster's security infrastructure is isolated from the regular network.  
> Moreover, SPNEGO is not supposed to be required for user-facing web pages.





[jira] [Updated] (HDFS-5960) Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands

2014-02-18 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5960:


Summary: Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands  (was: 
Fix TestRollingUpgrade)

> Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands
> -
>
> Key: HDFS-5960
> URL: https://issues.apache.org/jira/browse/HDFS-5960
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch
>
>
> {{TestRollingUpgrade}} fails when restarting the NN because 
> {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not 
> expected.
> The fix is to start/finalize rolling upgrade when the corresponding edit log 
> op is seen.





[jira] [Created] (HDFS-5963) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail

2014-02-18 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-5963:
---

 Summary: TestRollingUpgrade#testSecondaryNameNode causes 
subsequent tests to fail
 Key: HDFS-5963
 URL: https://issues.apache.org/jira/browse/HDFS-5963
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: HDFS-5535 (Rolling upgrades)
Reporter: Arpit Agarwal


{{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. 
It seems to be caused by the terminate hook used by the test but I did not 
spend much time on it. Commenting out this test case makes other tests in the 
same class pass.





[jira] [Created] (HDFS-5964) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail

2014-02-18 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-5964:
---

 Summary: TestRollingUpgrade#testSecondaryNameNode causes 
subsequent tests to fail
 Key: HDFS-5964
 URL: https://issues.apache.org/jira/browse/HDFS-5964
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: HDFS-5535 (Rolling upgrades)
Reporter: Arpit Agarwal


{{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. 
It seems to be caused by the terminate hook used by the test but I did not 
spend much time on it. Commenting out this test case makes other tests in the 
same class pass.





[jira] [Updated] (HDFS-5963) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail

2014-02-18 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5963:


Description: {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent 
tests to fail. It seems to be caused by the terminate hook used by the test. 
Commenting out this test case makes other tests in the same class pass.  (was: 
{{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. 
It seems to be caused by the terminate hook used by the test but I did not 
spend much time on it. Commenting out this test case makes other tests in the 
same class pass.)

> TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
> 
>
> Key: HDFS-5963
> URL: https://issues.apache.org/jira/browse/HDFS-5963
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: HDFS-5535 (Rolling upgrades)
>Reporter: Arpit Agarwal
>
> {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. 
> It seems to be caused by the terminate hook used by the test. Commenting out 
> this test case makes other tests in the same class pass.





[jira] [Updated] (HDFS-5964) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail

2014-02-18 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5964:


Issue Type: Bug  (was: Sub-task)
Parent: (was: HDFS-5535)

> TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
> 
>
> Key: HDFS-5964
> URL: https://issues.apache.org/jira/browse/HDFS-5964
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: HDFS-5535 (Rolling upgrades)
>Reporter: Arpit Agarwal
>
> {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. 
> It seems to be caused by the terminate hook used by the test but I did not 
> spend much time on it. Commenting out this test case makes other tests in the 
> same class pass.





[jira] [Resolved] (HDFS-5964) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail

2014-02-18 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal resolved HDFS-5964.
-

Resolution: Duplicate

Dup of HDFS-5963.

> TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
> 
>
> Key: HDFS-5964
> URL: https://issues.apache.org/jira/browse/HDFS-5964
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: HDFS-5535 (Rolling upgrades)
>Reporter: Arpit Agarwal
>
> {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. 
> It seems to be caused by the terminate hook used by the test but I did not 
> spend much time on it. Commenting out this test case makes other tests in the 
> same class pass.





[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas

2014-02-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904844#comment-13904844
 ] 

Hadoop QA commented on HDFS-5318:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12629623/HDFS-5318-trunkb.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup
  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6169//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6169//console

This message is automatically generated.

> Support read-only and read-write paths to shared replicas
> -
>
> Key: HDFS-5318
> URL: https://issues.apache.org/jira/browse/HDFS-5318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Eric Sirianni
> Attachments: HDFS-5318-trunk.patch, HDFS-5318-trunkb.patch, 
> HDFS-5318.patch, HDFS-5318a-branch-2.patch, HDFS-5318b-branch-2.patch, 
> HDFS-5318c-branch-2.patch, hdfs-5318.pdf
>
>
> There are several use cases for using shared-storage for datanode block 
> storage in an HDFS environment (storing cold blocks on a NAS device, Amazon 
> S3, etc.).
> With shared-storage, there is a distinction between:
> # a distinct physical copy of a block
> # an access-path to that block via a datanode.  
> A single 'replication count' metric cannot accurately capture both aspects.  
> However, for most of the current uses of 'replication count' in the Namenode, 
> the "number of physical copies" aspect seems to be the appropriate semantic.
> I propose altering the replication counting algorithm in the Namenode to 
> accurately infer distinct physical copies in a shared storage environment.  
> With HDFS-5115, a {{StorageID}} is a UUID.  I propose associating some minor 
> additional semantics to the {{StorageID}} - namely that multiple datanodes 
> attaching to the same physical shared storage pool should report the same 
> {{StorageID}} for that pool.  A minor modification would be required in the 
> DataNode to enable the generation of {{StorageID}} s to be pluggable behind 
> the {{FsDatasetSpi}} interface.  
> With those semantics in place, the number of physical copies of a block in a 
> shared storage environment can be calculated as the number of _distinct_ 
> {{StorageID}} s associated with that block.
> Consider the following combinations for two {{(DataNode ID, Storage ID)}} 
> pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
> * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* 
> physical replicas (i.e. the traditional HDFS case with local disks)
> ** → Block B has {{ReplicationCount == 2}}
> * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* 
> physical replica (e.g. HDFS datanodes mounting the same NAS share)
> ** → Block B has {{ReplicationCount == 1}}
> For example, if block B has the following location tuples:
> * {{DN_1, STORAGE_A}}
> * {{DN_2, STORAGE_A}}
> * {{DN_3, STORAGE_B}}
> * {{DN_4, STORAGE_B}},
> the effect of this proposed change would be to calculate the replication 
> factor in the namenode as *2* instead of *4*.
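The proposed counting rule can be illustrated with a short sketch (a stand-in, not Namenode code): the physical replication of a block is the number of distinct StorageIDs among its (datanode ID, storage ID) location pairs.

```java
// Sketch of the proposed rule: physical replication == number of
// distinct StorageIDs across a block's location pairs.
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SharedReplicaCount {
  static int physicalReplication(List<String[]> locations) {
    Set<String> storageIds = new HashSet<>();
    for (String[] loc : locations) {
      storageIds.add(loc[1]);    // loc = {datanodeId, storageId}
    }
    return storageIds.size();
  }

  public static void main(String[] args) {
    // The four location tuples from the example in the description:
    List<String[]> locs = Arrays.asList(
        new String[] {"DN_1", "STORAGE_A"},
        new String[] {"DN_2", "STORAGE_A"},
        new String[] {"DN_3", "STORAGE_B"},
        new String[] {"DN_4", "STORAGE_B"});
    System.out.println(physicalReplication(locs)); // prints 2, not 4
  }
}
```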





[jira] [Assigned] (HDFS-5962) Mtime is not persisted for symbolic links

2014-02-18 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA reassigned HDFS-5962:
---

Assignee: Akira AJISAKA

> Mtime is not persisted for symbolic links
> -
>
> Key: HDFS-5962
> URL: https://issues.apache.org/jira/browse/HDFS-5962
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Akira AJISAKA
>Priority: Critical
>
> In {{FSImageSerialization}}, the mtime of symbolic links is hardcoded to be 0 
> when saving to fsimage, even though it is recorded in memory and shown in the 
> listing until the namenode is restarted.





[jira] [Commented] (HDFS-5960) Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands

2014-02-18 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904870#comment-13904870
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5960:
--

Hi Arpit, does testDFSAdminRollingUpgradeCommands fail on your machine?  I just 
tried it and it did not fail.

> Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands
> -
>
> Key: HDFS-5960
> URL: https://issues.apache.org/jira/browse/HDFS-5960
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch
>
>
> {{TestRollingUpgrade}} fails when restarting the NN because 
> {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not 
> expected.
> The fix is to start/finalize rolling upgrade when the corresponding edit log 
> op is seen.





[jira] [Created] (HDFS-5965) caller of NetworkTopology's chooseRandom method to be expect null return value

2014-02-18 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-5965:
---

 Summary: caller of NetworkTopology's chooseRandom method to be 
expect null return value
 Key: HDFS-5965
 URL: https://issues.apache.org/jira/browse/HDFS-5965
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang
Priority: Minor


Class NetworkTopology's method
    public Node chooseRandom(String scope)
calls
    private Node chooseRandom(String scope, String excludedScope)

which may return a null value.

Callers of this method, such as BlockPlacementPolicyDefault, need to be aware 
of that.
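A minimal sketch of the defensive pattern this report asks for. The chooseRandom stand-in below is simplified (the real NetworkTopology walks a rack tree); the point is that a null return is a legitimate outcome the caller must handle.

```java
// Simplified stand-in for NetworkTopology#chooseRandom: returns a random
// node matching the scope, or null when no node qualifies.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class ChooseRandomSketch {
  static String chooseRandom(List<String> nodes, String scope, Random rand) {
    List<String> candidates = new ArrayList<>();
    for (String n : nodes) {
      if (n.startsWith(scope)) {
        candidates.add(n);
      }
    }
    if (candidates.isEmpty()) {
      return null;               // the case callers must expect
    }
    return candidates.get(rand.nextInt(candidates.size()));
  }

  public static void main(String[] args) {
    List<String> cluster = Arrays.asList("/rack1/dn1", "/rack1/dn2");
    // Defensive caller: null-check before using the chosen node.
    String target = chooseRandom(cluster, "/rack2", new Random(0));
    if (target == null) {
      System.out.println("no node in scope, caller must fall back");
    }
  }
}
```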







[jira] [Created] (HDFS-5968) Fix rollback of rolling upgrade in NameNode HA setup

2014-02-18 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-5968:
---

 Summary: Fix rollback of rolling upgrade in NameNode HA setup
 Key: HDFS-5968
 URL: https://issues.apache.org/jira/browse/HDFS-5968
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao


This jira does the following:
1. When doing a rollback for rolling upgrade, we should call 
FSEditLog#initJournalsForWrite when initializing the editLog (just like upgrade 
in an HA setup).
2. After the rollback, we also need to rename the md5 file and change its 
reference file name.
3. Add a new unit test to cover rollback with HA+QJM.
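Step 2 can be sketched as follows. This is a hedged, self-contained stand-in: the `<checksum> *<filename>` sidecar format follows what HDFS's MD5FileUtils writes, and the file names are illustrative, not the actual rollback paths.

```java
// Stand-in for renaming an fsimage's .md5 sidecar after rollback and
// updating the file name referenced inside it.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class Md5RenameSketch {
  static void renameMd5(Path dir, String oldImage, String newImage) throws IOException {
    Path oldMd5 = dir.resolve(oldImage + ".md5");
    Path newMd5 = dir.resolve(newImage + ".md5");
    String line = new String(Files.readAllBytes(oldMd5));
    // the sidecar stores "<checksum> *<filename>"; update the filename part
    Files.write(newMd5, line.replace(oldImage, newImage).getBytes());
    Files.delete(oldMd5);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("md5demo");
    Files.write(dir.resolve("fsimage_rollback.md5"),
        "d41d8cd98f00b204e9800998ecf8427e *fsimage_rollback".getBytes());
    renameMd5(dir, "fsimage_rollback", "fsimage_0000000000000000000");
    System.out.println(Files.exists(dir.resolve("fsimage_0000000000000000000.md5")));
  }
}
```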





[jira] [Created] (HDFS-5967) caller of NetworkTopology's chooseRandom method to be expect null return value

2014-02-18 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-5967:
---

 Summary: caller of NetworkTopology's chooseRandom method to be 
expect null return value
 Key: HDFS-5967
 URL: https://issues.apache.org/jira/browse/HDFS-5967
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang
Priority: Minor


Class NetworkTopology's method
    public Node chooseRandom(String scope)
calls
    private Node chooseRandom(String scope, String excludedScope)

which may return a null value.

Callers of this method, such as BlockPlacementPolicyDefault, need to be aware 
of that.







[jira] [Created] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup

2014-02-18 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-5966:
---

 Summary: Fix rollback of rolling upgrade in NameNode HA setup
 Key: HDFS-5966
 URL: https://issues.apache.org/jira/browse/HDFS-5966
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao


This jira does the following:
1. When doing a rollback for rolling upgrade, we should call 
FSEditLog#initJournalsForWrite when initializing the editLog (just like upgrade 
in an HA setup).
2. After the rollback, we also need to rename the md5 file and change its 
reference file name.
3. Add a new unit test to cover rollback with HA+QJM.





[jira] [Updated] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup

2014-02-18 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5966:


Attachment: HDFS-5966.000.patch

> Fix rollback of rolling upgrade in NameNode HA setup
> 
>
> Key: HDFS-5966
> URL: https://issues.apache.org/jira/browse/HDFS-5966
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ha, hdfs-client, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5966.000.patch
>
>
> This jira does the following:
> 1. When doing a rollback for rolling upgrade, we should call 
> FSEditLog#initJournalsForWrite when initializing the editLog (just like 
> upgrade in an HA setup).
> 2. After the rollback, we also need to rename the md5 file and change its 
> reference file name.
> 3. Add a new unit test to cover rollback with HA+QJM.





[jira] [Created] (HDFS-5969) caller of NetworkTopology's chooseRandom method to be expect null return value

2014-02-18 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-5969:
---

 Summary: caller of NetworkTopology's chooseRandom method to be 
expect null return value
 Key: HDFS-5969
 URL: https://issues.apache.org/jira/browse/HDFS-5969
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Yongjun Zhang
Priority: Minor


Class NetworkTopology's method
    public Node chooseRandom(String scope)
calls
    private Node chooseRandom(String scope, String excludedScope)

which may return a null value.

Callers of this method, such as BlockPlacementPolicyDefault, need to be aware 
of that.







[jira] [Updated] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup

2014-02-18 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5966:


Attachment: (was: HDFS-5966.000.patch)

> Fix rollback of rolling upgrade in NameNode HA setup
> 
>
> Key: HDFS-5966
> URL: https://issues.apache.org/jira/browse/HDFS-5966
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ha, hdfs-client, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>
> This jira does the following:
> 1. When doing a rollback for rolling upgrade, we should call 
> FSEditLog#initJournalsForWrite when initializing the editLog (just like 
> upgrade in an HA setup).
> 2. After the rollback, we also need to rename the md5 file and change its 
> reference file name.
> 3. Add a new unit test to cover rollback with HA+QJM.





[jira] [Commented] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data

2014-02-18 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904891#comment-13904891
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5958:
--

The balancing policy assumes that there are enough blocks to move around.  In 
your case, it may be impossible to satisfy the percentage threshold requirement 
for the large datanode, since it remains underutilized even if it has a replica 
of every block.
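A back-of-the-envelope check with the reported hardware (3 x 500GB nodes plus one 4TB node) illustrates this. The amount of unique data and the replication factor of 3 are assumptions; the 10% threshold is the balancer's default.

```java
// With assumed numbers, even a large node holding one replica of every
// block stays more than the default 10% threshold below the cluster's
// average utilization, so the balancer keeps picking it as a target
// while having no block left to move to it.
public class BalancerGapSketch {
  public static void main(String[] args) {
    double largeCap = 4.0;                     // TB, the one large node
    double totalCap = largeCap + 3 * 0.5;      // 5.5 TB total capacity
    double uniqueData = 0.4;                   // TB of unique blocks (assumed)
    double totalData = 3 * uniqueData;         // 1.2 TB stored at replication 3
    double avgUtil = totalData / totalCap;     // cluster-average utilization
    double largeUtil = uniqueData / largeCap;  // large node with every block
    System.out.println(avgUtil - largeUtil > 0.10); // gap exceeds the threshold
  }
}
```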

> One very large node in a cluster prevents balancer from balancing data
> --
>
> Key: HDFS-5958
> URL: https://issues.apache.org/jira/browse/HDFS-5958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.2.0
> Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one 
> with 4Tb drive.
>Reporter: Alexey Kovyrin
>
> In a cluster with a set of small nodes and one much larger node balancer 
> always selects the large node as the target even though it already has a copy 
> of each block in the cluster.
> This causes the balancer to enter an infinite loop and stop balancing other 
> nodes because each balancing iteration selects the same target and then could 
> not find a single block to move.





[jira] [Updated] (HDFS-5953) TestBlockReaderFactory fails if libhadoop.so has not been built

2014-02-18 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5953:
---

Summary: TestBlockReaderFactory fails if libhadoop.so has not been built  
(was: TestBlockReaderFactory fails in trunk)

> TestBlockReaderFactory fails if libhadoop.so has not been built
> ---
>
> Key: HDFS-5953
> URL: https://issues.apache.org/jira/browse/HDFS-5953
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Akira AJISAKA
> Fix For: 2.4.0
>
> Attachments: HDFS-5953.patch
>
>
> From 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1673/testReport/junit/org.apache.hadoop.hdfs/TestBlockReaderFactory/testFallbackFromShortCircuitToUnixDomainTraffic/
>  :
> {code}
> java.lang.RuntimeException: Although a UNIX domain socket path is configured 
> as 
> /tmp/socks.1392383436573.1418778351/testFallbackFromShortCircuitToUnixDomainTraffic._PORT,
>  we cannot start a localDataXceiverServer because libhadoop cannot be loaded.
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:601)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:573)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:315)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1864)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:359)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
>   at 
> org.apache.hadoop.hdfs.TestBlockReaderFactory.testFallbackFromShortCircuitToUnixDomainTraffic(TestBlockReaderFactory.java:99)
> {code}
> This test failure can be reproduced locally (on Mac).





[jira] [Created] (HDFS-5970) callers of NetworkTopology's chooseRandom method to expect null return value

2014-02-18 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-5970:
---

 Summary: callers of NetworkTopology's chooseRandom method to 
expect null return value
 Key: HDFS-5970
 URL: https://issues.apache.org/jira/browse/HDFS-5970
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, hdfs-client
Affects Versions: 3.0.0
Reporter: Yongjun Zhang


Class NetworkTopology's method
    public Node chooseRandom(String scope)
calls
    private Node chooseRandom(String scope, String excludedScope)

which may return a null value.

Callers of this method, such as BlockPlacementPolicyDefault, need to be aware 
of that.







[jira] [Resolved] (HDFS-5968) Fix rollback of rolling upgrade in NameNode HA setup

2014-02-18 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao resolved HDFS-5968.
-

Resolution: Duplicate

Created the same jira twice because of some network issue.

> Fix rollback of rolling upgrade in NameNode HA setup
> 
>
> Key: HDFS-5968
> URL: https://issues.apache.org/jira/browse/HDFS-5968
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ha, hdfs-client, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>
> This jira does the following:
> 1. When doing a rollback for rolling upgrade, we should call 
> FSEditLog#initJournalsForWrite when initializing the editLog (just like 
> upgrade in an HA setup).
> 2. After the rollback, we also need to rename the md5 file and change its 
> reference file name.
> 3. Add a new unit test to cover rollback with HA+QJM.





[jira] [Created] (HDFS-5971) callers of NetworkTopology's chooseRandom method to expect null return value

2014-02-18 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-5971:
---

 Summary: callers of NetworkTopology's chooseRandom method to 
expect null return value
 Key: HDFS-5971
 URL: https://issues.apache.org/jira/browse/HDFS-5971
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Yongjun Zhang


Class NetworkTopology's method
    public Node chooseRandom(String scope)
calls
    private Node chooseRandom(String scope, String excludedScope)

which may return a null value.

Callers of this method, such as BlockPlacementPolicyDefault, need to be aware 
of that.







[jira] [Commented] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data

2014-02-18 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904905#comment-13904905
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5958:
--

I think we might need a new balancing policy for such special cases.

> One very large node in a cluster prevents balancer from balancing data
> --
>
> Key: HDFS-5958
> URL: https://issues.apache.org/jira/browse/HDFS-5958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.2.0
> Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one 
> with 4Tb drive.
>Reporter: Alexey Kovyrin
>
> In a cluster with a set of small nodes and one much larger node balancer 
> always selects the large node as the target even though it already has a copy 
> of each block in the cluster.
> This causes the balancer to enter an infinite loop and stop balancing other 
> nodes because each balancing iteration selects the same target and then could 
> not find a single block to move.





[jira] [Updated] (HDFS-5966) Fix rollback of rolling upgrade in NameNode HA setup

2014-02-18 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5966:


Attachment: HDFS-5966.000.patch

> Fix rollback of rolling upgrade in NameNode HA setup
> 
>
> Key: HDFS-5966
> URL: https://issues.apache.org/jira/browse/HDFS-5966
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ha, hdfs-client, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5966.000.patch
>
>
> This jira does the following:
> 1. When doing a rollback for rolling upgrade, we should call 
> FSEditLog#initJournalsForWrite when initializing the editLog (just like 
> upgrade in an HA setup).
> 2. After the rollback, we also need to rename the md5 file and change its 
> reference file name.
> 3. Add a new unit test to cover rollback with HA+QJM.





[jira] [Commented] (HDFS-5960) Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands

2014-02-18 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904901#comment-13904901
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5960:
--

> Unfortunately the branch is being actively changed while broken so that ... 

Sorry about that.  I indeed plan to fix the tests after the feature 
implementation is complete.  (This is also a reason that we created the feature 
branch.  BTW, the feature on the NN side is complete now and I am also fixing 
the tests.)  It is hard (and unnecessary) to keep the tests passing while the 
feature is incomplete.

> Fix TestRollingUpgrade#testDFSAdminRollingUpgradeCommands
> -
>
> Key: HDFS-5960
> URL: https://issues.apache.org/jira/browse/HDFS-5960
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch
>
>
> {{TestRollingUpgrade}} fails when restarting the NN because 
> {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not 
> expected.
> The fix is to start/finalize rolling upgrade when the corresponding edit log 
> op is seen.





[jira] [Commented] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data

2014-02-18 Thread Alexey Kovyrin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904904#comment-13904904
 ] 

Alexey Kovyrin commented on HDFS-5958:
--

I understand perfectly well why it is happening. I've reported the issue to 
make sure it gets fixed, and so that other users won't need to spend hours 
pulling their hair out trying to figure out why their balancer processes hang 
forever, promising to move data around and never doing it.

> One very large node in a cluster prevents balancer from balancing data
> --
>
> Key: HDFS-5958
> URL: https://issues.apache.org/jira/browse/HDFS-5958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.2.0
> Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one 
> with 4Tb drive.
>Reporter: Alexey Kovyrin
>
> In a cluster with a set of small nodes and one much larger node balancer 
> always selects the large node as the target even though it already has a copy 
> of each block in the cluster.
> This causes the balancer to enter an infinite loop and stop balancing other 
> nodes because each balancing iteration selects the same target and then could 
> not find a single block to move.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5972) callers of NetworkTopology's chooseRandom method to expect null return value

2014-02-18 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-5972:
---

 Summary: callers of NetworkTopology's chooseRandom method to 
expect null return value
 Key: HDFS-5972
 URL: https://issues.apache.org/jira/browse/HDFS-5972
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.3.0
Reporter: Yongjun Zhang


Class NetworkTopology's method
   public Node chooseRandom(String scope) 
calls 
   private Node chooseRandom(String scope, String excludedScope)

which may return a null value.

Callers of this method, such as BlockPlacementPolicyDefault, need to be aware 
of that.
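A minimal caller-side sketch of the guard the issue is asking for (a simplified stand-in class, not the actual Hadoop NetworkTopology): check the return value for null before using it.

```java
import java.util.List;
import java.util.Random;

// Simplified stand-in for NetworkTopology.chooseRandom: returns null
// when no node matches the scope, like the private overload described.
class Topology {
    private final Random rand = new Random();

    String chooseRandom(List<String> nodesInScope) {
        if (nodesInScope.isEmpty()) {
            return null;  // the null return callers must anticipate
        }
        return nodesInScope.get(rand.nextInt(nodesInScope.size()));
    }
}

public class ChooseRandomCaller {
    // Caller-side pattern: guard against null before using the chosen
    // node, as BlockPlacementPolicyDefault-style callers would need to.
    static String chooseOrFallback(Topology topo, List<String> scope,
                                   String fallback) {
        String node = topo.chooseRandom(scope);
        return (node != null) ? node : fallback;
    }

    public static void main(String[] args) {
        Topology topo = new Topology();
        System.out.println(chooseOrFallback(topo, List.of(), "NO_NODE"));
    }
}
```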





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5960) Fix TestRollingUpgrade

2014-02-18 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5960:


Summary: Fix TestRollingUpgrade  (was: Fix 
TestRollingUpgrade#testDFSAdminRollingUpgradeCommands)

> Fix TestRollingUpgrade
> --
>
> Key: HDFS-5960
> URL: https://issues.apache.org/jira/browse/HDFS-5960
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch
>
>
> {{TestRollingUpgrade}} fails when restarting the NN because 
> {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not 
> expected.
> The fix is to start/finalize rolling upgrade when the corresponding edit log 
> op is seen.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5961) OIV cannot load fsimages containing a symbolic link

2014-02-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904907#comment-13904907
 ] 

Hadoop QA commented on HDFS-5961:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12629658/HDFS-5961.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6170//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6170//console

This message is automatically generated.

> OIV cannot load fsimages containing a symbolic link
> ---
>
> Key: HDFS-5961
> URL: https://issues.apache.org/jira/browse/HDFS-5961
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5961.patch
>
>
> In {{ImageLoaderCurrent#processINode}}, the permission is not read for 
> symlink INodes. So after incorrectly reading in the first symbolic link , the 
> next INode can't be read.
> HDFS-4850 broke this while fixing other issues.
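The failure mode above can be illustrated generically (a hypothetical record format, not the real ImageLoaderCurrent code): once a reader skips one field for one record type, every byte that follows is misinterpreted.

```java
import java.io.*;

// Illustration of stream misalignment: each record is a type byte
// followed by a 2-byte permission field.
public class RecordMisalignment {
    static byte[] twoRecords() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeByte(1); out.writeShort(0755);  // record 1: a "symlink"
        out.writeByte(0); out.writeShort(0644);  // record 2: a regular file
        return bos.toByteArray();
    }

    // Buggy reader: forgets to consume the permission field for symlinks,
    // so the next "type" byte it reads is really half of that field.
    static int buggySecondType() throws IOException {
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(twoRecords()));
        int type = in.readByte();
        if (type != 1) in.readShort();  // skips the field only for non-symlinks
        return in.readByte();           // misaligned: not record 2's type byte
    }

    public static void main(String[] args) throws IOException {
        // Record 2's real type is 0, but the buggy reader sees garbage.
        System.out.println("second type read as: " + buggySecondType());
    }
}
```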



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data

2014-02-18 Thread Alexey Kovyrin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904908#comment-13904908
 ] 

Alexey Kovyrin commented on HDFS-5958:
--

Why not fix the default ones? The current behavior is clearly a bug: the balancer 
lies to the user's face by promising to move data around, only to *silently* fail 
to do it and then make another promise it cannot keep.

> One very large node in a cluster prevents balancer from balancing data
> --
>
> Key: HDFS-5958
> URL: https://issues.apache.org/jira/browse/HDFS-5958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.2.0
> Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one 
> with 4Tb drive.
>Reporter: Alexey Kovyrin
>
> In a cluster with a set of small nodes and one much larger node balancer 
> always selects the large node as the target even though it already has a copy 
> of each block in the cluster.
> This causes the balancer to enter an infinite loop and stop balancing other 
> nodes because each balancing iteration selects the same target and then could 
> not find a single block to move.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5960) Fix TestRollingUpgrade

2014-02-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904909#comment-13904909
 ] 

Arpit Agarwal commented on HDFS-5960:
-

Thanks Nicholas, you are right that {{testDFSAdminRollingUpgradeCommands}} no 
longer fails; I've updated the title.

> Fix TestRollingUpgrade
> --
>
> Key: HDFS-5960
> URL: https://issues.apache.org/jira/browse/HDFS-5960
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch
>
>
> {{TestRollingUpgrade}} fails when restarting the NN because 
> {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not 
> expected.
> The fix is to start/finalize rolling upgrade when the corresponding edit log 
> op is seen.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-3570) Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used space

2014-02-18 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904914#comment-13904914
 ] 

Akira AJISAKA commented on HDFS-3570:
-

Thank you for verifying, [~ash211]!
[~qwertymaniac], would you please review the patch?

> Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used 
> space
> 
>
> Key: HDFS-3570
> URL: https://issues.apache.org/jira/browse/HDFS-3570
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Assignee: Akira AJISAKA
>Priority: Minor
> Attachments: HDFS-3570.2.patch, HDFS-3570.aash.1.patch
>
>
> Report from a user here: 
> https://groups.google.com/a/cloudera.org/d/msg/cdh-user/pIhNyDVxdVY/b7ENZmEvBjIJ,
>  post archived at http://pastebin.com/eVFkk0A0
> This user had a specific DN that had a large non-DFS usage among 
> dfs.data.dirs, and very little DFS usage (which is computed against total 
> possible capacity). 
> Balancer apparently only looks at the usage, and ignores to consider that 
> non-DFS usage may also be high on a DN/cluster. Hence, it thinks that if a 
> DFS Usage report from DN is 8% only, its got a lot of free space to write 
> more blocks, when that isn't true as shown by the case of this user. It went 
> on scheduling writes to the DN to balance it out, but the DN simply can't 
> accept any more blocks as a result of its disks' state.
> I think it would be better if we _computed_ the actual utilization based on 
> {{(100-(actual remaining space))/(capacity)}}, as opposed to the current 
> {{(dfs used)/(capacity)}}. Thoughts?
> This isn't very critical, however, cause it is very rare to see DN space 
> being used for non DN data, but it does expose a valid bug.
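The two utilization formulas above can be compared numerically (my interpretation of the proposed formula is (capacity − remaining)/capacity; all figures below are hypothetical, in GB):

```java
// Contrast the balancer's current "DFS used / capacity" ratio with a
// utilization that accounts for non-DFS usage on the same disks.
public class BalancerUtilization {
    static double dfsUsedRatio(double dfsUsed, double capacity) {
        return dfsUsed / capacity;                 // current balancer view
    }

    static double actualUtilization(double remaining, double capacity) {
        return (capacity - remaining) / capacity;  // counts non-DFS usage too
    }

    public static void main(String[] args) {
        double capacity = 1000, dfsUsed = 80, nonDfsUsed = 850;
        double remaining = capacity - dfsUsed - nonDfsUsed;  // 70 GB free
        // The balancer sees only 8% used and keeps scheduling writes...
        System.out.printf("dfs-used ratio: %.2f%n",
                          dfsUsedRatio(dfsUsed, capacity));
        // ...but the disks are actually 93% full.
        System.out.printf("actual utilization: %.2f%n",
                          actualUtilization(remaining, capacity));
    }
}
```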



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5965) caller of NetworkTopology's chooseRandom method to be expect null return value

2014-02-18 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved HDFS-5965.
-

Resolution: Duplicate

> caller of NetworkTopology's chooseRandom method to be expect null return value
> --
>
> Key: HDFS-5965
> URL: https://issues.apache.org/jira/browse/HDFS-5965
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Priority: Minor
>
> Class NetworkTopology's method
>public Node chooseRandom(String scope) 
> calls 
>private Node chooseRandom(String scope, String excludedScope)
> which may return null value.
> Caller of this method such as BlockPlacementPolicyDefault etc need to be 
> aware that.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5953) TestBlockReaderFactory fails if libhadoop.so has not been built

2014-02-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904920#comment-13904920
 ] 

Colin Patrick McCabe commented on HDFS-5953:


Thanks, guys.  I added {{-Drequire.test.libhadoop}} to the nightly build to 
ensure we catch failures to build libhadoop.so.

> TestBlockReaderFactory fails if libhadoop.so has not been built
> ---
>
> Key: HDFS-5953
> URL: https://issues.apache.org/jira/browse/HDFS-5953
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Akira AJISAKA
> Fix For: 2.4.0
>
> Attachments: HDFS-5953.patch
>
>
> From 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1673/testReport/junit/org.apache.hadoop.hdfs/TestBlockReaderFactory/testFallbackFromShortCircuitToUnixDomainTraffic/
>  :
> {code}
> java.lang.RuntimeException: Although a UNIX domain socket path is configured 
> as 
> /tmp/socks.1392383436573.1418778351/testFallbackFromShortCircuitToUnixDomainTraffic._PORT,
>  we cannot start a localDataXceiverServer because libhadoop cannot be loaded.
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:601)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:573)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:315)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1864)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:359)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
>   at 
> org.apache.hadoop.hdfs.TestBlockReaderFactory.testFallbackFromShortCircuitToUnixDomainTraffic(TestBlockReaderFactory.java:99)
> {code}
> This test failure can be reproduced locally (on Mac).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5961) OIV cannot load fsimages containing a symbolic link

2014-02-18 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904919#comment-13904919
 ] 

Jing Zhao commented on HDFS-5961:
-

+1 the patch looks good to me. Thanks for the fix [~kihwal]!

> OIV cannot load fsimages containing a symbolic link
> ---
>
> Key: HDFS-5961
> URL: https://issues.apache.org/jira/browse/HDFS-5961
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5961.patch
>
>
> In {{ImageLoaderCurrent#processINode}}, the permission is not read for 
> symlink INodes. So after incorrectly reading in the first symbolic link , the 
> next INode can't be read.
> HDFS-4850 broke this while fixing other issues.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)



[jira] [Resolved] (HDFS-5960) Fix TestRollingUpgrade

2014-02-18 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal resolved HDFS-5960.
-

Resolution: Not A Problem

Cannot reproduce this failure anymore; filed HDFS-5963 for a separate bug.

> Fix TestRollingUpgrade
> --
>
> Key: HDFS-5960
> URL: https://issues.apache.org/jira/browse/HDFS-5960
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5960.01.patch, HDFS-5960.02.patch
>
>
> {{TestRollingUpgrade}} fails when restarting the NN because 
> {{OP_ROLLING_UPGRADE_START}} and {{OP_ROLLING_UPGRADE_FINALIZE}} are not 
> expected.
> The fix is to start/finalize rolling upgrade when the corresponding edit log 
> op is seen.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5957) Provide support for different mmap cache retention policies in ShortCircuitCache.

2014-02-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904932#comment-13904932
 ] 

Colin Patrick McCabe commented on HDFS-5957:


bq. This usage pattern in combination with zero-copy read causes retention of a 
large number of memory-mapped regions in the ShortCircuitCache. Eventually, 
YARN's resource check kills the container process for exceeding the enforced 
physical memory bounds.

mmap regions don't consume physical memory.  They do consume virtual memory.

I don't think limiting virtual memory usage is a particularly helpful policy, 
and YARN should stop doing that if that is in fact what it is doing.

bq. As a workaround, I advised Gopal to downtune 
dfs.client.mmap.cache.timeout.ms to make the munmap happen more quickly. A 
better solution would be to provide support in the HDFS client for a caching 
policy that fits this usage pattern.

In our tests, mmap provided no performance advantage unless it was reused.  If 
Gopal needs to purge mmaps immediately after using them, the correct thing is 
simply not to use zero-copy reads.
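The two retention policies under discussion can be sketched with a toy cache (not the real ShortCircuitCache; names and the millisecond clock are illustrative): timeout-based background cleanup lets mappings linger until a sweep runs, while deterministic release frees them as soon as the client is done.

```java
import java.util.HashMap;
import java.util.Map;

// Toy mmap-cache sketch contrasting timeout-based eviction with
// deterministic, caller-driven release.
public class MmapCacheSketch {
    private final Map<String, Long> lastUse = new HashMap<>();
    private final long timeoutMs;

    MmapCacheSketch(long timeoutMs) { this.timeoutMs = timeoutMs; }

    void use(String blockId, long nowMs) { lastUse.put(blockId, nowMs); }

    // Timeout policy: mappings linger until the expiration sweep runs;
    // returns the number of simulated "munmap" calls.
    int sweep(long nowMs) {
        int before = lastUse.size();
        lastUse.values().removeIf(t -> nowMs - t > timeoutMs);
        return before - lastUse.size();
    }

    // Deterministic policy: the caller releases the mapping immediately.
    void release(String blockId) { lastUse.remove(blockId); }

    int size() { return lastUse.size(); }

    public static void main(String[] args) {
        MmapCacheSketch cache = new MmapCacheSketch(1000);
        cache.use("blk_1", 0);
        cache.use("blk_2", 0);
        cache.release("blk_1");                 // freed deterministically
        System.out.println(cache.sweep(2000));  // sweep evicts blk_2
    }
}
```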

> Provide support for different mmap cache retention policies in 
> ShortCircuitCache.
> -
>
> Key: HDFS-5957
> URL: https://issues.apache.org/jira/browse/HDFS-5957
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Chris Nauroth
>
> Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by 
> multiple reads of the same block or by multiple threads.  The eventual 
> {{munmap}} executes on a background thread after an expiration period.  Some 
> client usage patterns would prefer strict bounds on this cache and 
> deterministic cleanup by calling {{munmap}}.  This issue proposes additional 
> support for different caching policies that better fit these usage patterns.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5953) TestBlockReaderFactory fails if libhadoop.so has not been built

2014-02-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904937#comment-13904937
 ] 

Hudson commented on HDFS-5953:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5186 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5186/])
Update change description for HDFS-5953 (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1569579)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> TestBlockReaderFactory fails if libhadoop.so has not been built
> ---
>
> Key: HDFS-5953
> URL: https://issues.apache.org/jira/browse/HDFS-5953
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Akira AJISAKA
> Fix For: 2.4.0
>
> Attachments: HDFS-5953.patch
>
>
> From 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1673/testReport/junit/org.apache.hadoop.hdfs/TestBlockReaderFactory/testFallbackFromShortCircuitToUnixDomainTraffic/
>  :
> {code}
> java.lang.RuntimeException: Although a UNIX domain socket path is configured 
> as 
> /tmp/socks.1392383436573.1418778351/testFallbackFromShortCircuitToUnixDomainTraffic._PORT,
>  we cannot start a localDataXceiverServer because libhadoop cannot be loaded.
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:601)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:573)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:315)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1864)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:359)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
>   at 
> org.apache.hadoop.hdfs.TestBlockReaderFactory.testFallbackFromShortCircuitToUnixDomainTraffic(TestBlockReaderFactory.java:99)
> {code}
> This test failure can be reproduced locally (on Mac).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5965) caller of NetworkTopology's chooseRandom method to be expect null return value

2014-02-18 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904943#comment-13904943
 ] 

Yongjun Zhang commented on HDFS-5965:
-

Accidentally created multiple JIRAs for the same issue, due to the JIRA GUI 
responding incorrectly today.

> caller of NetworkTopology's chooseRandom method to be expect null return value
> --
>
> Key: HDFS-5965
> URL: https://issues.apache.org/jira/browse/HDFS-5965
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Priority: Minor
>
> Class NetworkTopology's method
>public Node chooseRandom(String scope) 
> calls 
>private Node chooseRandom(String scope, String excludedScope)
> which may return null value.
> Caller of this method such as BlockPlacementPolicyDefault etc need to be 
> aware that.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5967) caller of NetworkTopology's chooseRandom method to be expect null return value

2014-02-18 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904948#comment-13904948
 ] 

Yongjun Zhang commented on HDFS-5967:
-

Accidentally created multiple JIRAs for the same issue, due to the JIRA GUI 
responding unexpectedly today.

> caller of NetworkTopology's chooseRandom method to be expect null return value
> --
>
> Key: HDFS-5967
> URL: https://issues.apache.org/jira/browse/HDFS-5967
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Priority: Minor
>
> Class NetworkTopology's method
>public Node chooseRandom(String scope) 
> calls 
>private Node chooseRandom(String scope, String excludedScope)
> which may return null value.
> Caller of this method such as BlockPlacementPolicyDefault etc need to be 
> aware that.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-5963) TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail

2014-02-18 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE reassigned HDFS-5963:


Assignee: Tsz Wo (Nicholas), SZE

> TestRollingUpgrade#testSecondaryNameNode causes subsequent tests to fail
> 
>
> Key: HDFS-5963
> URL: https://issues.apache.org/jira/browse/HDFS-5963
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: HDFS-5535 (Rolling upgrades)
>Reporter: Arpit Agarwal
>Assignee: Tsz Wo (Nicholas), SZE
>
> {{TestRollingUpgrade#testSecondaryNameNode}} causes subsequent tests to fail. 
> It seems to be caused by the terminate hook used by the test. Commenting out 
> this test case makes other tests in the same class pass.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)



