[jira] [Commented] (HDFS-7240) Object store in HDFS

2014-10-16 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173423#comment-14173423
 ] 

Edward Bortnikov commented on HDFS-7240:


Very interested to follow. How is this related to the previous jira and design 
on Block-Management-as-a-Service (HDFS-5477)? 

 Object store in HDFS
 

 Key: HDFS-7240
 URL: https://issues.apache.org/jira/browse/HDFS-7240
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey

 This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
 generic storage layer. Using the Block Pool abstraction, new kinds of 
 namespaces can be built on top of the storage layer, i.e., the datanodes.
 In this jira I will explore building an object store using the datanode 
 storage, but independent of namespace metadata.
 I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7252) small refinement to the use of isInAnEZ in FSNamesystem

2014-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173530#comment-14173530
 ] 

Hadoop QA commented on HDFS-7252:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675225/HDFS-7252.002.patch
  against trunk revision 2894433.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
  org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8439//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8439//console

This message is automatically generated.

 small refinement to the use of isInAnEZ in FSNamesystem
 ---

 Key: HDFS-7252
 URL: https://issues.apache.org/jira/browse/HDFS-7252
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Trivial
 Attachments: HDFS-7252.001.patch, HDFS-7252.002.patch


 In {{FSN#startFileInt}}, _EncryptionZoneManager#getEncryptionZoneForPath_ is 
 invoked 3 times (_dir.isInAnEZ(iip)_, _dir.getEZForPath(iip)_, 
 _dir.getKeyName(iip)_) in the following code; actually we just need one call.
 {code}
 if (dir.isInAnEZ(iip)) {
   EncryptionZone zone = dir.getEZForPath(iip);
   protocolVersion = chooseProtocolVersion(zone, supportedVersions);
   suite = zone.getSuite();
   ezKeyName = dir.getKeyName(iip);
   Preconditions.checkNotNull(protocolVersion);
   Preconditions.checkNotNull(suite);
   Preconditions.checkArgument(!suite.equals(CipherSuite.UNKNOWN),
       "Chose an UNKNOWN CipherSuite!");
   Preconditions.checkNotNull(ezKeyName);
 }
 {code}
 Also it is invoked twice in the following code, but we just need one call:
 {code}
 if (dir.isInAnEZ(iip)) {
   // The path is now within an EZ, but we're missing encryption parameters
   if (suite == null || edek == null) {
 throw new RetryStartFileException();
   }
   // Path is within an EZ and we have provided encryption parameters.
   // Make sure that the generated EDEK matches the settings of the EZ.
   String ezKeyName = dir.getKeyName(iip);
   if (!ezKeyName.equals(edek.getEncryptionKeyName())) {
 throw new RetryStartFileException();
   }
   feInfo = new FileEncryptionInfo(suite, version,
   edek.getEncryptedKeyVersion().getMaterial(),
   edek.getEncryptedKeyIv(),
   ezKeyName, edek.getEncryptionKeyVersionName());
   Preconditions.checkNotNull(feInfo);
 }
 {code}
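
 For the first snippet above, one possible shape of the refinement is sketched 
 below (illustrative only, not the attached patch; it assumes _getEZForPath_ 
 returns null for a path outside any EZ and that the returned EncryptionZone 
 exposes the key name):
 {code}
 // Sketch: resolve the zone once and derive everything else from it
 // (null-return assumption; not necessarily what the patch does).
 EncryptionZone zone = dir.getEZForPath(iip);
 if (zone != null) {
   protocolVersion = chooseProtocolVersion(zone, supportedVersions);
   suite = zone.getSuite();
   ezKeyName = zone.getKeyName();
   Preconditions.checkNotNull(protocolVersion);
   Preconditions.checkNotNull(suite);
   Preconditions.checkArgument(!suite.equals(CipherSuite.UNKNOWN),
       "Chose an UNKNOWN CipherSuite!");
   Preconditions.checkNotNull(ezKeyName);
 }
 {code}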



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7208) NN doesn't schedule replication when a DN storage fails

2014-10-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173639#comment-14173639
 ] 

Hudson commented on HDFS-7208:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #713 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/713/])
HDFS-7208. NN doesn't schedule replication when a DN storage fails.  
Contributed by Ming Ma (szetszwo: rev 41980c56d3c01d7a0ddc7deea2d89b7f28026722)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java


 NN doesn't schedule replication when a DN storage fails
 ---

 Key: HDFS-7208
 URL: https://issues.apache.org/jira/browse/HDFS-7208
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 2.6.0

 Attachments: HDFS-7208-2.patch, HDFS-7208-3.patch, HDFS-7208.patch


 We found the following problem. When a storage device on a DN fails, the NN 
 continues to believe the replicas on that storage are valid and 
 doesn't schedule replication.
 A DN has 12 storage disks, so there is one blockReport for each storage. When 
 a disk fails, the # of blockReports from that DN is reduced from 12 to 11. Given 
 that dfs.datanode.failed.volumes.tolerated is configured to be > 0, the NN still 
 considers that DN healthy.
 1. A disk failed. All blocks of that disk are removed from DN dataset.
  
 {noformat}
 2014-10-04 02:11:12,626 WARN 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing 
 replica BP-1748500278-xx.xx.xx.xxx-1377803467793:1121568886 on failed volume 
 /data/disk6/dfs/current
 {noformat}
 2. The NN receives DatanodeProtocol.DISK_ERROR, but that isn't enough to have the 
 NN remove the DN and its replicas from the BlocksMap. In addition, the blockReport 
 doesn't provide the diff, given that it is done per storage.
 {noformat}
 2014-10-04 02:11:12,681 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: 
 Disk error on DatanodeRegistration(xx.xx.xx.xxx, 
 datanodeUuid=f3b8a30b-e715-40d6-8348-3c766f9ba9ab, infoPort=50075, 
 ipcPort=50020, 
 storageInfo=lv=-55;cid=CID-e3c38355-fde5-4e3a-b7ce-edacebdfa7a1;nsid=420527250;c=1410283484939):
  DataNode failed volumes:/data/disk6/dfs/current
 {noformat}
 3. Run fsck on the file and confirm the NN's BlocksMap still has that replica.
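
 For context, the scenario above can be set up in a test by raising the number of 
 tolerated volume failures so the DN keeps running after losing a disk (a sketch; 
 the config key is from DFSConfigKeys, the value and cluster size are illustrative):
 {code}
 // Sketch: keep the DN registered after one volume failure so the NN-side
 // replication behavior can be observed.
 Configuration conf = new HdfsConfiguration();
 conf.setInt(DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_KEY, 1);
 MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
 {code}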



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7185) The active NameNode will not accept an fsimage sent from the standby during rolling upgrade

2014-10-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173641#comment-14173641
 ] 

Hudson commented on HDFS-7185:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #713 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/713/])
HDFS-7185. The active NameNode will not accept an fsimage sent from the standby 
during rolling upgrade. Contributed by Jing Zhao. (jing9: rev 
18620649f96d9e378fb7ea40de216284a9d525c7)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageWithSnapshot.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


 The active NameNode will not accept an fsimage sent from the standby during 
 rolling upgrade
 ---

 Key: HDFS-7185
 URL: https://issues.apache.org/jira/browse/HDFS-7185
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Jing Zhao
 Fix For: 2.6.0

 Attachments: HDFS-7185.000.patch, HDFS-7185.001.patch, 
 HDFS-7185.002.patch, HDFS-7185.003.patch, HDFS-7185.004.patch


 The active NameNode will not accept an fsimage sent from the standby during 
 rolling upgrade.  The active fails with the exception:
 {code}
 18:25:07,620  WARN ImageServlet:198 - Received an invalid request file 
 transfer request from a secondary with storage info 
 -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
 18:25:07,620  WARN log:76 - Committed before 410 PutImage failed. 
 java.io.IOException: This namenode has storage info 
 -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary 
 expected -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-
 0a6e431987f6
 at 
 org.apache.hadoop.hdfs.server.namenode.ImageServlet.validateRequest(ImageServlet.java:200)
 at 
 org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:443)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:730)
 {code}
 On the standby, the exception is:
 {code}
 java.io.IOException: Exception during image upload: 
 org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
  This namenode has storage info 
 -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary 
 expected
  -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
 at 
 org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:218)
 at 
 org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:62)
 {code}
 This seems to be a consequence of the fact that the VERSION file still is at 
 -55 (the old version) even after the rolling upgrade has started.  When the 
 rolling upgrade is finalized with {{hdfs dfsadmin -rollingUpgrade finalize}}, 
 both VERSION files get set to the new version, and the problem goes away.
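
 Illustratively, the rejected comparison is over the full storage info string, whose 
 first field is the layout version; one conceivable relaxation (a sketch with assumed 
 variable names, not the committed fix) would loosen that check while a rolling 
 upgrade is in progress:
 {code}
 // Sketch (assumed names, not the actual HDFS-7185 patch): tolerate a
 // layout-version mismatch in the uploaded image's storage info during a
 // rolling upgrade, while still requiring the cluster IDs to match.
 boolean rollingUpgrade = namesystem.isRollingUpgrade();
 if (!myStorageInfo.equals(theirStorageInfo)
     && !(rollingUpgrade && myClusterId.equals(theirClusterId))) {
   throw new IOException("This namenode has storage info " + myStorageInfo
       + " but the secondary expected " + theirStorageInfo);
 }
 {code}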



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5089) When a LayoutVersion support SNAPSHOT, it must support FSIMAGE_NAME_OPTIMIZATION.

2014-10-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173642#comment-14173642
 ] 

Hudson commented on HDFS-5089:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #713 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/713/])
HDFS-5089. When a LayoutVersion support SNAPSHOT, it must support 
FSIMAGE_NAME_OPTIMIZATION. (szetszwo: rev 
289442a242259af53dc73a156aa523e3e6c7)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestLayoutVersion.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java


 When a LayoutVersion support SNAPSHOT, it must support 
 FSIMAGE_NAME_OPTIMIZATION.
 -

 Key: HDFS-5089
 URL: https://issues.apache.org/jira/browse/HDFS-5089
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Fix For: 2.6.0

 Attachments: h5089_20130813.patch, h5089_20140325.patch


 The SNAPSHOT layout requires FSIMAGE_NAME_OPTIMIZATION as a prerequisite.  
 However, RESERVED_REL1_3_0 supports SNAPSHOT but not 
 FSIMAGE_NAME_OPTIMIZATION.
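
 Expressed as a test-style check, the invariant looks like the sketch below (the 
 accessor used to obtain each feature's layout version is an assumption, not 
 necessarily the committed test):
 {code}
 // Sketch: every layout version that supports SNAPSHOT must also support
 // FSIMAGE_NAME_OPTIMIZATION.
 for (LayoutVersion.Feature f : LayoutVersion.Feature.values()) {
   int lv = f.getInfo().getLayoutVersion();
   if (LayoutVersion.supports(LayoutVersion.Feature.SNAPSHOT, lv)) {
     assertTrue("SNAPSHOT requires FSIMAGE_NAME_OPTIMIZATION at lv=" + lv,
         LayoutVersion.supports(LayoutVersion.Feature.FSIMAGE_NAME_OPTIMIZATION, lv));
   }
 }
 {code}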



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7243) HDFS concat operation should not be allowed in Encryption Zone

2014-10-16 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7243:
---
Attachment: HDFS-7243.002.patch

Thanks [~hitliuyi]. You're right. I've posted the updated patch.


 HDFS concat operation should not be allowed in Encryption Zone
 --

 Key: HDFS-7243
 URL: https://issues.apache.org/jira/browse/HDFS-7243
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, namenode
Affects Versions: 2.6.0
Reporter: Yi Liu
Assignee: Charles Lamb
 Attachments: HDFS-7243.001.patch, HDFS-7243.002.patch, 
 HDFS-7243.002.patch


 For HDFS encryption at rest, files in an encryption zone are using different 
 data encryption keys, so concat should be disallowed.
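
 A minimal sketch of the guard being proposed (illustrative; the variable name and 
 exception type are assumptions, not necessarily what the attached patches do):
 {code}
 // Sketch: reject concat when the target path lives in an encryption zone.
 if (dir.isInAnEZ(trgIip)) {
   throw new HadoopIllegalArgumentException(
       "concat can not be called for files in an encryption zone.");
 }
 {code}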



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7243) HDFS concat operation should not be allowed in Encryption Zone

2014-10-16 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7243:
---
Attachment: (was: HDFS-7243.002.patch)

 HDFS concat operation should not be allowed in Encryption Zone
 --

 Key: HDFS-7243
 URL: https://issues.apache.org/jira/browse/HDFS-7243
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, namenode
Affects Versions: 2.6.0
Reporter: Yi Liu
Assignee: Charles Lamb
 Attachments: HDFS-7243.001.patch, HDFS-7243.002.patch


 For HDFS encryption at rest, files in an encryption zone are using different 
 data encryption keys, so concat should be disallowed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7243) HDFS concat operation should not be allowed in Encryption Zone

2014-10-16 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7243:
---
Attachment: HDFS-7243.003.patch

 HDFS concat operation should not be allowed in Encryption Zone
 --

 Key: HDFS-7243
 URL: https://issues.apache.org/jira/browse/HDFS-7243
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, namenode
Affects Versions: 2.6.0
Reporter: Yi Liu
Assignee: Charles Lamb
 Attachments: HDFS-7243.001.patch, HDFS-7243.002.patch, 
 HDFS-7243.003.patch


 For HDFS encryption at rest, files in an encryption zone are using different 
 data encryption keys, so concat should be disallowed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7243) HDFS concat operation should not be allowed in Encryption Zone

2014-10-16 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173690#comment-14173690
 ] 

Yi Liu commented on HDFS-7243:
--

Thanks, Charles, for updating the patch; it looks good to me.

 HDFS concat operation should not be allowed in Encryption Zone
 --

 Key: HDFS-7243
 URL: https://issues.apache.org/jira/browse/HDFS-7243
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, namenode
Affects Versions: 2.6.0
Reporter: Yi Liu
Assignee: Charles Lamb
 Attachments: HDFS-7243.001.patch, HDFS-7243.002.patch, 
 HDFS-7243.003.patch


 For HDFS encryption at rest, files in an encryption zone are using different 
 data encryption keys, so concat should be disallowed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7243) HDFS concat operation should not be allowed in Encryption Zone

2014-10-16 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173695#comment-14173695
 ] 

Charles Lamb commented on HDFS-7243:


Thanks for reviewing, Yi.



 HDFS concat operation should not be allowed in Encryption Zone
 --

 Key: HDFS-7243
 URL: https://issues.apache.org/jira/browse/HDFS-7243
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, namenode
Affects Versions: 2.6.0
Reporter: Yi Liu
Assignee: Charles Lamb
 Attachments: HDFS-7243.001.patch, HDFS-7243.002.patch, 
 HDFS-7243.003.patch


 For HDFS encryption at rest, files in an encryption zone are using different 
 data encryption keys, so concat should be disallowed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7185) The active NameNode will not accept an fsimage sent from the standby during rolling upgrade

2014-10-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173761#comment-14173761
 ] 

Hudson commented on HDFS-7185:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1903 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1903/])
HDFS-7185. The active NameNode will not accept an fsimage sent from the standby 
during rolling upgrade. Contributed by Jing Zhao. (jing9: rev 
18620649f96d9e378fb7ea40de216284a9d525c7)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageWithSnapshot.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java


 The active NameNode will not accept an fsimage sent from the standby during 
 rolling upgrade
 ---

 Key: HDFS-7185
 URL: https://issues.apache.org/jira/browse/HDFS-7185
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Jing Zhao
 Fix For: 2.6.0

 Attachments: HDFS-7185.000.patch, HDFS-7185.001.patch, 
 HDFS-7185.002.patch, HDFS-7185.003.patch, HDFS-7185.004.patch


 The active NameNode will not accept an fsimage sent from the standby during 
 rolling upgrade.  The active fails with the exception:
 {code}
 18:25:07,620  WARN ImageServlet:198 - Received an invalid request file 
 transfer request from a secondary with storage info 
 -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
 18:25:07,620  WARN log:76 - Committed before 410 PutImage failed. 
 java.io.IOException: This namenode has storage info 
 -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary 
 expected -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-
 0a6e431987f6
 at 
 org.apache.hadoop.hdfs.server.namenode.ImageServlet.validateRequest(ImageServlet.java:200)
 at 
 org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:443)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:730)
 {code}
 On the standby, the exception is:
 {code}
 java.io.IOException: Exception during image upload: 
 org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
  This namenode has storage info 
 -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary 
 expected
  -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
 at 
 org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:218)
 at 
 org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:62)
 {code}
 This seems to be a consequence of the fact that the VERSION file still is at 
 -55 (the old version) even after the rolling upgrade has started.  When the 
 rolling upgrade is finalized with {{hdfs dfsadmin -rollingUpgrade finalize}}, 
 both VERSION files get set to the new version, and the problem goes away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7208) NN doesn't schedule replication when a DN storage fails

2014-10-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173759#comment-14173759
 ] 

Hudson commented on HDFS-7208:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1903 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1903/])
HDFS-7208. NN doesn't schedule replication when a DN storage fails.  
Contributed by Ming Ma (szetszwo: rev 41980c56d3c01d7a0ddc7deea2d89b7f28026722)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java


 NN doesn't schedule replication when a DN storage fails
 ---

 Key: HDFS-7208
 URL: https://issues.apache.org/jira/browse/HDFS-7208
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 2.6.0

 Attachments: HDFS-7208-2.patch, HDFS-7208-3.patch, HDFS-7208.patch


 We found the following problem. When a storage device on a DN fails, the NN 
 continues to believe the replicas on that storage are valid and 
 doesn't schedule replication.
 A DN has 12 storage disks, so there is one blockReport for each storage. When 
 a disk fails, the # of blockReports from that DN is reduced from 12 to 11. Given 
 that dfs.datanode.failed.volumes.tolerated is configured to be > 0, the NN still 
 considers that DN healthy.
 1. A disk failed. All blocks of that disk are removed from DN dataset.
  
 {noformat}
 2014-10-04 02:11:12,626 WARN 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing 
 replica BP-1748500278-xx.xx.xx.xxx-1377803467793:1121568886 on failed volume 
 /data/disk6/dfs/current
 {noformat}
 2. The NN receives DatanodeProtocol.DISK_ERROR, but that isn't enough to have the 
 NN remove the DN and its replicas from the BlocksMap. In addition, the blockReport 
 doesn't provide the diff, given that it is done per storage.
 {noformat}
 2014-10-04 02:11:12,681 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: 
 Disk error on DatanodeRegistration(xx.xx.xx.xxx, 
 datanodeUuid=f3b8a30b-e715-40d6-8348-3c766f9ba9ab, infoPort=50075, 
 ipcPort=50020, 
 storageInfo=lv=-55;cid=CID-e3c38355-fde5-4e3a-b7ce-edacebdfa7a1;nsid=420527250;c=1410283484939):
  DataNode failed volumes:/data/disk6/dfs/current
 {noformat}
 3. Run fsck on the file and confirm the NN's BlocksMap still has that replica.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5089) When a LayoutVersion support SNAPSHOT, it must support FSIMAGE_NAME_OPTIMIZATION.

2014-10-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173762#comment-14173762
 ] 

Hudson commented on HDFS-5089:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1903 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1903/])
HDFS-5089. When a LayoutVersion support SNAPSHOT, it must support 
FSIMAGE_NAME_OPTIMIZATION. (szetszwo: rev 
289442a242259af53dc73a156aa523e3e6c7)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestLayoutVersion.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 When a LayoutVersion support SNAPSHOT, it must support 
 FSIMAGE_NAME_OPTIMIZATION.
 -

 Key: HDFS-5089
 URL: https://issues.apache.org/jira/browse/HDFS-5089
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Fix For: 2.6.0

 Attachments: h5089_20130813.patch, h5089_20140325.patch


 The SNAPSHOT layout requires FSIMAGE_NAME_OPTIMIZATION as a prerequisite.  
 However, RESERVED_REL1_3_0 supports SNAPSHOT but not 
 FSIMAGE_NAME_OPTIMIZATION.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5089) When a LayoutVersion support SNAPSHOT, it must support FSIMAGE_NAME_OPTIMIZATION.

2014-10-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173772#comment-14173772
 ] 

Hudson commented on HDFS-5089:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1928 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1928/])
HDFS-5089. When a LayoutVersion support SNAPSHOT, it must support 
FSIMAGE_NAME_OPTIMIZATION. (szetszwo: rev 
289442a242259af53dc73a156aa523e3e6c7)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestLayoutVersion.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 When a LayoutVersion support SNAPSHOT, it must support 
 FSIMAGE_NAME_OPTIMIZATION.
 -

 Key: HDFS-5089
 URL: https://issues.apache.org/jira/browse/HDFS-5089
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Fix For: 2.6.0

 Attachments: h5089_20130813.patch, h5089_20140325.patch


 The SNAPSHOT layout requires FSIMAGE_NAME_OPTIMIZATION as a prerequisite.  
 However, RESERVED_REL1_3_0 supports SNAPSHOT but not 
 FSIMAGE_NAME_OPTIMIZATION.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7208) NN doesn't schedule replication when a DN storage fails

2014-10-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173769#comment-14173769
 ] 

Hudson commented on HDFS-7208:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1928 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1928/])
HDFS-7208. NN doesn't schedule replication when a DN storage fails.  
Contributed by Ming Ma (szetszwo: rev 41980c56d3c01d7a0ddc7deea2d89b7f28026722)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java


 NN doesn't schedule replication when a DN storage fails
 ---

 Key: HDFS-7208
 URL: https://issues.apache.org/jira/browse/HDFS-7208
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 2.6.0

 Attachments: HDFS-7208-2.patch, HDFS-7208-3.patch, HDFS-7208.patch


 We found the following problem. When a storage device on a DN fails, the NN 
 continues to believe the replicas on that storage are valid and 
 doesn't schedule replication.
 A DN has 12 storage disks, so there is one blockReport for each storage. When 
 a disk fails, the # of blockReports from that DN is reduced from 12 to 11. Given 
 that dfs.datanode.failed.volumes.tolerated is configured to be > 0, the NN still 
 considers that DN healthy.
 1. A disk failed. All blocks of that disk are removed from DN dataset.
  
 {noformat}
 2014-10-04 02:11:12,626 WARN 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing 
 replica BP-1748500278-xx.xx.xx.xxx-1377803467793:1121568886 on failed volume 
 /data/disk6/dfs/current
 {noformat}
 2. The NN receives DatanodeProtocol.DISK_ERROR, but that isn't enough to have the 
 NN remove the DN and its replicas from the BlocksMap. In addition, the blockReport 
 doesn't provide the diff, given that it is done per storage.
 {noformat}
 2014-10-04 02:11:12,681 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: 
 Disk error on DatanodeRegistration(xx.xx.xx.xxx, 
 datanodeUuid=f3b8a30b-e715-40d6-8348-3c766f9ba9ab, infoPort=50075, 
 ipcPort=50020, 
 storageInfo=lv=-55;cid=CID-e3c38355-fde5-4e3a-b7ce-edacebdfa7a1;nsid=420527250;c=1410283484939):
  DataNode failed volumes:/data/disk6/dfs/current
 {noformat}
 3. Run fsck on the file and confirm the NN's BlocksMap still has that replica.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7185) The active NameNode will not accept an fsimage sent from the standby during rolling upgrade

2014-10-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173771#comment-14173771
 ] 

Hudson commented on HDFS-7185:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1928 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1928/])
HDFS-7185. The active NameNode will not accept an fsimage sent from the standby 
during rolling upgrade. Contributed by Jing Zhao. (jing9: rev 
18620649f96d9e378fb7ea40de216284a9d525c7)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageWithSnapshot.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 The active NameNode will not accept an fsimage sent from the standby during 
 rolling upgrade
 ---

 Key: HDFS-7185
 URL: https://issues.apache.org/jira/browse/HDFS-7185
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Jing Zhao
 Fix For: 2.6.0

 Attachments: HDFS-7185.000.patch, HDFS-7185.001.patch, 
 HDFS-7185.002.patch, HDFS-7185.003.patch, HDFS-7185.004.patch


 The active NameNode will not accept an fsimage sent from the standby during 
 rolling upgrade.  The active fails with the exception:
 {code}
 18:25:07,620  WARN ImageServlet:198 - Received an invalid request file 
 transfer request from a secondary with storage info 
 -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
 18:25:07,620  WARN log:76 - Committed before 410 PutImage failed. 
 java.io.IOException: This namenode has storage info 
 -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary 
 expected -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-
 0a6e431987f6
 at 
 org.apache.hadoop.hdfs.server.namenode.ImageServlet.validateRequest(ImageServlet.java:200)
 at 
 org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:443)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:730)
 {code}
 On the standby, the exception is:
 {code}
 java.io.IOException: Exception during image upload: 
 org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
  This namenode has storage info 
 -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary 
 expected
  -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
 at 
 org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:218)
 at 
 org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:62)
 {code}
 This seems to be a consequence of the fact that the VERSION file still is at 
 -55 (the old version) even after the rolling upgrade has started.  When the 
 rolling upgrade is finalized with {{hdfs dfsadmin -rollingUpgrade finalize}}, 
 both VERSION files get set to the new version, and the problem goes away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7243) HDFS concat operation should not be allowed in Encryption Zone

2014-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173862#comment-14173862
 ] 

Hadoop QA commented on HDFS-7243:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675272/HDFS-7243.003.patch
  against trunk revision 2894433.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
  org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8440//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8440//console

This message is automatically generated.

 HDFS concat operation should not be allowed in Encryption Zone
 --

 Key: HDFS-7243
 URL: https://issues.apache.org/jira/browse/HDFS-7243
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, namenode
Affects Versions: 2.6.0
Reporter: Yi Liu
Assignee: Charles Lamb
 Attachments: HDFS-7243.001.patch, HDFS-7243.002.patch, 
 HDFS-7243.003.patch


 For HDFS encryption at rest, files in an encryption zone are using different 
 data encryption keys, so concat should be disallowed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test

2014-10-16 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7226:

Attachment: HDFS-7226.001.patch

 TestDNFencing.testQueueingWithAppend failed often in latest test
 

 Key: HDFS-7226
 URL: https://issues.apache.org/jira/browse/HDFS-7226
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-7226.001.patch


 Using the tool from HADOOP-11045, I got the following report:
 {code}
 [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
 PreCommit-HDFS-Build -n 1 
 Recently FAILED builds in url: 
 https://builds.apache.org//job/PreCommit-HDFS-Build
 THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, 
 as listed below:
 ..
 Among 9 runs examined, all failed tests #failedRuns: testName:
 7: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
 6: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
 3: 
 org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen
 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching
 ..
 {code}
 TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. 
 Creating this jira for TestDNFencing.testQueueingWithAppend.
 Symptom:
 {code}
 Failed
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
 Failing for the past 1 build (Since Failed#8390 )
 Took 2.9 sec.
 Error Message
 expected:<18> but was:<12>
 Stacktrace
 java.lang.AssertionError: expected:<18> but was:<12>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7255) Customize Java Heap min/max settings for individual processes

2014-10-16 Thread Mark Tse (JIRA)
Mark Tse created HDFS-7255:
--

 Summary: Customize Java Heap min/max settings for individual 
processes
 Key: HDFS-7255
 URL: https://issues.apache.org/jira/browse/HDFS-7255
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, journal-node, namenode
Affects Versions: 2.5.1, 2.4.1
Reporter: Mark Tse


The NameNode and JournalNode (and ZKFC) can all run on the same machine. 
However, they get their heap settings from HADOOP_HEAPSIZE/JAVA_HEAP_MAX. There 
are scenarios where the NameNode process should have different Java memory 
requirements than the JournalNode and ZKFC (e.g. if the machine has 10 GB of 
RAM, and I want the NameNode process to have 8 GB max). 

HADOOP_(.*)_OPTS variables exist for these processes and can be used to add the 
-Xms and -Xmx flags, but because of how the default for JAVA_HEAP_MAX is set, it 
will always add '-Xmx1000m' to the final command that starts the 
NameNode/JournalNode/ZKFC process, resulting in two different Java heap 
settings (e.g. both -Xmx1000m and -Xmx8g are used when starting the NameNode).

Note: HADOOP_HEAPSIZE is deprecated according to [HADOOP-10950]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test

2014-10-16 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7226:

Status: Patch Available  (was: Open)

Submitting patch 001.

Hi [~jingzhao] and [~kihwal],

I found that the failure reported in this jira was also introduced by the 
HDFS-7217 fix, though the issue took me some time to understand.

Basically, because of the HDFS-7217 change, reporting of ReceivingBlock to the NN 
is delayed; in the reported test case, these reports are later replaced by 
ReceivedBlock reports (see the comment I put in the patch).

Thanks Jing for the help on HDFS-7236. Would either of you please help take a 
look at the patch?

Thanks a lot.
 





 TestDNFencing.testQueueingWithAppend failed often in latest test
 

 Key: HDFS-7226
 URL: https://issues.apache.org/jira/browse/HDFS-7226
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-7226.001.patch


 Using the tool from HADOOP-11045, I got the following report:
 {code}
 [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
 PreCommit-HDFS-Build -n 1 
 Recently FAILED builds in url: 
 https://builds.apache.org//job/PreCommit-HDFS-Build
 THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, 
 as listed below:
 ..
 Among 9 runs examined, all failed tests #failedRuns: testName:
 7: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
 6: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
 3: 
 org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen
 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching
 ..
 {code}
 TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. 
 Creating this jira for TestDNFencing.testQueueingWithAppend.
 Symptom:
 {code}
 Failed
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
 Failing for the past 1 build (Since Failed#8390 )
 Took 2.9 sec.
 Error Message
 expected:<18> but was:<12>
 Stacktrace
 java.lang.AssertionError: expected:<18> but was:<12>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-5928) show namespace and namenode ID on NN dfshealth page

2014-10-16 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-5928:
--
Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-6751

 show namespace and namenode ID on NN dfshealth page
 ---

 Key: HDFS-5928
 URL: https://issues.apache.org/jira/browse/HDFS-5928
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, 
 HDFS-5928.v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6744) Improve decommissioning nodes and dead nodes access on the new NN webUI

2014-10-16 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174072#comment-14174072
 ] 

Siqi Li commented on HDFS-6744:
---

Hi [~wheat9], can you take a look at this patch?

 Improve decommissioning nodes and dead nodes access on the new NN webUI
 ---

 Key: HDFS-6744
 URL: https://issues.apache.org/jira/browse/HDFS-6744
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ming Ma
Assignee: Siqi Li
 Attachments: HDFS-6744.v1.patch


 The new NN webUI lists live nodes at the top of the page, followed by dead 
 nodes and decommissioning nodes. From the admin's point of view:
 1. Decommissioning nodes and dead nodes are more interesting. It is better to 
 move decommissioning nodes to the top of the page, followed by dead nodes and 
 live nodes.
 2. To find decommissioning nodes or dead nodes, the whole page that includes 
 all nodes needs to be loaded. That could take some time for big clusters.
 The legacy web UI filters nodes by type dynamically. That seems to 
 work well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7221) TestDNFencingWithReplication fails consistently

2014-10-16 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174155#comment-14174155
 ] 

Yongjun Zhang commented on HDFS-7221:
-

Hi [~clamb],

I think I found the root cause here. With the HDFS-7128 fix, the 
dfs.namenode.replication.max-streams-hard-limit property is better enforced, 
and this caused the testFencingStress() failure reported here, because the 
test is a stress test.

I added a one-line change and the test now passes consistently:
{code}
 harness.conf.setInt(
DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_KEY, 16);
{code}

Thanks [~mingma] for fixing HDFS-7128, and [~kihwal], [~cnauroth] for the 
discussion there. 

I was thinking about whether the soft and hard settings of this property are 
ideal, and I noticed that you had some discussion there. It sounds like 
this property could even be set on a per-node basis, based on the hardware a node 
is equipped with, but that may complicate the software. I guess for now we just 
need to keep in mind that this property is enforced.

Thanks again, Charles, for reporting this long-standing failure in recent 
Jenkins jobs.


 TestDNFencingWithReplication fails consistently
 ---

 Key: HDFS-7221
 URL: https://issues.apache.org/jira/browse/HDFS-7221
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7221.001.patch, HDFS-7221.002.patch


 TestDNFencingWithReplication consistently fails with a timeout, both in 
 jenkins runs and on my local machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test

2014-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174280#comment-14174280
 ] 

Hadoop QA commented on HDFS-7226:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675317/HDFS-7226.001.patch
  against trunk revision 2894433.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8441//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8441//console

This message is automatically generated.

 TestDNFencing.testQueueingWithAppend failed often in latest test
 

 Key: HDFS-7226
 URL: https://issues.apache.org/jira/browse/HDFS-7226
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-7226.001.patch


 Using the tool from HADOOP-11045, I got the following report:
 {code}
 [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
 PreCommit-HDFS-Build -n 1 
 Recently FAILED builds in url: 
 https://builds.apache.org//job/PreCommit-HDFS-Build
 THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, 
 as listed below:
 ..
 Among 9 runs examined, all failed tests #failedRuns: testName:
 7: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
 6: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
 3: 
 org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen
 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching
 ..
 {code}
 TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. 
 Creating this jira for TestDNFencing.testQueueingWithAppend.
 Symptom:
 {code}
 Failed
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
 Failing for the past 1 build (Since Failed#8390 )
 Took 2.9 sec.
 Error Message
 expected:<18> but was:<12>
 Stacktrace
 java.lang.AssertionError: expected:<18> but was:<12>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test

2014-10-16 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174306#comment-14174306
 ] 

Yongjun Zhang commented on HDFS-7226:
-

The remaining failed test TestDNFencingWithReplication.testFencingStress was 
reported as HDFS-7221.


 TestDNFencing.testQueueingWithAppend failed often in latest test
 

 Key: HDFS-7226
 URL: https://issues.apache.org/jira/browse/HDFS-7226
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-7226.001.patch


 Using the tool from HADOOP-11045, I got the following report:
 {code}
 [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
 PreCommit-HDFS-Build -n 1 
 Recently FAILED builds in url: 
 https://builds.apache.org//job/PreCommit-HDFS-Build
 THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, 
 as listed below:
 ..
 Among 9 runs examined, all failed tests #failedRuns: testName:
 7: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
 6: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
 3: 
 org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen
 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching
 ..
 {code}
 TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. 
 Creating this jira for TestDNFencing.testQueueingWithAppend.
 Symptom:
 {code}
 Failed
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
 Failing for the past 1 build (Since Failed#8390 )
 Took 2.9 sec.
 Error Message
 expected:<18> but was:<12>
 Stacktrace
 java.lang.AssertionError: expected:<18> but was:<12>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test

2014-10-16 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174317#comment-14174317
 ] 

Jing Zhao commented on HDFS-7226:
-

Thanks for working on this, Yongjun! So with the current fix, is it possible 
that the DN just happens to send out an IBR (after the normal wait) right after 
receiving the data? In that case, the DN may still send out both the block 
receiving and the block received messages. Thus maybe we can still call 
{{triggerBlockReportForTests}} here in the tests to make sure a block receiving 
report is sent out.
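
For reference, one way a test could force that from outside the DN is sketched 
below (assuming the existing DataNodeTestUtils helper that triggers an immediate 
block report; the exact helper and call site are assumptions):
{code}
// Sketch: force the first DN in the mini cluster to report immediately.
DataNodeTestUtils.triggerBlockReport(cluster.getDataNodes().get(0));
{code}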

 TestDNFencing.testQueueingWithAppend failed often in latest test
 

 Key: HDFS-7226
 URL: https://issues.apache.org/jira/browse/HDFS-7226
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-7226.001.patch


 Using the tool from HADOOP-11045, I got the following report:
 {code}
 [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
 PreCommit-HDFS-Build -n 1 
 Recently FAILED builds in url: 
 https://builds.apache.org//job/PreCommit-HDFS-Build
 THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, 
 as listed below:
 ..
 Among 9 runs examined, all failed tests #failedRuns: testName:
 7: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
 6: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
 3: 
 org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen
 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching
 ..
 {code}
 TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. 
 Creating this jira for TestDNFencing.testQueueingWithAppend.
 Symptom:
 {code}
 Failed
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
 Failing for the past 1 build (Since Failed#8390 )
 Took 2.9 sec.
 Error Message
 expected:<18> but was:<12>
 Stacktrace
 java.lang.AssertionError: expected:<18> but was:<12>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test

2014-10-16 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174367#comment-14174367
 ] 

Yongjun Zhang commented on HDFS-7226:
-

Hi [~jingzhao],

Thanks for the review and comments. 

I actually tried that before I came up with this solution. The issue 
with calling {{triggerBlockReportForTests}} is that we would see 6 reports instead 
of the 3 the test expects, even though we only have 3 BlockReceiving entries. I 
think the reason lies in how {{triggerBlockReportForTests}} is implemented: it 
incurs a waiting loop over the 3-second heartbeat interval, during which it 
issues additional block reports beyond the original 3, and we end up with 6 reports 
instead of 3. But let me take a further look in this direction.






 TestDNFencing.testQueueingWithAppend failed often in latest test
 

 Key: HDFS-7226
 URL: https://issues.apache.org/jira/browse/HDFS-7226
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-7226.001.patch


 Using tool from HADOOP-11045, got the following report:
 {code}
 [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
 PreCommit-HDFS-Build -n 1 
 Recently FAILED builds in url: 
 https://builds.apache.org//job/PreCommit-HDFS-Build
 THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, 
 as listed below:
 ..
 Among 9 runs examined, all failed tests #failedRuns: testName:
 7: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
 6: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
 3: 
 org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen
 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching
 ..
 {code}
 TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. 
 Creating this jira for TestDNFencing.testQueueingWithAppend.
 Symptom:
 {code}
 Failed
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
 Failing for the past 1 build (Since Failed#8390 )
 Took 2.9 sec.
 Error Message
 expected:<18> but was:<12>
 Stacktrace
 java.lang.AssertionError: expected:<18> but was:<12>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6581) Write to single replica in memory

2014-10-16 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174407#comment-14174407
 ] 

Jitendra Nath Pandey commented on HDFS-6581:


I am planning to merge this to branch-2 today, and subsequently to branch-2.6 
by tomorrow. As agreed on HDFS-6919, in 2.6 we will indicate in the release 
notes that the memory for writes on RAM and the memory for caching in datanodes 
are independent, and a feature to manage them together will be added in the 
next release.

 Write to single replica in memory
 -

 Key: HDFS-6581
 URL: https://issues.apache.org/jira/browse/HDFS-6581
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, hdfs-client, namenode
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Fix For: 3.0.0

 Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, 
 HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, 
 HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, 
 HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, HDFS-6581.merge.11.patch, 
 HDFS-6581.merge.12.patch, HDFS-6581.merge.14.patch, HDFS-6581.merge.15.patch, 
 HDFSWriteableReplicasInMemory.pdf, 
 Test-Plan-for-HDFS-6581-Memory-Storage.pdf, 
 Test-Plan-for-HDFS-6581-Memory-Storage.pdf


 Per discussion with the community on HDFS-5851, we will implement writing to 
 a single replica in DN memory via DataTransferProtocol.
 This avoids some of the issues with short-circuit writes, which we can 
 revisit at a later time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6919) Enforce a single limit for RAM disk usage and replicas cached via locking

2014-10-16 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174412#comment-14174412
 ] 

Jitendra Nath Pandey commented on HDFS-6919:


I am planning to merge HDFS-6581 work to branch-2 today, and subsequently to 
branch-2.6 by tomorrow. As suggested earlier, in 2.6 we will indicate in the 
release notes that the memory for writes on RAM and the memory for caching in 
datanodes are independent, and a feature to manage them together will be added 
in the next release.

 Enforce a single limit for RAM disk usage and replicas cached via locking
 -

 Key: HDFS-6919
 URL: https://issues.apache.org/jira/browse/HDFS-6919
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Arpit Agarwal
Assignee: Colin Patrick McCabe
Priority: Blocker

 The DataNode can have a single limit for memory usage which applies to both 
 replicas cached via CCM and replicas on RAM disk.
 See comments 
 [1|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106025&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106025],
  
 [2|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106245&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106245]
  and 
 [3|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106575&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106575]
  for discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2014-10-16 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174416#comment-14174416
 ] 

Jitendra Nath Pandey commented on HDFS-7240:


I think HDFS-5477 takes us towards making the block management service generic 
enough to support different storage semantics and APIs. In that sense, the object 
store will be one more use case for block management, and the object store 
design should work with the block management service.

 Object store in HDFS
 

 Key: HDFS-7240
 URL: https://issues.apache.org/jira/browse/HDFS-7240
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey

 This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
 generic storage layer. Using the Block Pool abstraction, new kinds of 
 namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
 storage, but independent of namespace metadata.
 I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck

2014-10-16 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174429#comment-14174429
 ] 

Brandon Li commented on HDFS-7180:
--

Sorry for the late response. Thanks for filing the bug, [~ericzma].
The NFS gateway could be stuck in GC, causing its connection with the DN to time 
out, which makes the NFS gateway think the DN is bad. If this is the case, you 
can find lots of socket timeout exceptions in the DN logs.

One cause of GC pressure is that reordered writes arrive faster than they can be 
dumped to the local disk. In this case, the NFS log should show 
nonSequentialWriteInMemory with a very large value (the trace level needs to be 
DEBUG).

I will upload a patch soon.

 NFSv3 gateway frequently gets stuck
 ---

 Key: HDFS-7180
 URL: https://issues.apache.org/jira/browse/HDFS-7180
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.5.0
 Environment: Linux, Fedora 19 x86-64
Reporter: Eric Zhiqiang Ma
Assignee: Brandon Li
Priority: Critical

 We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway 
 on one node in the cluster to let users upload data with rsync.
 However, we find the NFSv3 daemon seems to frequently get stuck while HDFS 
 itself keeps working well (hdfs dfs -ls etc. work just fine). The last hang we 
 found was after around 1 day of running and several hundred GBs of data 
 uploaded.
 The NFSv3 daemon is started on one node, and the NFS is mounted on the same 
 node.
 From the node where the NFS is mounted, dmesg shows lines like this:
 [1859245.368108] nfs: server localhost not responding, still trying
 [1859245.368111] nfs: server localhost not responding, still trying
 [1859245.368115] nfs: server localhost not responding, still trying
 [1859245.368119] nfs: server localhost not responding, still trying
 [1859245.368123] nfs: server localhost not responding, still trying
 [1859245.368127] nfs: server localhost not responding, still trying
 [1859245.368131] nfs: server localhost not responding, still trying
 [1859245.368135] nfs: server localhost not responding, still trying
 [1859245.368138] nfs: server localhost not responding, still trying
 [1859245.368142] nfs: server localhost not responding, still trying
 [1859245.368146] nfs: server localhost not responding, still trying
 [1859245.368150] nfs: server localhost not responding, still trying
 [1859245.368153] nfs: server localhost not responding, still trying
 The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too.
 The latest lines from the nfs3 log in the hadoop logs directory:
 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
 user map size: 35
 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
 group map size: 54
 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
 Have to change stable write to unstable write:FILE_SYNC
 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
 Have to change stable write to unstable write:FILE_SYNC
 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
 Have to change stable write to unstable write:FILE_SYNC
 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
 Have to change stable write to unstable write:FILE_SYNC
 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
 Have to change stable write to unstable write:FILE_SYNC
 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
 Have to change stable write to unstable write:FILE_SYNC
 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
 Have to change stable write to unstable write:FILE_SYNC
 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
 Have to change stable write to unstable write:FILE_SYNC
 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
 Have to change stable write to unstable write:FILE_SYNC
 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
 Have to change stable write to unstable write:FILE_SYNC
 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
 Have to change stable write to unstable write:FILE_SYNC
 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update 
 cache now
 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not 
 doing static UID/GID mapping because '/etc/nfs.map' does not exist.
 2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
 user map size: 35
 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
 group map size: 54
 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow 
 ReadProcessor read fields took 60062ms (threshold=3ms); 

[jira] [Created] (HDFS-7256) Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation

2014-10-16 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-7256:


 Summary: Encryption Key created in Java Key Store after Namenode 
start unavailable for EZ Creation 
 Key: HDFS-7256
 URL: https://issues.apache.org/jira/browse/HDFS-7256
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, security
Affects Versions: 2.6.0
Reporter: Xiaoyu Yao


Hit a RemoteException: Key ezkey1 doesn't exist. error when creating an EZ 
with a key created after the NN starts.

Briefly checked the code and found that the KeyProvider is loaded by the FSN 
only at NN start. My workaround is to restart the NN, which triggers a reload 
of the KeyProvider. Is this expected?

Repro Steps:

Create a new Key after NN and KMS starts
hadoop/bin/hadoop key create ezkey1 -size 256 -provider 
jceks://file/home/hadoop/kms.keystore

List Keys
hadoop@SaturnVm:~/deploy$ hadoop/bin/hadoop key list -provider 
jceks://file/home/hadoop/kms.keystore -metadata
Listing keys for KeyProvider: jceks://file/home/hadoop/kms.keystore
ezkey1 : cipher: AES/CTR/NoPadding, length: 256, description: null, created: 
Thu Oct 16 18:51:30 EDT 2014, version: 1, attributes: null
key2 : cipher: AES/CTR/NoPadding, length: 128, description: null, created: Tue 
Oct 14 19:44:09 EDT 2014, version: 1, attributes: null
key1 : cipher: AES/CTR/NoPadding, length: 128, description: null, created: Tue 
Oct 14 17:52:36 EDT 2014, version: 1, attributes: null

Create Encryption Zone
hadoop/bin/hdfs dfs -mkdir /Ez1
hadoop@SaturnVm:~/deploy$ hadoop/bin/hdfs crypto -createZone -keyName ezkey1 
-path /Ez1
RemoteException: Key ezkey1 doesn't exist.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7184) Allow data migration tool to run as a daemon

2014-10-16 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174573#comment-14174573
 ] 

Allen Wittenauer commented on HDFS-7184:


+1 

 Allow data migration tool to run as a daemon
 

 Key: HDFS-7184
 URL: https://issues.apache.org/jira/browse/HDFS-7184
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer & mover, scripts
Reporter: Benoy Antony
Assignee: Benoy Antony
Priority: Minor
 Attachments: HDFS-7184.patch, HDFS-7184.patch


 Just like balancer, it is sometimes required to run data migration tool in a 
 daemon mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-7184) Allow data migration tool to run as a daemon

2014-10-16 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174573#comment-14174573
 ] 

Allen Wittenauer edited comment on HDFS-7184 at 10/17/14 1:10 AM:
--

+1 

Since I'm out of town, I'll let someone else commit it. If it isn't committed 
when I get back next week, I'll take care of it. :)


was (Author: aw):
+1 

 Allow data migration tool to run as a daemon
 

 Key: HDFS-7184
 URL: https://issues.apache.org/jira/browse/HDFS-7184
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer & mover, scripts
Reporter: Benoy Antony
Assignee: Benoy Antony
Priority: Minor
 Attachments: HDFS-7184.patch, HDFS-7184.patch


 Just like balancer, it is sometimes required to run data migration tool in a 
 daemon mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7204) balancer doesn't run as a daemon

2014-10-16 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174578#comment-14174578
 ] 

Allen Wittenauer commented on HDFS-7204:


bq.  Maybe we change the variable name daemon to something like run_via_dh 
(run via daemon handler) and add a comment like Allen summarized? Thanks.

Sure. Open a jira under hadoop common to rename daemon and I'll work something 
up.

BTW, it's probably worth pointing out that if you look at hadoop-config.sh, 
you'll see where --daemon is specifically handled.

 balancer doesn't run as a daemon
 

 Key: HDFS-7204
 URL: https://issues.apache.org/jira/browse/HDFS-7204
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
Priority: Blocker
  Labels: newbie
 Attachments: HDFS-7204-01.patch, HDFS-7204.patch


 From HDFS-7184, minor issues with balancer:
 * daemon isn't set to true in hdfs to enable daemonization
 * start-balancer script has usage instead of hadoop_usage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7256) Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation

2014-10-16 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174669#comment-14174669
 ] 

Yi Liu commented on HDFS-7256:
--

Thanks [~xyao] for testing this; this should not be an issue. Let me explain 
below.

HDFS encryption at rest requires the user to configure a KMS, and the backing 
KeyProvider of the KMS can be a {{JavaKeyStoreProvider}} or a third-party 
keystore which implements the Hadoop {{KeyProvider}} interface.
In your case, {{JavaKeyStoreProvider}} is used directly. Both the FSN and the 
DFSClient have their own (different) KeyProvider instances: the FSN uses its 
KeyProvider instance to get the encryption zone key and the encrypted data 
encryption keys, and the DFSClient uses its KeyProvider instance to decrypt the 
data encryption keys. JavaKeyStoreProvider uses a local java keystore file, so 
it cannot support access from multiple nodes.
The hadoop key create ... command constructs its KeyProvider instance on the 
client side and creates/flushes the key to the java keystore file, but the FSN 
does not reload the java keystore file. That is why you see the exception.

So please configure a KMS, whose backing KeyProvider can be a 
{{JavaKeyStoreProvider}}; for more information, please refer to the 
fs-encryption/KMS user doc.
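
To make that concrete, a rough sketch of the configuration being described (the 
KMS address and keystore path below are illustrative placeholders, not a 
recommendation for a particular cluster):
{code}
<!-- hdfs-site.xml: point HDFS (NN and clients) at the KMS instead of at the
     keystore file directly. Host and port are placeholders. -->
<property>
  <name>dfs.encryption.key.provider.uri</name>
  <value>kms://http@localhost:16000/kms</value>
</property>

<!-- kms-site.xml: the KMS itself can be backed by a JavaKeyStoreProvider,
     e.g. a local keystore file such as the one used in the repro steps. -->
<property>
  <name>hadoop.kms.key.provider.uri</name>
  <value>jceks://file/home/hadoop/kms.keystore</value>
</property>
{code}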

 Encryption Key created in Java Key Store after Namenode start unavailable for 
 EZ Creation 
 --

 Key: HDFS-7256
 URL: https://issues.apache.org/jira/browse/HDFS-7256
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, security
Affects Versions: 2.6.0
Reporter: Xiaoyu Yao

 Hit an error on RemoteException: Key ezkey1 doesn't exist. when creating EZ 
 with a Key created after NN starts.
 Briefly checked the code and found that the KeyProvider is loaded by FSN only 
 at the NN start. My work around is to restart the NN which triggers the 
 reload of Key Provider. Is this expected?
 Repro Steps:
 Create a new Key after NN and KMS starts
 hadoop/bin/hadoop key create ezkey1 -size 256 -provider 
 jceks://file/home/hadoop/kms.keystore
 List Keys
 hadoop@SaturnVm:~/deploy$ hadoop/bin/hadoop key list -provider 
 jceks://file/home/hadoop/kms.keystore -metadata
 Listing keys for KeyProvider: jceks://file/home/hadoop/kms.keystore
 ezkey1 : cipher: AES/CTR/NoPadding, length: 256, description: null, created: 
 Thu Oct 16 18:51:30 EDT 2014, version: 1, attributes: null
 key2 : cipher: AES/CTR/NoPadding, length: 128, description: null, created: 
 Tue Oct 14 19:44:09 EDT 2014, version: 1, attributes: null
 key1 : cipher: AES/CTR/NoPadding, length: 128, description: null, created: 
 Tue Oct 14 17:52:36 EDT 2014, version: 1, attributes: null
 Create Encryption Zone
 hadoop/bin/hdfs dfs -mkdir /Ez1
 hadoop@SaturnVm:~/deploy$ hadoop/bin/hdfs crypto -createZone -keyName ezkey1 
 -path /Ez1
 RemoteException: Key ezkey1 doesn't exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7256) Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation

2014-10-16 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu resolved HDFS-7256.
--
Resolution: Not a Problem

I marked it as Not a Problem; please feel free to reopen it if you have a 
different opinion.

 Encryption Key created in Java Key Store after Namenode start unavailable for 
 EZ Creation 
 --

 Key: HDFS-7256
 URL: https://issues.apache.org/jira/browse/HDFS-7256
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, security
Affects Versions: 2.6.0
Reporter: Xiaoyu Yao

 Hit an error on RemoteException: Key ezkey1 doesn't exist. when creating EZ 
 with a Key created after NN starts.
 Briefly checked the code and found that the KeyProvider is loaded by FSN only 
 at the NN start. My work around is to restart the NN which triggers the 
 reload of Key Provider. Is this expected?
 Repro Steps:
 Create a new Key after NN and KMS starts
 hadoop/bin/hadoop key create ezkey1 -size 256 -provider 
 jceks://file/home/hadoop/kms.keystore
 List Keys
 hadoop@SaturnVm:~/deploy$ hadoop/bin/hadoop key list -provider 
 jceks://file/home/hadoop/kms.keystore -metadata
 Listing keys for KeyProvider: jceks://file/home/hadoop/kms.keystore
 ezkey1 : cipher: AES/CTR/NoPadding, length: 256, description: null, created: 
 Thu Oct 16 18:51:30 EDT 2014, version: 1, attributes: null
 key2 : cipher: AES/CTR/NoPadding, length: 128, description: null, created: 
 Tue Oct 14 19:44:09 EDT 2014, version: 1, attributes: null
 key1 : cipher: AES/CTR/NoPadding, length: 128, description: null, created: 
 Tue Oct 14 17:52:36 EDT 2014, version: 1, attributes: null
 Create Encryption Zone
 hadoop/bin/hdfs dfs -mkdir /Ez1
 hadoop@SaturnVm:~/deploy$ hadoop/bin/hdfs crypto -createZone -keyName ezkey1 
 -path /Ez1
 RemoteException: Key ezkey1 doesn't exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7221) TestDNFencingWithReplication fails consistently

2014-10-16 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174703#comment-14174703
 ] 

Ming Ma commented on HDFS-7221:
---

Thanks Yongjun and Charles for investigating this.

I agree with the suggestion to increase the value for 
dfs.namenode.replication.max-streams-hard-limit. Please note that 
dfs.namenode.replication.max-streams is normally set to less than or equal to 
dfs.namenode.replication.max-streams-hard-limit, since a larger value has no 
effect. So as part of this fix, you can change the value for 
dfs.namenode.replication.max-streams to 16 as well (see the sketch below).
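
As a rough illustration only (16 mirrors the value mentioned above and is not a 
general recommendation), the hdfs-site.xml change might look like:
{code}
<!-- hdfs-site.xml: raise both replication work limits together.
     16 is the value discussed in this thread, not a universal default. -->
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>16</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>16</value>
</property>
{code}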

IMHO, per-node configuration is useful if you have heterogeneous nodes in the 
cluster, and its scope is much broader than these two properties; for example, 
there are other settings such as maxXceiverCount, balancer bandwidth, etc. 
Heterogeneous storages might have addressed some of those issues. Besides, it 
should be easy to manage; maybe some sort of label support in HDFS.

 TestDNFencingWithReplication fails consistently
 ---

 Key: HDFS-7221
 URL: https://issues.apache.org/jira/browse/HDFS-7221
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7221.001.patch, HDFS-7221.002.patch


 TestDNFencingWithReplication consistently fails with a timeout, both in 
 jenkins runs and on my local machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck

2014-10-16 Thread Eric Zhiqiang Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174706#comment-14174706
 ] 

Eric Zhiqiang Ma commented on HDFS-7180:


[~brandonli]: Not at all, and many thanks for the analysis and confirmation!

I checked the log on 10.0.3.176 and found the socket timeout exception between 
10.0.3.172 and 10.0.3.176, as follows.

--
2014-10-02 06:00:07,326 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 src: 
/10.0.3.172:37334 dest: /10.0.3.176:
50010
2014-10-02 06:00:31,970 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
Slow flushOrSync took 24097ms (threshold=300ms), isSync:true, 
flushTotalNanos=9424ns
2014-10-02 06:01:32,093 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Exception for BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel 
to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/10.0.3.176:50010 remote=/10.0.3.17
2:37334]
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:453)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:734)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:741)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:234)
at java.lang.Thread.run(Thread.java:745)
2014-10-02 06:01:32,093 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder: BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643, 
type=LAST_IN_PIPELINE, downstream
s=0:[]: Thread is interrupted.
2014-10-02 06:01:32,093 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder: BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643, 
type=LAST_IN_PIPELINE, downstream
s=0:[] terminating
2014-10-02 06:01:32,093 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
opWriteBlock BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 
received exception java.net.SocketTime
outException: 60000 millis timeout while waiting for channel to be ready for 
read. ch : java.nio.channels.SocketChannel[connected local=/10.0.3.176:50010 
remote=/10.0.3.172:37334]
2014-10-02 06:01:32,093 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
dstore-176:50010:DataXceiver error processing WRITE_BLOCK operation  src: 
/10.0.3.172:37334 dst: /10.0.3.176:50
010
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel 
to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/10.0.3.176:50010 remote=/10.0.3.17
2:37334]
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at 

[jira] [Commented] (HDFS-7256) Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation

2014-10-16 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174744#comment-14174744
 ] 

Xiaoyu Yao commented on HDFS-7256:
--

Thanks [~hitliuyi] for the detailed explanation. I configured my test 
environment based on the HDFS-6134 proposal: 
https://issues.apache.org/jira/secure/attachment/12660368/HDFSDataatRestEncryption.pdf.

Can you point me to the link for the fs-encryption/KMS user doc if there is a 
different one?

I do have a KMS setup with JavaKeyStoreProvider pointing to the same java key 
store file. 
Based on your suggestion, I just switched to using 'kms://http@localhost:16000/kms' 
instead of the java key store file 
'jceks://file/home/hadoop/kms.keystore' directly for the 
'dfs.encryption.key.provider.uri' in hdfs-site.xml and 
'hadoop.security.crypto.jce.provider' in core-site.xml.

Below are two follow-up questions from executing the 'hadoop key' command after 
the change. Can you confirm whether these are expected or not?

1. Have to specify -provider explicitly even though 
hadoop.security.crypto.jce.provider='kms://http@localhost:16000/kms' is 
configured in core-site.xml.

hadoop@hadoopdev:~/deploy$ hadoop/bin/hadoop key list
There are no non-transient KeyProviders configured.
Use the -provider option to specify a provider. If you
want to list a transient provider then you must use the
-provider argument.

2. Keys are returned when -provider is specified, but a WARN message about an 
anonymous request is logged in kms.log. My understanding is that KMS should 
proxy user 'hadoop' based on the proxy user setting below. Am I missing anything?
 
hadoop@hadoopdev:~/deploy$ hadoop/bin/hadoop key list -provider 
kms://http@localhost:16000/kms
Listing keys for KeyProvider: KMSClientProvider[http://localhost:16000/kms/v1/]
key1

{code}
2014-10-16 22:08:38,386 WARN  AuthenticationFilter - Authentication exception: 
Anonymous requests are disallowed
org.apache.hadoop.security.authentication.client.AuthenticationException: 
Anonymous requests are disallowed
at 
org.apache.hadoop.security.authentication.server.PseudoAuthenticationHandler.authenticate(PseudoAuthenticationHandler.java:184)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:330)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:507)
at 
org.apache.hadoop.crypto.key.kms.server.KMSAuthenticationFilter.doFilter(KMSAuthenticationFilter.java:129)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:745)
{code}

The client runs as user 'hadoop'. The proxyuser and delegation token (using the 
defaults) are set up in kms-site.xml. 
  <!-- proxyuser configuration for user named: hadoop -->
  <property>
    <name>hadoop.kms.proxyuser.hadoop.users</name>
    <value>*</value>
  </property>
...
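
For reference, the KMS proxyuser setup typically also includes matching groups 
and hosts entries for the same user; the wildcard values below are illustrative 
only and should be narrowed in a real deployment.
{code}
<!-- kms-site.xml: companion proxyuser entries usually set alongside
     hadoop.kms.proxyuser.hadoop.users shown above. -->
<property>
  <name>hadoop.kms.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.kms.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
{code}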

 Encryption Key created in Java Key Store after Namenode start unavailable for 
 EZ Creation 
 --

 Key: HDFS-7256
 URL: https://issues.apache.org/jira/browse/HDFS-7256
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, security
Affects Versions: 2.6.0
Reporter: Xiaoyu Yao

 Hit an error on RemoteException: Key ezkey1 doesn't exist. when creating EZ 
 with a Key created after NN starts.
 Briefly checked the code and found that the KeyProvider is loaded by FSN only 
 at the NN start. My work around is to restart the NN which triggers the 
 reload of Key Provider. Is this expected?
 Repro Steps:
 Create a new Key after NN and KMS starts
 hadoop/bin/hadoop key create ezkey1 -size 256 -provider 
 jceks://file/home/hadoop/kms.keystore
 List Keys
 hadoop@SaturnVm:~/deploy$ hadoop/bin/hadoop key list -provider