[jira] [Updated] (HDFS-6143) HftpFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated HDFS-6143: Attachment: HDFS-6143.v02.patch [~ste...@apache.org], thanks for the review. I tried to keep the patch as small as possible. Here is an updated v02 of the patch to accommodate your comments. I noticed that there are more issues with how server-side exceptions are translated in FileDataServlet, and made that handling more elaborate. HftpFileSystem open should throw FileNotFoundException for non-existing paths - Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch HftpFileSystem.open incorrectly handles non-existing paths. - 'open' does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound; the error is deferred until the next read. This is counterintuitive and not how the local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of code that is broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-existing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
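For illustration, a minimal, hedged sketch of the client-side contract argued for above — not the actual HDFS-6143 patch; the helper name is an assumption introduced for the sketch, while {{FileSystem#getFileStatus}} and {{FileSystem#open}} are real Hadoop APIs:
{code}
// Sketch only (not the HDFS-6143 patch): probe the server at open time so a
// non-existing path fails with FileNotFoundException immediately, mirroring
// POSIX open() returning ENOENT.
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EagerOpenCheck {
  // getFileStatus() contacts the server and throws FileNotFoundException
  // for a missing path, instead of deferring the error to the first read().
  static FSDataInputStream openOrFailFast(FileSystem fs, Path p)
      throws IOException {
    fs.getFileStatus(p); // throws FileNotFoundException if p does not exist
    return fs.open(p);
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataInputStream in = openOrFailFast(fs, new Path("/no/such/file"))) {
      System.out.println("opened, bytes available: " + in.available());
    } catch (FileNotFoundException e) {
      System.out.println("FNFE at open time, as expected: " + e.getMessage());
    }
  }
}
{code}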
[jira] [Created] (HDFS-6147) New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..)
Vinayakumar B created HDFS-6147: --- Summary: New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..) Key: HDFS-6147 URL: https://issues.apache.org/jira/browse/HDFS-6147 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B New blocks' scanning will be delayed if old blocks are deleted after a datanode restart. Steps: 1. Write some blocks and wait till all scans are over 2. Restart the datanode 3. Delete some of the blocks 4. Write new blocks which are smaller in size than the deleted blocks. Problem: {{BlockPoolSliceScanner#updateBytesToScan(..)}} updates {{bytesLeft}} based on the following comparison {code}if (lastScanTime < currentPeriodStart) { bytesLeft += len; }{code} But in {{BlockPoolSliceScanner#assignInitialVerificationTimes()}} {{bytesLeft}} is decremented using the comparison below {code}if (now - entry.verificationTime < scanPeriod) {{code} Hence when the old blocks are deleted, {{bytesLeft}} goes negative, and new blocks will not be scanned until it becomes positive again. So in both places the verification time should be compared against the scan period. -- This message was sent by Atlassian JIRA (v6.2#6252)
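To make the proposed fix concrete, here is a small self-contained sketch — illustrative names, not the actual patch — in which both the add and delete paths share one comparison of time-since-last-verification against the scan period, so {{bytesLeft}} cannot drift negative:
{code}
// Illustrative sketch of the invariant proposed above (not the actual
// patch): both paths use the same "needs scanning in this period" predicate,
// otherwise bytesLeft can go negative when old blocks are deleted.
class ScanAccounting {
  private long bytesLeft = 0;
  private final long scanPeriodMs;

  ScanAccounting(long scanPeriodMs) { this.scanPeriodMs = scanPeriodMs; }

  // Single shared predicate: a block still needs scanning if it was last
  // verified more than one scan period ago.
  private boolean needsScan(long lastScanTimeMs, long nowMs) {
    return nowMs - lastScanTimeMs >= scanPeriodMs;
  }

  // Called when a block is added (len > 0) or deleted (len < 0); because
  // add and delete share needsScan(), the running total stays consistent.
  synchronized void updateBytesToScan(long len, long lastScanTimeMs) {
    if (needsScan(lastScanTimeMs, System.currentTimeMillis())) {
      bytesLeft += len;
    }
  }

  synchronized long getBytesLeft() { return bytesLeft; }
}
{code}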
[jira] [Updated] (HDFS-6147) New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..)
[ https://issues.apache.org/jira/browse/HDFS-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6147: Attachment: HDFS-6147.patch New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..) --- Key: HDFS-6147 URL: https://issues.apache.org/jira/browse/HDFS-6147 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-6147.patch New blocks' scanning will be delayed if old blocks are deleted after a datanode restart. Steps: 1. Write some blocks and wait till all scans are over 2. Restart the datanode 3. Delete some of the blocks 4. Write new blocks which are smaller in size than the deleted blocks. Problem: {{BlockPoolSliceScanner#updateBytesToScan(..)}} updates {{bytesLeft}} based on the following comparison {code}if (lastScanTime < currentPeriodStart) { bytesLeft += len; }{code} But in {{BlockPoolSliceScanner#assignInitialVerificationTimes()}} {{bytesLeft}} is decremented using the comparison below {code}if (now - entry.verificationTime < scanPeriod) {{code} Hence when the old blocks are deleted, {{bytesLeft}} goes negative, and new blocks will not be scanned until it becomes positive again. So in both places the verification time should be compared against the scan period. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6147) New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..)
[ https://issues.apache.org/jira/browse/HDFS-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6147: Status: Patch Available (was: Open) New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..) --- Key: HDFS-6147 URL: https://issues.apache.org/jira/browse/HDFS-6147 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-6147.patch New blocks' scanning will be delayed if old blocks are deleted after a datanode restart. Steps: 1. Write some blocks and wait till all scans are over 2. Restart the datanode 3. Delete some of the blocks 4. Write new blocks which are smaller in size than the deleted blocks. Problem: {{BlockPoolSliceScanner#updateBytesToScan(..)}} updates {{bytesLeft}} based on the following comparison {code}if (lastScanTime < currentPeriodStart) { bytesLeft += len; }{code} But in {{BlockPoolSliceScanner#assignInitialVerificationTimes()}} {{bytesLeft}} is decremented using the comparison below {code}if (now - entry.verificationTime < scanPeriod) {{code} Hence when the old blocks are deleted, {{bytesLeft}} goes negative, and new blocks will not be scanned until it becomes positive again. So in both places the verification time should be compared against the scan period. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6133) Make Balancer support exclude specified path
[ https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-6133: --- Attachment: (was: HDFS-6133.patch) Make Balancer support exclude specified path Key: HDFS-6133 URL: https://issues.apache.org/jira/browse/HDFS-6133 Project: Hadoop HDFS Issue Type: Improvement Components: balancer, namenode Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-6133.patch Currently, running Balancer will destroy the RegionServer's data locality. If getBlocks could exclude blocks belonging to files with a specific path prefix, like /hbase, then we could run Balancer without destroying the RegionServer's data locality. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6133) Make Balancer support exclude specified path
[ https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-6133: --- Attachment: HDFS-6133.patch This patch supports excluding multiple paths. Make Balancer support exclude specified path Key: HDFS-6133 URL: https://issues.apache.org/jira/browse/HDFS-6133 Project: Hadoop HDFS Issue Type: Improvement Components: balancer, namenode Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-6133.patch Currently, running Balancer will destroy the RegionServer's data locality. If getBlocks could exclude blocks belonging to files with a specific path prefix, like /hbase, then we could run Balancer without destroying the RegionServer's data locality. -- This message was sent by Atlassian JIRA (v6.2#6252)
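As an illustration of the getBlocks-side filtering described above (class and method names are assumptions for the sketch, not the patch's actual interface), the core of the change amounts to a prefix test applied before a file's blocks are offered to the balancer:
{code}
// Minimal sketch of the path-prefix exclusion idea (illustrative names,
// not the actual HDFS-6133 patch): skip any file whose path starts with
// one of the excluded prefixes before handing its blocks to the balancer.
import java.util.Arrays;
import java.util.List;

public class ExcludePathFilter {
  private final List<String> excludedPrefixes;

  public ExcludePathFilter(List<String> excludedPrefixes) {
    this.excludedPrefixes = excludedPrefixes;
  }

  // Returns true if the file's blocks should be excluded from balancing.
  public boolean isExcluded(String filePath) {
    for (String prefix : excludedPrefixes) {
      if (filePath.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    ExcludePathFilter f = new ExcludePathFilter(Arrays.asList("/hbase"));
    System.out.println(f.isExcluded("/hbase/region1/cf/file")); // true
    System.out.println(f.isExcluded("/user/data/file"));        // false
  }
}
{code}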
[jira] [Updated] (HDFS-5196) Provide more snapshot information in WebUI
[ https://issues.apache.org/jira/browse/HDFS-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated HDFS-5196: - Attachment: HDFS-5196-8.patch I am attaching the new patch file. Provide more snapshot information in WebUI -- Key: HDFS-5196 URL: https://issues.apache.org/jira/browse/HDFS-5196 Project: Hadoop HDFS Issue Type: Improvement Components: snapshots Affects Versions: 3.0.0 Reporter: Haohui Mai Assignee: Shinichi Yamashita Priority: Minor Attachments: HDFS-5196-2.patch, HDFS-5196-3.patch, HDFS-5196-4.patch, HDFS-5196-5.patch, HDFS-5196-6.patch, HDFS-5196-7.patch, HDFS-5196-8.patch, HDFS-5196.patch, HDFS-5196.patch, HDFS-5196.patch, snapshot-new-webui.png, snapshottable-directoryList.png, snapshotteddir.png The WebUI should provide more detailed information about snapshots, such as all snapshottable directories and the corresponding number of snapshots (suggested in HDFS-4096). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) HftpFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944884#comment-13944884 ] Hadoop QA commented on HDFS-6143: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636306/HDFS-6143.v02.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6471//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6471//console This message is automatically generated. HftpFileSystem open should throw FileNotFoundException for non-existing paths - Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch HftpFileSystem.open incorrectly handles non-existing paths. - 'open' does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound; the error is deferred until the next read. This is counterintuitive and not how the local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of code that is broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-existing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6147) New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..)
[ https://issues.apache.org/jira/browse/HDFS-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944949#comment-13944949 ] Hadoop QA commented on HDFS-6147: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636312/HDFS-6147.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks org.apache.hadoop.hdfs.server.datanode.TestMultipleNNDataBlockScanner org.apache.hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage org.apache.hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport org.apache.hadoop.hdfs.TestDatanodeBlockScanner {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6472//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6472//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6472//console This message is automatically generated. New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..) --- Key: HDFS-6147 URL: https://issues.apache.org/jira/browse/HDFS-6147 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-6147.patch New blocks' scanning will be delayed if old blocks are deleted after a datanode restart. Steps: 1. Write some blocks and wait till all scans are over 2. Restart the datanode 3. Delete some of the blocks 4. Write new blocks which are smaller in size than the deleted blocks. Problem: {{BlockPoolSliceScanner#updateBytesToScan(..)}} updates {{bytesLeft}} based on the following comparison {code}if (lastScanTime < currentPeriodStart) { bytesLeft += len; }{code} But in {{BlockPoolSliceScanner#assignInitialVerificationTimes()}} {{bytesLeft}} is decremented using the comparison below {code}if (now - entry.verificationTime < scanPeriod) {{code} Hence when the old blocks are deleted, {{bytesLeft}} goes negative, and new blocks will not be scanned until it becomes positive again. So in both places the verification time should be compared against the scan period. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-4929) [NNBench mark] Lease mismatch error when running with multiple mappers
[ https://issues.apache.org/jira/browse/HDFS-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned HDFS-4929: -- Assignee: Brahma Reddy Battula [NNBench mark] Lease mismatch error when running with multiple mappers -- Key: HDFS-4929 URL: https://issues.apache.org/jira/browse/HDFS-4929 Project: Hadoop HDFS Issue Type: Bug Components: benchmarks Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Command: ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.1-tests.jar nnbench -operation create_write -numberOfFiles 1000 -blockSize 268435456 -bytesToWrite 102400 -baseDir /benchmarks/NNBench`hostname -s` -replicationFactorPerFile 3 -maps 100 -reduces 10 Trace:
2013-06-21 10:44:53,763 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9005, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.105.214:36320: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by DFSClient_attempt_1371782327901_0001_m_48_0_1383437860_1 but is accessed by DFSClient_attempt_1371782327901_0001_m_84_0_1880545303_1
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by DFSClient_attempt_1371782327901_0001_m_48_0_1383437860_1 but is accessed by DFSClient_attempt_1371782327901_0001_m_84_0_1880545303_1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2351)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2098)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2019)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:213)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:52012)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:435)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:925)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1710)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1706)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
-- This message was sent by Atlassian JIRA (v6.2#6252)
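The two client names in the trace show two different map attempts (m_48 and m_84) contending for the lease on the same file name, which the hostname-based -baseDir alone cannot prevent when multiple mappers run per host. As a purely illustrative sketch under that assumption — not the actual NNBench fix; class and method names are hypothetical — one remedy is to fold each task's ID into the generated file name:
{code}
// Hypothetical sketch (not the committed fix): make each mapper's data file
// unique by embedding its task ID, so two map attempts can never contend
// for the lease on the same path.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptID;

public class UniqueNNBenchPath {
  // baseDir is the user-supplied -baseDir; the task ID disambiguates
  // mappers that would otherwise derive identical file names.
  static Path dataFileFor(String baseDir, TaskAttemptID attempt, int fileIdx) {
    return new Path(baseDir + "/data",
        "file_" + attempt.getTaskID().getId() + "_" + fileIdx);
  }
}
{code}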
[jira] [Commented] (HDFS-5196) Provide more snapshot information in WebUI
[ https://issues.apache.org/jira/browse/HDFS-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944970#comment-13944970 ] Hadoop QA commented on HDFS-5196: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636316/HDFS-5196-8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6473//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6473//console This message is automatically generated. Provide more snapshot information in WebUI -- Key: HDFS-5196 URL: https://issues.apache.org/jira/browse/HDFS-5196 Project: Hadoop HDFS Issue Type: Improvement Components: snapshots Affects Versions: 3.0.0 Reporter: Haohui Mai Assignee: Shinichi Yamashita Priority: Minor Attachments: HDFS-5196-2.patch, HDFS-5196-3.patch, HDFS-5196-4.patch, HDFS-5196-5.patch, HDFS-5196-6.patch, HDFS-5196-7.patch, HDFS-5196-8.patch, HDFS-5196.patch, HDFS-5196.patch, HDFS-5196.patch, snapshot-new-webui.png, snapshottable-directoryList.png, snapshotteddir.png The WebUI should provide more detailed information about snapshots, such as all snapshottable directories and the corresponding number of snapshots (suggested in HDFS-4096). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945142#comment-13945142 ] Yu Li commented on HDFS-6010: - The UT failure is caused by a bug in TestBalancer; here is a detailed analysis. Let's look into the code logic of testUnevenDistribution: if the number of datanodes in the mini-cluster is 3 (or larger), the replication factor will be set to 2 (or more), and generateBlocks will generate a file with it, so the block number will equal (targetSize/replicationFactor)/blockSize. Then distributeBlock will double the block number through the code below:
{code}
for (int i = 0; i < blocks.length; i++) {
  for (int j = 0; j < replicationFactor; j++) {
    boolean notChosen = true;
    while (notChosen) {
      int chosenIndex = r.nextInt(usedSpace.length);
      if (usedSpace[chosenIndex] > 0) {
        notChosen = false;
        blockReports.get(chosenIndex).add(blocks[i].getLocalBlock());
        usedSpace[chosenIndex] -= blocks[i].getNumBytes();
      }
    }
  }
}
{code}
Notice that this distribution cannot prevent replicas of the same block from landing on the same datanode. Then, when the MiniDFSCluster#injectBlocks (actually SimulatedFSDataset#injectBlocks) method is invoked, the duplicated blocks get removed, per the code segment below:
{code:title=SimulatedFSDataset#injectBlocks}
public synchronized void injectBlocks(String bpid, Iterable<Block> injectBlocks) throws IOException {
  ExtendedBlock blk = new ExtendedBlock();
  if (injectBlocks != null) {
    for (Block b : injectBlocks) {
      // if any blocks in list is bad, reject list
      if (b == null) {
        throw new NullPointerException("Null blocks in block list");
      }
      blk.set(bpid, b);
      if (isValidBlock(blk)) {
        throw new IOException("Block already exists in block list");
      }
    }
    Map<Block, BInfo> map = blockMap.get(bpid);
    if (map == null) {
      map = new HashMap<Block, BInfo>();
      blockMap.put(bpid, map);
    }
    for (Block b : injectBlocks) {
      BInfo binfo = new BInfo(bpid, b, false);
      map.put(binfo.theBlock, binfo);
    }
  }
}
{code}
This causes the used space to be less than expected, which makes the test fail. The issue was hidden because *in the existing tests the datanode number was never set larger than 2*. It is easy to reproduce the issue simply by increasing the datanode number in TestBalancer#testBalancer1Internal from 2 to 3, like
{code:title=TestBalancer#testBalancer1Internal}
void testBalancer1Internal(Configuration conf) throws Exception {
  initConf(conf);
  testUnevenDistribution(conf,
      new long[] {90*CAPACITY/100, 50*CAPACITY/100, 10*CAPACITY/100},
      new long[] {CAPACITY, CAPACITY, CAPACITY},
      new String[] {RACK0, RACK1, RACK2});
}
{code}
I've tried to refine the distribution method; however, I found it hard to make it general. To make sure no duplicated blocks are assigned to the same datanode, we must make sure the largest distribution is less than the sum of the other distributions. On second thought, I don't even think it is necessary to involve the replication factor in the balancer testing. Maybe the UT designer was thinking about testing the balancer's behavior while replication is also ongoing, but unfortunately the current design cannot reveal this.
So personally, I propose to always set the replication factor to 1 in TestBalancer. Make balancer able to balance data among specified servers -- Key: HDFS-6010 URL: https://issues.apache.org/jira/browse/HDFS-6010 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor Labels: balancer Attachments: HDFS-6010-trunk.patch Currently, the balancer tool balances data among all datanodes. However, in some particular cases, we would need to balance data only among specified nodes instead of the whole set. In this JIRA, a new -servers option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
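For completeness, if one did want to keep a replication factor greater than 1 in the test, a duplicate-free variant of the distribution loop quoted above is possible whenever the largest distribution is smaller than the sum of the others, as noted in the analysis. A hedged sketch with illustrative names, not a proposed patch:
{code}
// Sketch only: choose replicationFactor *distinct* datanodes per block, so
// injectBlocks never collapses replicas. Assumes the largest remaining
// capacity is smaller than the sum of the others (see analysis above);
// otherwise the inner loop could starve looking for a distinct node.
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class DistinctDistribution {
  static void distribute(long[] usedSpace, long blockSize, int blocks,
                         int replicationFactor, long seed) {
    Random r = new Random(seed);
    for (int i = 0; i < blocks; i++) {
      Set<Integer> chosen = new HashSet<Integer>();
      while (chosen.size() < replicationFactor) {
        int idx = r.nextInt(usedSpace.length);
        // require remaining capacity AND a node not already holding
        // a replica of this block
        if (usedSpace[idx] > 0 && chosen.add(idx)) {
          usedSpace[idx] -= blockSize;
        }
      }
    }
  }
}
{code}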
[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-6010: Attachment: HDFS-6010-trunk_V2.patch Attaching the new patch with a fix for the UT failure mentioned above, and resubmitting the patch for Hadoop QA to test. Make balancer able to balance data among specified servers -- Key: HDFS-6010 URL: https://issues.apache.org/jira/browse/HDFS-6010 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor Labels: balancer Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch Currently, the balancer tool balances data among all datanodes. However, in some particular cases, we would need to balance data only among specified nodes instead of the whole set. In this JIRA, a new -servers option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-6010: Status: Patch Available (was: Open) Make balancer able to balance data among specified servers -- Key: HDFS-6010 URL: https://issues.apache.org/jira/browse/HDFS-6010 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor Labels: balancer Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch Currently, the balancer tool balances data among all datanodes. However, in some particular cases, we would need to balance data only among specified nodes instead of the whole set. In this JIRA, a new -servers option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-6010: Status: Open (was: Patch Available) Make balancer able to balance data among specified servers -- Key: HDFS-6010 URL: https://issues.apache.org/jira/browse/HDFS-6010 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor Labels: balancer Attachments: HDFS-6010-trunk.patch Currently, the balancer tool balances data among all datanodes. However, in some particular cases, we would need to balance data only among specified nodes instead of the whole set. In this JIRA, a new -servers option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3087) Decommissioning on NN restart can complete without blocks being replicated
[ https://issues.apache.org/jira/browse/HDFS-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945192#comment-13945192 ] Kihwal Lee commented on HDFS-3087: -- +1 The patch looks good. Decommissioning on NN restart can complete without blocks being replicated - Key: HDFS-3087 URL: https://issues.apache.org/jira/browse/HDFS-3087 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.0 Reporter: Kihwal Lee Priority: Critical Attachments: HDFS-3087.patch If a data node is added to the exclude list and the name node is restarted, the decommissioning happens right away on the data node registration. At this point the initial block report has not been sent, so the name node thinks the node has zero blocks and the decommissioning completes very quickly, without replicating the blocks on that node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikola Vujic updated HDFS-5846: --- Attachment: hdfs-5846.patch Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases: • An error occurred during topology script execution (the script crashed). • The script returned a wrong number of values (other than expected). The critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. The existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem. It means that if we somehow got NULL, the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. A wrong topology means that fault domains are not honored. For the end user, it means that two data replicas can end up in the same fault domain, and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. We can notice that something went wrong almost only by looking in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945207#comment-13945207 ] Nikola Vujic commented on HDFS-5846: Hi Chris, I have fixed the patch according to your comments. Now I have two lines longer than 80 characters; that is because of the long names of the constants. Is this OK? I have implemented a unit test (testRejectUnresolvedDatanodes) in the TestDatanodeManager class, since it seems a more appropriate place for testing this particular thing. Also, in the same class I have changed the logger (org.mortbay.log.Log was used; the patch now uses org.apache.commons.logging.Log). Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases: • An error occurred during topology script execution (the script crashed). • The script returned a wrong number of values (other than expected). The critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. The existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem. It means that if we somehow got NULL, the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. A wrong topology means that fault domains are not honored. For the end user, it means that two data replicas can end up in the same fault domain, and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. We can notice that something went wrong almost only by looking in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
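To make the intended behavior concrete, here is a minimal sketch of how the {{resolveNetworkLocation}} body quoted above could reject the registration instead of defaulting the rack — illustrative only, not the committed change; a plain IOException stands in for whatever dedicated exception type the actual patch introduces:
{code}
// Sketch only (not the committed HDFS-5846 change): fail fast on an
// unresolved topology mapping so the DN registration is rejected, rather
// than silently assigning NetworkTopology.DEFAULT_RACK and breaking
// fault-domain guarantees.
List<String> rName = dnsToSwitchMapping.resolve(names);
if (rName == null || rName.size() != names.size()) {
  // covers both failure modes: script crash, or wrong number of values
  throw new IOException("Unresolved topology mapping for host " + names
      + "; rejecting registration instead of using "
      + NetworkTopology.DEFAULT_RACK);
}
return rName.get(0);
{code}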
[jira] [Updated] (HDFS-3087) Decommissioning on NN restart can complete without blocks being replicated
[ https://issues.apache.org/jira/browse/HDFS-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-3087: - Assignee: Rushabh S Shah Decommissioning on NN restart can complete without blocks being replicated - Key: HDFS-3087 URL: https://issues.apache.org/jira/browse/HDFS-3087 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.0 Reporter: Kihwal Lee Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-3087.patch If a data node is added to the exclude list and the name node is restarted, the decommissioning happens right away on the data node registration. At this point the initial block report has not been sent, so the name node thinks the node has zero blocks and the decommissioning completes very quickly, without replicating the blocks on that node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-3087) Decommissioning on NN restart can complete without blocks being replicated
[ https://issues.apache.org/jira/browse/HDFS-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-3087: - Resolution: Fixed Fix Version/s: 2.5.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Thanks for working on the fix, Rushabh. Decommissioning on NN restart can complete without blocks being replicated - Key: HDFS-3087 URL: https://issues.apache.org/jira/browse/HDFS-3087 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.0 Reporter: Kihwal Lee Assignee: Rushabh S Shah Priority: Critical Fix For: 3.0.0, 2.5.0 Attachments: HDFS-3087.patch If a data node is added to the exclude list and the name node is restarted, the decommissioning happens right away on the data node registration. At this point the initial block report has not been sent, so the name node thinks the node has zero blocks and the decommissioning completes very quickly, without replicating the blocks on that node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3087) Decommissioning on NN restart can complete without blocks being replicated
[ https://issues.apache.org/jira/browse/HDFS-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945239#comment-13945239 ] Hudson commented on HDFS-3087: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5386 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5386/]) HDFS-3087. Decommissioning on NN restart can complete without blocks being replicated. Contributed by Rushabh S Shah. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580886) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java Decommissioning on NN restart can complete without blocks being replicated - Key: HDFS-3087 URL: https://issues.apache.org/jira/browse/HDFS-3087 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.0 Reporter: Kihwal Lee Assignee: Rushabh S Shah Priority: Critical Fix For: 3.0.0, 2.5.0 Attachments: HDFS-3087.patch If a data node is added to the exclude list and the name node is restarted, the decommissioning happens right away on the data node registration. At this point the initial block report has not been sent, so the name node thinks the node has zero blocks and the decommissioning completes very quickly, without replicating the blocks on that node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945342#comment-13945342 ] Hadoop QA commented on HDFS-6010: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636334/HDFS-6010-trunk_V2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6474//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6474//console This message is automatically generated. Make balancer able to balance data among specified servers -- Key: HDFS-6010 URL: https://issues.apache.org/jira/browse/HDFS-6010 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor Labels: balancer Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch Currently, the balancer tool balances data among all datanodes. However, in some particular cases, we would need to balance data only among specified nodes instead of the whole set. In this JIRA, a new -servers option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6148) LeaseManager crashes while initiating block recovery
Kihwal Lee created HDFS-6148: Summary: LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6148) LeaseManager crashes while initiating block recovery
[ https://issues.apache.org/jira/browse/HDFS-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6148: - Description: While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
was: While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6148) LeaseManager crashes while initiating block recovery
[ https://issues.apache.org/jira/browse/HDFS-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6148: - Description: While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$
ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.
initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
was: While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$
ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.
initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5196) Provide more snapshot information in WebUI
[ https://issues.apache.org/jira/browse/HDFS-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945405#comment-13945405 ] Haohui Mai commented on HDFS-5196: -- The patch generally looks good. There are some minor comments that need to be addressed:
{code}
+'array_length' : function (v) {
+  var cnt = 0;
+  for (var i in v) {
+    cnt++;
+  }
+  return cnt;
{code}
You can follow what dust.js does (https://github.com/linkedin/dustjs/wiki/Dust-Tutorial#size_keyxxx___size_helper_Available_in_Dust_V11_release), and derive a new version that evaluates the key. Otherwise, I think that the following code will print {{Snapshottable directories:}} instead of {{Snapshottable directories:0}}
{code}
+<div class="page-header"><h1><small>Snapshottable directories: {SnapshotStats.directory|array_length}</small></h1></div>
{code}
I guess for this version let's just remove {{array_length}} and
{code}
+<div class="page-header"><h1><small>Snapshottable directories: {SnapshotStats.directory|array_length}</small></h1></div>
+<div class="page-header"><h1><small>Snapshotted directories: {SnapshotStats.snapshots|array_length}</small></h1></div>
{code}
Let's address it in a separate jira.
{code}
+var HELPERS = {
+  'helper_to_permission': function (chunk, ctx, bodies, params) {
+    var p = ctx.current().permission;
+    var symbols = [ '---', '--x', '-w-', '-wx', 'r--', 'r-x', 'rw-', 'rwx' ];
+    var sticky = p > 1000;
+
+    var res = "";
+    res = symbols[(p >> 6) & 7] + symbols[(p >> 3) & 7] + symbols[p & 7];
+
+    if (sticky) {
+      var otherExec = ((ctx.current().permission % 10) & 1) == 1;
+      res = res.substr(0, res.length - 1) + (otherExec ? 't' : 'T');
+    }
+
+    chunk.write('d' + res);
+    return chunk;
+  }
+};
+
{code}
You can move it to the filter object in {{dfs-dust.js}} and remove the duplicated one in {{explorer.js}}. Nit: there are some trailing white spaces. Provide more snapshot information in WebUI -- Key: HDFS-5196 URL: https://issues.apache.org/jira/browse/HDFS-5196 Project: Hadoop HDFS Issue Type: Improvement Components: snapshots Affects Versions: 3.0.0 Reporter: Haohui Mai Assignee: Shinichi Yamashita Priority: Minor Attachments: HDFS-5196-2.patch, HDFS-5196-3.patch, HDFS-5196-4.patch, HDFS-5196-5.patch, HDFS-5196-6.patch, HDFS-5196-7.patch, HDFS-5196-8.patch, HDFS-5196.patch, HDFS-5196.patch, HDFS-5196.patch, snapshot-new-webui.png, snapshottable-directoryList.png, snapshotteddir.png The WebUI should provide more detailed information about snapshots, such as all snapshottable directories and the corresponding number of snapshots (suggested in HDFS-4096). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5978) Create a tool to take fsimage and expose read-only WebHDFS API
[ https://issues.apache.org/jira/browse/HDFS-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945410#comment-13945410 ] Haohui Mai commented on HDFS-5978: -- bq. Just to be clear, this is a WebHDFS operation over HTTP; it is not part of the WebHDFS FileSystem HTTP REST API, right? I'm not exactly sure what you mean. The patch provides an offline image viewer that creates an HTTP server and exposes the same APIs as WebHDFS, so that users can use other tools (like {{hadoop dfs -ls}} or the web UI) to inspect the fsimage. Other than that, it has nothing to do with WebHDFS. Thanks for bringing it up; however, it might be better to update the description to avoid the confusion. Create a tool to take fsimage and expose read-only WebHDFS API -- Key: HDFS-5978 URL: https://issues.apache.org/jira/browse/HDFS-5978 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-5978.2.patch, HDFS-5978.3.patch, HDFS-5978.4.patch, HDFS-5978.patch Suggested in HDFS-5975. Add an option that exposes the read-only version of the WebHDFS API for OfflineImageViewer. You can imagine it looks very similar to jhat. That way we can allow the operator to use the existing command-line tools, or even the web UI, to debug the fsimage. It also allows the operator to interactively browse the file system, figuring out what went wrong. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6149) Running UTs with testKerberos profile has failures.
Jinghui Wang created HDFS-6149: -- Summary: Running UTs with testKerberos profile has failures. Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 2.3.0 UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does require credentials. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945431#comment-13945431 ] Hadoop QA commented on HDFS-5846: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636341/hdfs-5846.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDistributedFileSystem {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6475//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6475//console This message is automatically generated. Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases: • An error occurred during topology script execution (the script crashed). • The script returned a wrong number of values (other than expected). The critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. The existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem. It means that if we somehow got NULL, the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. A wrong topology means that fault domains are not honored. For the end user, it means that two data replicas can end up in the same fault domain, and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. We can notice that something went wrong almost only by looking in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6149) Running UTs with testKerberos profile has failures.
[ https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6149: --- Description: UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does not require credentials. was: UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does require credentials. Running UTs with testKerberos profile has failures. --- Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 2.3.0 UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does not require credentials. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6149) Running UTs with testKerberos profile has failures.
[ https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945449#comment-13945449 ] Jinghui Wang commented on HDFS-6149: The test that cancels the delegation token in testDelegationTokenHttpFSAccess assumes the operation does not require credentials. However, the following block of code in HttpFSKerberosAuthenticationHandler shows that if there is no token, the CANCELDELEGATIONTOKEN operation is not performed, causing an assertion error: the test expects a response code of 200 but gets 401 instead.
{code}
else if (dtOp.requiresKerberosCredentials() && token == null) {
  response.sendError(HttpServletResponse.SC_UNAUTHORIZED,
      MessageFormat.format(
          "Operation [{0}] requires SPNEGO authentication established", dtOp));
  requestContinues = false;
}
{code}
Running UTs with testKerberos profile has failures. --- Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 2.3.0 UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does not require credentials. -- This message was sent by Atlassian JIRA (v6.2#6252)
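A hedged sketch of what the corrected test expectation could look like, given the server code quoted above. The URL, port, and token variable are assumptions for illustration; the actual change is in HDFS-6149.patch:
{code}
// Hypothetical test sketch: without SPNEGO credentials, CANCELDELEGATIONTOKEN
// should be rejected with 401 rather than expected to succeed with 200.
URL url = new URL("http://localhost:14000/webhdfs/v1/?op=CANCELDELEGATIONTOKEN&token="
    + tokenStr);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("PUT");
Assert.assertEquals(HttpServletResponse.SC_UNAUTHORIZED, conn.getResponseCode());
{code}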
[jira] [Commented] (HDFS-6148) LeaseManager crashes while initiating block recovery
[ https://issues.apache.org/jira/browse/HDFS-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945473#comment-13945473 ] Kihwal Lee commented on HDFS-6148: -- It may have something to do with loading fsimage + edits and processing under-construction files. LeaseManager crashes one hour after NN start-up. LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery. {panel} Exception in thread org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction. initializeBlockRecovery(BlockInfoUnderConstruction.java:286) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68) at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411) at java.lang.Thread.run(Thread.java:722) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5846: Attachment: hdfs-5846.patch +1 for the patch. The test failure looks unrelated to me. I couldn't reproduce it locally. I'm re-uploading the same patch just to kick off another Jenkins run to confirm. Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases: • An error occurred during topology script execution (the script crashed). • The script returned the wrong number of values (other than expected). Critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. Existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem. If we somehow got NULL, then the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. Wrong topology means that fault domains are not honored. For the end user, it means that two data replicas can end up in the same fault domain, and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. We can notice that something went wrong almost only by looking in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5846: Component/s: namenode Target Version/s: 3.0.0, 2.4.0 Affects Version/s: 3.0.0 2.3.0 Hadoop Flags: Reviewed Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.3.0 Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases: • An error occurred during topology script execution (the script crashed). • The script returned the wrong number of values (other than expected). Critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. Existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem. If we somehow got NULL, then the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. Wrong topology means that fault domains are not honored. For the end user, it means that two data replicas can end up in the same fault domain, and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. We can notice that something went wrong almost only by looking in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6148) LeaseManager crashes while initiating block recovery
[ https://issues.apache.org/jira/browse/HDFS-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945426#comment-13945426 ] Kihwal Lee commented on HDFS-6148: -- replicas.size() was non-zero and there was a corresponding ReplicaUnderConstruction, but its expectedLocation seemed to be null. This can happen if setExpectedStorageLocations() was called with an array of nulls. This might happen if a last block with null locations is turned into a BlockInfoUnderConstruction. There might be other ways, though. LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery. {panel} Exception in thread org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction. initializeBlockRecovery(BlockInfoUnderConstruction.java:286) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68) at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411) at java.lang.Thread.run(Thread.java:722) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
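A hedged sketch of the kind of defensive check the HDFS-6148 NPE suggests, assuming the fix is to treat a replica with a null expected location as not alive. The field and method names follow the stack trace but are otherwise an assumption, not the committed fix:
{code}
// Hypothetical sketch inside ReplicaUnderConstruction: a null expectedLocation
// (e.g. set from an array of nulls) must not be dereferenced by isAlive().
boolean isAlive() {
  return expectedLocation != null
      && expectedLocation.getDatanodeDescriptor() != null
      && expectedLocation.getDatanodeDescriptor().isAlive;
}
{code}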
[jira] [Updated] (HDFS-6149) Running UTs with testKerberos profile has failures.
[ https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6149: --- Attachment: HDFS-6149.patch Running UTs with testKerberos profile has failures. --- Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 2.3.0 Attachments: HDFS-6149.patch UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does not require credentials. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6148) LeaseManager crashes while initiating block recovery
[ https://issues.apache.org/jira/browse/HDFS-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945530#comment-13945530 ] Kihwal Lee commented on HDFS-6148: -- Sorry, it was seen on a 2.3 cluster. I will verify whether we still have this bug in 2.4. LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery. {panel} Exception in thread org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction. initializeBlockRecovery(BlockInfoUnderConstruction.java:286) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68) at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411) at java.lang.Thread.run(Thread.java:722) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6148) LeaseManager crashes while initiating block recovery
[ https://issues.apache.org/jira/browse/HDFS-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6148: - Affects Version/s: (was: 2.4.0) 2.3.0 LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery. {panel} Exception in thread org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction. initializeBlockRecovery(BlockInfoUnderConstruction.java:286) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68) at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411) at java.lang.Thread.run(Thread.java:722) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945560#comment-13945560 ] Tsz Wo Nicholas Sze commented on HDFS-5138: --- +1 the branch-2 patch looks good. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0 Attachments: HDFS-5138.branch-2.001.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, hdfs-5138-branch-2.txt With HA enabled, the NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrades and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945576#comment-13945576 ] Alejandro Abdelnur commented on HDFS-6134: -- (Cross-posting HADOOP-10150 and HDFS-6134) [~avik_...@yahoo.com], I’ve just looked at the MAR/21 proposal in HADOOP-10150 (the patches uploaded on MAR/21 do not apply on trunk cleanly, so I cannot look at them easily. It seems to have missing pieces, like getXAttrs() and wiring to the KeyProvider API. Would it be possible to rebase them so they apply to trunk?)
bq. do we need a new proposal for the work already being done on HADOOP-10150?
HADOOP-10150 aims to provide encryption for any filesystem implementation as a decorator filesystem, while HDFS-6134 aims to provide encryption for HDFS. The 2 approaches differ in the level of transparency you get. The comparison table in the HDFS Data at Rest Encryption attachment (https://issues.apache.org/jira/secure/attachment/12635964/HDFSDataAtRestEncryption.pdf) highlights the differences. Particularly, the things I’m most concerned about with HADOOP-10150 are:
* All clients (doing encryption/decryption) must have access to the key management service.
* Secure key propagation to tasks running in the cluster (i.e. mapper and reducer tasks).
* Use of AES-CTR (instead of an authenticated encryption mode such as AES-GCM).
* Not clear how hflush() will be handled.
bq. are there design choices in this proposal that are superior to the patch already provided on HADOOP-10150?
IMO, a consolidated access/distribution of keys by the NN (as opposed to every client) improves the security of the system.
bq. do you have additional requirement listed in this JIRA that could be incorporated in to HADOOP-10150,
They are enumerated in the HDFS Data at Rest Encryption attachment. The ones I don’t see addressed in HADOOP-10150 are: #6 and #8.A. And it is not clear how #4 and #5 can be achieved.
bq. so we can collaborate and not duplicate?
Definitely, I want to work together with you guys to leverage as much as possible, either by unifying the 2 proposals or by sharing common code if we think both approaches have merits and we decide to move forward with both. Happy to jump on a call to discuss things and then report back to the community if you think that will speed up the discussion. -- By looking at the latest design doc of HADOOP-10150 I can see that things have been modified a bit (from the original design doc), bringing it a bit closer to some of the HDFS-6134 requirements. Still, it is not clear how transparency will be achieved for existing applications: HDFS URI changes; clients must connect to the key store to retrieve the encryption key (clients will need key store principals); and the encryption key must be propagated to job tasks (i.e. Mapper/Reducer processes). Requirement #4, "Can decorate HDFS and all other file systems in Hadoop, and will not modify existing structure of file system, such as namenode and datanode structure if the wrapped file system is HDFS", is contradicted by the design: in "Storage of IV and data key" it is stated "So we implement extended information based on INode feature, and use it to store data key and IV". Requirement #5, "Admin can configure encryption policies, such as which directory will be encrypted", seems driven by the HDFS client configuration file (hdfs-site.xml). This is not really admin-driven, as clients could break this by configuring their hdfs-site.xml file. Restrictions of move operations for files within an encrypted directory: the original design had something about it (not entirely correct); now it is gone. (Mentioned before) how will hflush() operations be handled, given that the encryption block will be cut short? How is this handled on writes? How is this handled on reads? Explicit auditing of encrypted file access does not seem to be handled. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via the Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation
[jira] [Commented] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back
[ https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945614#comment-13945614 ] Hadoop QA commented on HDFS-6135: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636293/HDFS-6135.002.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6476//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6476//console This message is automatically generated.
In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back -- Key: HDFS-6135 URL: https://issues.apache.org/jira/browse/HDFS-6135 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Blocker Attachments: HDFS-6135.000.patch, HDFS-6135.001.patch, HDFS-6135.002.patch, HDFS-6135.test.txt While doing an HDFS upgrade with HA setup, if the layout version gets changed in the upgrade, the rollback may trigger the following exception in JournalNodes (suppose the new software bumped the layout version from -55 to -56): {code} 14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if roll back possible for one or more JournalNodes. 1 exceptions thrown: Unexpected version of storage directory /grid/1/tmp/journal/mycluster. Reported: -56. Expecting = -55. at org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203) at org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156) at org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135) at org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202) at org.apache.hadoop.hdfs.qjournal.server.JNStorage.init(JNStorage.java:73) at org.apache.hadoop.hdfs.qjournal.server.Journal.init(Journal.java:142) at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87) at org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228) {code} Looks like, for rollback, a JN with old software cannot handle a future layout version brought by new software. -- This message was sent by Atlassian JIRA (v6.2#6252)
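Since HDFS layout versions are negative and decrease as the layout evolves, the HDFS-6135 failure above is easy to misread. A small illustration of the comparison that trips the JN, with the numbers taken from the exception and the variable names invented:
{code}
// Illustration only: -56 was written by the new software and is "newer" than
// -55, the newest version the old JN software knows, so the strict check in
// StorageInfo.setLayoutVersion() rejects the directory during rollback.
int reported = -56;   // layout version found in the storage directory
int expected = -55;   // newest layout version this JN supports
boolean accepted = reported >= expected;  // false -> "Unexpected version" error
{code}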
[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release
[ https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945628#comment-13945628 ] Tsz Wo Nicholas Sze commented on HDFS-6130: ---
bq. Apache release also has this issue. Apache 1.0.4 upgrade to the trunk, you can reproduce this issue.
Hi Fengdong, I have just tried it but cannot reproduce the NPE. There were a lot of changes since Apache 1.0.4. I was using 1.3.0 in my test. Could you also try it? NPE during namenode upgrade from old release Key: HDFS-6130 URL: https://issues.apache.org/jira/browse/HDFS-6130 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Fengdong Yu I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance. I can upgrade successfully if I don't configure HA, but if HA is enabled, there is an NPE when I run 'hdfs namenode -initializeSharedEdits' {code} 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is enabled 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 60 millis 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map NameNodeRetryCache 14/03/20 15:06:41 INFO util.GSet: VM type = 64-bit 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 275.3 KB 14/03/20 15:06:41 INFO util.GSet: capacity = 2^15 = 32768 entries 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false 14/03/20 15:06:41 INFO common.Storage: Lock on /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176 14/03/20 15:06:42 INFO common.Storage: Lock on /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected. 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653) at org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360) 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176 / {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release
[ https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945642#comment-13945642 ] Tsz Wo Nicholas Sze commented on HDFS-6130: --- I believe that this is a duplicate of HDFS-5988. Hi [~wheat9], the stack trace posted here is indeed different from the one posted in HDFS-6021 (a dup of HDFS-5988). So it seems that this is a different issue. In this bug, FSImageFormatPBINode somehow passes a null inode to FSDirectory. Could you take a look? - Stack trace posted here {noformat} 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653) at org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276) ... 
{noformat} - Stack trace posted in HDFS-6021 (a dup of HDFS-5988) {noformat} 2014-02-26 17:03:11,755 FATAL [main] namenode.NameNode (NameNode.java:main(1351)) - Exception in namenode join java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:227) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:169) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:225) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:802) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:792) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:624) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:593) at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:331) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:251) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:641) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:435) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:491) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:647) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:632) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1280) ... {noformat} NPE during namenode upgrade from old release Key: HDFS-6130 URL: https://issues.apache.org/jira/browse/HDFS-6130 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Fengdong Yu I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance. I can upgrade successfully if I don't configure HA, but if HA is enabled, there is an NPE when I run 'hdfs namenode -initializeSharedEdits' {code} 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is enabled 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 60 millis 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map NameNodeRetryCache 14/03/20 15:06:41 INFO util.GSet: VM
[jira] [Created] (HDFS-6150) Add inode id information in the logs to make debugging easier
Suresh Srinivas created HDFS-6150: - Summary: Add inode id information in the logs to make debugging easier Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.patch Inode information and path information are missing in the logs and exceptions. Adding this will help debug multithreading issues related to using INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
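A hedged sketch of what attaching the inode id to such messages might look like. The message shape is an assumption for illustration; the actual change is in HDFS-6150.patch:
{code}
// Hypothetical sketch: include the inode id alongside the path so that
// concurrent rename/delete races can be traced to a specific inode.
throw new FileNotFoundException("File does not exist: " + src
    + " (inodeId=" + inode.getId() + ")");
{code}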
[jira] [Updated] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-6150: -- Attachment: HDFS-6150.patch Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.patch Inode information and path information are missing in the logs and exceptions. Adding this will help debug multithreading issues related to using INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-6150: -- Status: Patch Available (was: Open) Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.patch Inode information and path information are missing in the logs and exceptions. Adding this will help debug multithreading issues related to using INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back
[ https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6135: -- Component/s: journal-node Hadoop Flags: Reviewed +1 patch looks good. In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back -- Key: HDFS-6135 URL: https://issues.apache.org/jira/browse/HDFS-6135 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Blocker Attachments: HDFS-6135.000.patch, HDFS-6135.001.patch, HDFS-6135.002.patch, HDFS-6135.test.txt While doing an HDFS upgrade with HA setup, if the layout version gets changed in the upgrade, the rollback may trigger the following exception in JournalNodes (suppose the new software bumped the layout version from -55 to -56): {code} 14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if roll back possible for one or more JournalNodes. 1 exceptions thrown: Unexpected version of storage directory /grid/1/tmp/journal/mycluster. Reported: -56. Expecting = -55. at org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203) at org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156) at org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135) at org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202) at org.apache.hadoop.hdfs.qjournal.server.JNStorage.init(JNStorage.java:73) at org.apache.hadoop.hdfs.qjournal.server.Journal.init(Journal.java:142) at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87) at org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228) {code} Looks like, for rollback, a JN with old software cannot handle a future layout version brought by new software. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release
[ https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945653#comment-13945653 ] Haohui Mai commented on HDFS-6130: -- It would be very helpful if the corresponding fsimage were available. NPE during namenode upgrade from old release Key: HDFS-6130 URL: https://issues.apache.org/jira/browse/HDFS-6130 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Fengdong Yu I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance. I can upgrade successfully if I don't configure HA, but if HA is enabled, there is an NPE when I run 'hdfs namenode -initializeSharedEdits' {code} 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is enabled 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 60 millis 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map NameNodeRetryCache 14/03/20 15:06:41 INFO util.GSet: VM type = 64-bit 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 275.3 KB 14/03/20 15:06:41 INFO util.GSet: capacity = 2^15 = 32768 entries 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false 14/03/20 15:06:41 INFO common.Storage: Lock on /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176 14/03/20 15:06:42 INFO common.Storage: Lock on /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected. 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes. 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653) at org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360) 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176 / {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5910) Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text
[ https://issues.apache.org/jira/browse/HDFS-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945654#comment-13945654 ] Suresh Srinivas commented on HDFS-5910: --- [~benoyantony], any updates on this? Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text Key: HDFS-5910 URL: https://issues.apache.org/jira/browse/HDFS-5910 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.2.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-5910.patch, HDFS-5910.patch It is possible to enable encryption of DataTransferProtocol. In some use cases, it is required to encrypt data transfer with some clients, but communicate in plain text with some other clients and data nodes. A sample use case will be that any data transfer inside a firewall can be in plain text, whereas any data transfer from clients outside the firewall needs to be encrypted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6050) NFS does not handle exceptions correctly in a few places
[ https://issues.apache.org/jira/browse/HDFS-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6050: - Summary: NFS does not handle exceptions correctly in a few places (was: NFS OpenFileCtx does not handle exceptions correctly) NFS does not handle exceptions correctly in a few places Key: HDFS-6050 URL: https://issues.apache.org/jira/browse/HDFS-6050 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brock Noland Assignee: Brandon Li Attachments: HDFS-6050.002.patch, HDFS-6050.patch I noticed this file does not log exceptions appropriately in multiple locations. Not logging the stack of Throwable: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L364 Printing exceptions to stderr: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1160 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1149 Not logging the stack trace: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1062 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L966 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L961 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L680 -- This message was sent by Atlassian JIRA (v6.2#6252)
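For context, a hedged sketch of the logging pattern the HDFS-6050 cleanup typically applies at the call sites listed above; the surrounding statement is invented for illustration and is not the attached patch:
{code}
try {
  channel.write(response);  // hypothetical NFS write-back call
} catch (Exception e) {
  // Instead of e.printStackTrace() (which goes to stderr, outside the log
  // files) or logging only e.getMessage() (which loses the stack), pass the
  // Throwable to the logger so the full stack trace is retained.
  LOG.error("Error writing back to the NFS client", e);
}
{code}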
[jira] [Created] (HDFS-6151) HDFS should refuse to cache blocks >=2GB
Andrew Wang created HDFS-6151: - Summary: HDFS should refuse to cache blocks >=2GB Key: HDFS-6151 URL: https://issues.apache.org/jira/browse/HDFS-6151 Project: Hadoop HDFS Issue Type: Bug Components: caching, datanode Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang If you try to cache a block that's >=2GB, the DN will silently fail to cache it, since {{MappedByteBuffer}} uses a signed int to represent size. Blocks this large are rare, but we should log or alert the user somehow. -- This message was sent by Atlassian JIRA (v6.2#6252)
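A hedged sketch of the guard HDFS-6151 implies on the DN's mmap path; the placement and variable names are assumptions for illustration:
{code}
// Hypothetical sketch: FileChannel.map() takes at most Integer.MAX_VALUE
// bytes, so refuse blocks of 2 GB or more loudly instead of failing silently.
long length = block.getNumBytes();
if (length > Integer.MAX_VALUE) {
  LOG.warn("Cannot cache block " + block + ": length " + length
      + " exceeds the 2 GB mmap limit");
  return;
}
MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, length);
{code}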
[jira] [Commented] (HDFS-6050) NFS does not handle exceptions correctly in a few places
[ https://issues.apache.org/jira/browse/HDFS-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945674#comment-13945674 ] Brandon Li commented on HDFS-6050: -- Thank you, Haohui, for the review. I've committed the patch. NFS does not handle exceptions correctly in a few places Key: HDFS-6050 URL: https://issues.apache.org/jira/browse/HDFS-6050 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brock Noland Assignee: Brandon Li Attachments: HDFS-6050.002.patch, HDFS-6050.patch I noticed this file does not log exceptions appropriately in multiple locations. Not logging the stack of Throwable: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L364 Printing exceptions to stderr: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1160 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1149 Not logging the stack trace: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1062 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L966 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L961 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L680 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6050) NFS does not handle exceptions correctly in a few places
[ https://issues.apache.org/jira/browse/HDFS-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945673#comment-13945673 ] Brandon Li commented on HDFS-6050: -- {quote} preOpAttr is always null here {quote} This may not be true later so I think it's better to keep preOpAttr. NFS does not handle exceptions correctly in a few places Key: HDFS-6050 URL: https://issues.apache.org/jira/browse/HDFS-6050 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brock Noland Assignee: Brandon Li Attachments: HDFS-6050.002.patch, HDFS-6050.patch I noticed this file does not log exceptions appropriately in multiple locations. Not logging the stack of Throwable: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L364 Printing exceptions to stderr: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1160 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1149 Not logging the stack trace: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1062 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L966 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L961 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L680 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5840: Attachment: HDFS-5840.001.patch Rebased [~atm]'s patch. Also added storage recovery logic for the JN based on the discussion in the comments. Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0 Attachments: HDFS-5840.001.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6050) NFS does not handle exceptions correctly in a few places
[ https://issues.apache.org/jira/browse/HDFS-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6050: - Issue Type: Improvement (was: Bug) NFS does not handle exceptions correctly in a few places Key: HDFS-6050 URL: https://issues.apache.org/jira/browse/HDFS-6050 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Reporter: Brock Noland Assignee: Brandon Li Attachments: HDFS-6050.002.patch, HDFS-6050.patch I noticed this file does not log exceptions appropriately in multiple locations. Not logging the stack of Throwable: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L364 Printing exceptions to stderr: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1160 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1149 Not logging the stack trace: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1062 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L966 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L961 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L680 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6050) NFS does not handle exceptions correctly in a few places
[ https://issues.apache.org/jira/browse/HDFS-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6050: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) NFS does not handle exceptions correctly in a few places Key: HDFS-6050 URL: https://issues.apache.org/jira/browse/HDFS-6050 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brock Noland Assignee: Brandon Li Attachments: HDFS-6050.002.patch, HDFS-6050.patch I noticed this file does not log exceptions appropriately in multiple locations. Not logging the stack of Throwable: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L364 Printing exceptions to stderr: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1160 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1149 Not logging the stack trace: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1062 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L966 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L961 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L680 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6050) NFS does not handle exceptions correctly in a few places
[ https://issues.apache.org/jira/browse/HDFS-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945680#comment-13945680 ] Hudson commented on HDFS-6050: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5391 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5391/]) HDFS-6050. NFS does not handle exceptions correctly in a few places. Contributed by Brandon Li (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1581055) * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/IdUserGroup.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3FileAttributes.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/portmap/Portmap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/mount/RpcProgramMountd.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/AsyncDataService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/Nfs3Utils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NFS does not handle exceptions correctly in a few places Key: HDFS-6050 URL: https://issues.apache.org/jira/browse/HDFS-6050 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Reporter: Brock Noland Assignee: Brandon Li Attachments: HDFS-6050.002.patch, HDFS-6050.patch I noticed this file does not log exceptions appropriately in multiple locations. Not logging the stack of Throwable: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L364 Printing exceptions to stderr: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1160 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1149 Not logging the stack trace: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1062 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L966 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L961 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L680 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945683#comment-13945683 ] Suresh Srinivas commented on HDFS-5840: --- [~jingzhao], thanks for doing the patch. Should the JN throw an exception saying that an upgrade is already in progress if pre-upgrade or upgrade is retried, so that the journal node must be restarted before attempting the namenode upgrade again? Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0 Attachments: HDFS-5840.001.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945688#comment-13945688 ] Jing Zhao commented on HDFS-5138: - Thanks for the quick review, Nicholas! I will commit the backport patch to branch-2 and 2.4. We can continue to fix remaining issues in HDFS-5840 and HDFS-6135. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0 Attachments: HDFS-5138.branch-2.001.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, hdfs-5138-branch-2.txt With HA enabled, the NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6149) Running Httpfs UTs with testKerberos profile has failures.
[ https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6149: --- Summary: Running Httpfs UTs with testKerberos profile has failures. (was: Running UTs with testKerberos profile has failures.) Running Httpfs UTs with testKerberos profile has failures. -- Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 2.3.0 Attachments: HDFS-6149.patch UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does not require credentials. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945710#comment-13945710 ] Hadoop QA commented on HDFS-5846: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636402/hdfs-5846.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6477//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6477//console This message is automatically generated. Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.3.0 Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases:
• An error occurred during topology script execution (the script crashes).
• The script returns a wrong number of values (other than expected).
Critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. The existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem: if we somehow got NULL, the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. A wrong topology means that fault domains are not honored; for the end user, it means that two data replicas can end up in the same fault domain and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. In practice, the only way to notice that something went wrong is to look in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
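For illustration, a self-contained sketch of the fail-fast alternative: surface the resolution failure as an exception instead of silently assigning DEFAULT_RACK. The exception class name matches the one added by the patch, but its constructor, the simplified mapping interface, and the method shape here are assumptions for the sketch:
{code}
import java.io.IOException;
import java.util.List;

public class ResolveSketch {
  // The patch adds a class with this name; this constructor is assumed.
  static class UnresolvedTopologyException extends IOException {
    UnresolvedTopologyException(String msg) { super(msg); }
  }

  // Simplified stand-in for the Hadoop DNSToSwitchMapping interface.
  interface DNSToSwitchMapping {
    List<String> resolve(List<String> names);
  }

  static String resolveNetworkLocation(DNSToSwitchMapping mapping, List<String> names)
      throws UnresolvedTopologyException {
    List<String> rName = mapping.resolve(names);
    if (rName == null || rName.isEmpty()) {
      // Failing registration here keeps the cluster from operating with a
      // wrong topology where fault domains are not honored.
      throw new UnresolvedTopologyException(
          "Unresolved topology mapping for host " + names);
    }
    return rName.get(0);
  }
}
{code}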
[jira] [Resolved] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao resolved HDFS-5138. - Resolution: Fixed Fix Version/s: 2.4.0 I've merged this to branch-2 and branch-2.4.0. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5138.branch-2.001.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, hdfs-5138-branch-2.txt With HA enabled, the NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back
[ https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6135: Resolution: Fixed Fix Version/s: 2.4.0 Status: Resolved (was: Patch Available) Thanks for the review, Nicholas! I've committed this to trunk, branch-2, and branch-2.4.0. In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back -- Key: HDFS-6135 URL: https://issues.apache.org/jira/browse/HDFS-6135 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Blocker Fix For: 2.4.0 Attachments: HDFS-6135.000.patch, HDFS-6135.001.patch, HDFS-6135.002.patch, HDFS-6135.test.txt While doing an HDFS upgrade with an HA setup, if the layout version gets changed in the upgrade, the rollback may trigger the following exception in JournalNodes (suppose the new software bumped the layout version from -55 to -56):
{code}
14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if roll back possible for one or more JournalNodes. 1 exceptions thrown:
Unexpected version of storage directory /grid/1/tmp/journal/mycluster. Reported: -56. Expecting = -55.
	at org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203)
	at org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156)
	at org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135)
	at org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202)
	at org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:142)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228)
{code}
It looks like, for rollback, a JN running the old software cannot handle the future layout version written by the new software. -- This message was sent by Atlassian JIRA (v6.2#6252)
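A self-contained sketch of the tolerance idea behind HDFS-6135 (not the actual patch; the class, method, and parameters are invented for illustration):
{code}
import java.io.IOException;

// Layout versions are negative and decrease as the format evolves, so a
// reported version smaller than the software's means the storage was written
// by newer software. For the canRollBack check only the "previous" directory
// matters, so a future version in "current" need not be a hard failure.
public class RollbackVersionCheckSketch {
  static void checkVersion(int reportedLv, int softwareLv) throws IOException {
    if (reportedLv < softwareLv) {
      // e.g. reported -56, software -55: written by the new software.
      // Tolerate it while deciding whether rollback is possible.
      return;
    }
    if (reportedLv > softwareLv) {
      throw new IOException("Unexpected version of storage directory. Reported: "
          + reportedLv + ". Expecting = " + softwareLv);
    }
    // reportedLv == softwareLv: normal case, nothing to do.
  }
}
{code}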
[jira] [Commented] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945728#comment-13945728 ] Jing Zhao commented on HDFS-5840: - Thanks for the comments, Suresh! Will update the patch to address the comments. Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0 Attachments: HDFS-5840.001.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (HDFS-5807) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2
[ https://issues.apache.org/jira/browse/HDFS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reopened HDFS-5807: - [~airbots], I found this test failing again in our nightly builds. Could you take a look at it again?
{noformat}
Error Message

Rebalancing expected avg utilization to become 0.16, but on datanode X.X.X.X: it remains at 0.3 after more than 4 msec.

Stacktrace

java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.16, but on datanode X.X.X.X: it remains at 0.3 after more than 4 msec.
	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151)
	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178)
	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302)
{noformat}
TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2 Key: HDFS-5807 URL: https://issues.apache.org/jira/browse/HDFS-5807 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.3.0 Reporter: Mit Desai Assignee: Chen He Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5807.patch The test times out after some time.
{noformat}
java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.16, but on datanode 127.0.0.1:42451 it remains at 0.3 after more than 2 msec.
	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151)
	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178)
	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302)
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945748#comment-13945748 ] Hudson commented on HDFS-5138: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5392 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5392/]) Move HDFS-5138 to 2.4.0 section in CHANGES.txt (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1581074) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5138.branch-2.001.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, hdfs-5138-branch-2.txt With HA enabled, the NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back
[ https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945749#comment-13945749 ] Hudson commented on HDFS-6135: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5392 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5392/]) HDFS-6135. In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1581070) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageInfo.java In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back -- Key: HDFS-6135 URL: https://issues.apache.org/jira/browse/HDFS-6135 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Blocker Fix For: 2.4.0 Attachments: HDFS-6135.000.patch, HDFS-6135.001.patch, HDFS-6135.002.patch, HDFS-6135.test.txt While doing an HDFS upgrade with an HA setup, if the layout version gets changed in the upgrade, the rollback may trigger the following exception in JournalNodes (suppose the new software bumped the layout version from -55 to -56):
{code}
14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if roll back possible for one or more JournalNodes. 1 exceptions thrown:
Unexpected version of storage directory /grid/1/tmp/journal/mycluster. Reported: -56. Expecting = -55.
	at org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203)
	at org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156)
	at org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135)
	at org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202)
	at org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:142)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228)
{code}
It looks like, for rollback, a JN running the old software cannot handle the future layout version written by the new software. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945782#comment-13945782 ] Hudson commented on HDFS-5846: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5393 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5393/]) HDFS-5846. Shuffle phase is slow in Windows - FadviseFileRegion::transferTo does not read disks efficiently. Contributed by Nikola Vujic. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1581091) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/UnresolvedTopologyException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.3.0 Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases:
• An error occurred during topology script execution (the script crashes).
• The script returns a wrong number of values (other than expected).
Critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. The existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem: if we somehow got NULL, the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. A wrong topology means that fault domains are not honored; for the end user, it means that two data replicas can end up in the same fault domain and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. In practice, the only way to notice that something went wrong is to look in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5807) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2
[ https://issues.apache.org/jira/browse/HDFS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945788#comment-13945788 ] Chen He commented on HDFS-5807: --- Working on it. TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2 Key: HDFS-5807 URL: https://issues.apache.org/jira/browse/HDFS-5807 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.3.0 Reporter: Mit Desai Assignee: Chen He Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5807.patch The test times out after some time. {noformat} java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.16, but on datanode 127.0.0.1:42451 it remains at 0.3 after more than 2 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945790#comment-13945790 ] Larry McCay commented on HDFS-6134: --- Hi [~tucu00] - I like what I see here. We should file JIRAs for the KeyProvider API work that you mention in your document and discuss some of those aspects there. We have a number of common interests in that area. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5846: Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Status: Resolved (was: Patch Available) I committed this to trunk, branch-2 and branch-2.4. Nikola, thank you for contributing this code. Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.3.0 Reporter: Nikola Vujic Assignee: Nikola Vujic Fix For: 3.0.0, 2.4.0 Attachments: hdfs-5846.patch, hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases:
• An error occurred during topology script execution (the script crashes).
• The script returns a wrong number of values (other than expected).
Critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. The existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem: if we somehow got NULL, the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. A wrong topology means that fault domains are not honored; for the end user, it means that two data replicas can end up in the same fault domain and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. In practice, the only way to notice that something went wrong is to look in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6125) Cleanup unnecessary cast in HDFS code base
[ https://issues.apache.org/jira/browse/HDFS-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945811#comment-13945811 ] Chris Nauroth commented on HDFS-6125: - Thanks for doing these clean-ups, Suresh. I can start code reviewing this. Cleanup unnecessary cast in HDFS code base -- Key: HDFS-6125 URL: https://issues.apache.org/jira/browse/HDFS-6125 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HDFS-6125.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945823#comment-13945823 ] Tsz Wo Nicholas Sze commented on HDFS-6150: --- In the existing code, there are two places below where it says "failed to ..." + src + " on client " + clientMachine. It seems to say that the src is on the client machine; it should be "for the client". Could you also fix these as well?
{code}
@@ -2292,8 +2292,8 @@ private void startFileInternal(FSPermissionChecker pc, String src,
       try {
         if (myFile == null) {
           if (!create) {
-            throw new FileNotFoundException("failed to overwrite non-existent file "
-                + src + " on client " + clientMachine);
+            throw new FileNotFoundException("Can't overwrite non-existent "
+                + src + " on client " + clientMachine);
           }
         } else {
           if (overwrite) {
@@ -2306,8 +2306,8 @@ private void startFileInternal(FSPermissionChecker pc, String src,
           } else {
             // If lease soft limit time is expired, recover the lease
             recoverLeaseInternal(myFile, src, holder, clientMachine, false);
-            throw new FileAlreadyExistsException("failed to create file " + src
-                + " on client " + clientMachine + " because the file exists");
+            throw new FileAlreadyExistsException(src + " on client "
+                + clientMachine + " already exists");
           }
         }
{code}
Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.patch Inode information and path information are missing in the logs and exceptions. Adding this will help debug multithreading issues related to the use of inode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6125) Cleanup unnecessary cast in HDFS code base
[ https://issues.apache.org/jira/browse/HDFS-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6125: Attachment: HDFS-6125.2.patch {{FSEditLogOp}} needed a very minor rebase to apply on current trunk. Instead of going back and forth with comments, I just made the change locally, and I'm uploading the new patch. I'm +1 for patch v2, pending Jenkins. Thanks again, Suresh. Cleanup unnecessary cast in HDFS code base -- Key: HDFS-6125 URL: https://issues.apache.org/jira/browse/HDFS-6125 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HDFS-6125.2.patch, HDFS-6125.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945828#comment-13945828 ] Alejandro Abdelnur commented on HDFS-6134: -- [~lmccay], great. I have done some work already in this area while prototyping; I'll create a few JIRAs later tonight and put up patches for the stuff I already have. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6125) Cleanup unnecessary cast in HDFS code base
[ https://issues.apache.org/jira/browse/HDFS-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6125: Component/s: test Target Version/s: 3.0.0, 2.5.0 Affects Version/s: 2.4.0 Hadoop Flags: Reviewed Cleanup unnecessary cast in HDFS code base -- Key: HDFS-6125 URL: https://issues.apache.org/jira/browse/HDFS-6125 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 2.4.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HDFS-6125.2.patch, HDFS-6125.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945840#comment-13945840 ] Hadoop QA commented on HDFS-6150: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636419/HDFS-6150.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestFileCreation org.apache.hadoop.hdfs.TestLeaseRecovery2 {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6478//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6478//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6478//console This message is automatically generated. Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.patch Inode information and path information are missing in the logs and exceptions. Adding this will help debug multithreading issues related to the use of inode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6124) Add final modifier to class members
[ https://issues.apache.org/jira/browse/HDFS-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945865#comment-13945865 ] Arpit Agarwal commented on HDFS-6124: - +1 for the patch. I will commit it shortly. Add final modifier to class members --- Key: HDFS-6124 URL: https://issues.apache.org/jira/browse/HDFS-6124 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HDFS-6124.1.patch, HDFS-6124.patch Many of the member variable declarations in HDFS classes are missing the final modifier. This jira adds the final modifier where possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6152) distcp V2 doesn't preserve root dir's attributes when -p is specified
Yongjun Zhang created HDFS-6152: --- Summary: distcp V2 doesn't preserve root dir's attributes when -p is specified Key: HDFS-6152 URL: https://issues.apache.org/jira/browse/HDFS-6152 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Two issues were observed with distcp V2.
ISSUE 1. When copying a source dir to a target dir with the -pu option, using the command {{distcp -pu source-dir target-dir}}, the source dir's owner is not preserved at the target dir. Similarly, other attributes of the source dir are not preserved. Supposedly they should be preserved when neither -update nor -overwrite is specified. There are two scenarios with the above command:
a. When target-dir already exists. Issuing the above command will result in target-dir/source-dir (source-dir here refers to the last component of the source-dir path in the command line) at the target file system, with all contents of source-dir copied to under target-dir/source-dir. The issue in this case is that the attributes of source-dir are not preserved.
b. When target-dir doesn't exist. It will result in target-dir with all contents of source-dir copied to under target-dir. The issue in this case is that the attributes of source-dir are not carried over to target-dir.
For multiple-source cases, e.g., the command {{distcp -pu source-dir1 source-dir2 target-dir}}, no matter whether target-dir exists or not, the multiple sources are copied to under the target dir (target-dir is created if it didn't exist), and their attributes are preserved.
ISSUE 2. With the command {{distcp source-dir target-dir}}, when source-dir is an empty directory and target-dir doesn't exist, source-dir is not copied; the command behaves like a no-op. However, when source-dir is not empty, it is copied, resulting in target-dir at the target file system containing a copy of source-dir's children. To be consistent, an empty source dir should be copied too. Basically, the above distcp command should cause target-dir to be created at the target file system, with source-dir's attributes preserved at target-dir when -p is passed. -- This message was sent by Atlassian JIRA (v6.2#6252)
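For illustration, a minimal sketch of what preserving the root's attributes could look like after the copy completes (the class and method are hypothetical, not the actual DistCp committer code; -pu maps to owner as shown):
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: after copying, carry the source root's attributes
// onto the target root so "distcp -p source-dir target-dir" preserves them.
public class PreserveRootSketch {
  static void preserveRootAttributes(FileSystem srcFs, Path srcRoot,
      FileSystem dstFs, Path dstRoot) throws IOException {
    FileStatus s = srcFs.getFileStatus(srcRoot);
    // -pu: preserve user (owner); -pg would add group, -pp permissions, etc.
    dstFs.setOwner(dstRoot, s.getOwner(), s.getGroup());
    dstFs.setPermission(dstRoot, s.getPermission());
  }
}
{code}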
[jira] [Assigned] (HDFS-6152) distcp V2 doesn't preserve root dir's attributes when -p is specified
[ https://issues.apache.org/jira/browse/HDFS-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang reassigned HDFS-6152: --- Assignee: Yongjun Zhang distcp V2 doesn't preserve root dir's attributes when -p is specified - Key: HDFS-6152 URL: https://issues.apache.org/jira/browse/HDFS-6152 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Two issues were observed with distcp V2.
ISSUE 1. When copying a source dir to a target dir with the -pu option, using the command {{distcp -pu source-dir target-dir}}, the source dir's owner is not preserved at the target dir. Similarly, other attributes of the source dir are not preserved. Supposedly they should be preserved when neither -update nor -overwrite is specified. There are two scenarios with the above command:
a. When target-dir already exists. Issuing the above command will result in target-dir/source-dir (source-dir here refers to the last component of the source-dir path in the command line) at the target file system, with all contents of source-dir copied to under target-dir/source-dir. The issue in this case is that the attributes of source-dir are not preserved.
b. When target-dir doesn't exist. It will result in target-dir with all contents of source-dir copied to under target-dir. The issue in this case is that the attributes of source-dir are not carried over to target-dir.
For multiple-source cases, e.g., the command {{distcp -pu source-dir1 source-dir2 target-dir}}, no matter whether target-dir exists or not, the multiple sources are copied to under the target dir (target-dir is created if it didn't exist), and their attributes are preserved.
ISSUE 2. With the command {{distcp source-dir target-dir}}, when source-dir is an empty directory and target-dir doesn't exist, source-dir is not copied; the command behaves like a no-op. However, when source-dir is not empty, it is copied, resulting in target-dir at the target file system containing a copy of source-dir's children. To be consistent, an empty source dir should be copied too. Basically, the above distcp command should cause target-dir to be created at the target file system, with source-dir's attributes preserved at target-dir when -p is passed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6152) distcp V2 doesn't preserve root dir's attributes when -p is specified
[ https://issues.apache.org/jira/browse/HDFS-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6152: Attachment: HDFS-6152.001.patch distcp V2 doesn't preserve root dir's attributes when -p is specified - Key: HDFS-6152 URL: https://issues.apache.org/jira/browse/HDFS-6152 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6152.001.patch Two issues were observed with distcp V2.
ISSUE 1. When copying a source dir to a target dir with the -pu option, using the command {{distcp -pu source-dir target-dir}}, the source dir's owner is not preserved at the target dir. Similarly, other attributes of the source dir are not preserved. Supposedly they should be preserved when neither -update nor -overwrite is specified. There are two scenarios with the above command:
a. When target-dir already exists. Issuing the above command will result in target-dir/source-dir (source-dir here refers to the last component of the source-dir path in the command line) at the target file system, with all contents of source-dir copied to under target-dir/source-dir. The issue in this case is that the attributes of source-dir are not preserved.
b. When target-dir doesn't exist. It will result in target-dir with all contents of source-dir copied to under target-dir. The issue in this case is that the attributes of source-dir are not carried over to target-dir.
For multiple-source cases, e.g., the command {{distcp -pu source-dir1 source-dir2 target-dir}}, no matter whether target-dir exists or not, the multiple sources are copied to under the target dir (target-dir is created if it didn't exist), and their attributes are preserved.
ISSUE 2. With the command {{distcp source-dir target-dir}}, when source-dir is an empty directory and target-dir doesn't exist, source-dir is not copied; the command behaves like a no-op. However, when source-dir is not empty, it is copied, resulting in target-dir at the target file system containing a copy of source-dir's children. To be consistent, an empty source dir should be copied too. Basically, the above distcp command should cause target-dir to be created at the target file system, with source-dir's attributes preserved at target-dir when -p is passed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5840: Attachment: HDFS-5840.002.patch Updated the patch to address Suresh's comments. The patch also switches the sequence of the doUpgrade calls on the shared edits and on the local storage. Now with the patch we have the following:
# If the doPreUpgrade call on the JNs fails (i.e., not all the JNs succeed), we can restart the NN and JNs for recovery, and the NN and JNs will go back to the status before the upgrade.
# If the doUpgrade call on the JNs fails, some JNs may have both previous and current directories. Restarting the JN cannot solve the issue; it has to be fixed manually. But the probability of this kind of failure is relatively low, considering that the doPreUpgrade call succeeded on all the JNs.
Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0 Attachments: HDFS-5840.001.patch, HDFS-5840.002.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
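An illustrative sketch of the ordering described in point 1 above (the JournalNodeClient type and method names are invented for this sketch; the real code goes through QuorumJournalManager):
{code}
import java.io.IOException;
import java.util.List;

// Sketch: run doPreUpgrade on every JN before any doUpgrade, so a failure in
// the first phase leaves all JNs in their pre-upgrade state and a simple
// restart recovers them.
public class TwoPhaseUpgradeSketch {
  interface JournalNodeClient {  // hypothetical stand-in for the JN RPC proxy
    void doPreUpgrade() throws IOException;
    void doUpgrade() throws IOException;
  }

  static void upgradeSharedEdits(List<JournalNodeClient> jns) throws IOException {
    for (JournalNodeClient jn : jns) {
      jn.doPreUpgrade();  // any failure here aborts before JN state is changed
    }
    for (JournalNodeClient jn : jns) {
      jn.doUpgrade();     // failures past this point may need manual fixing
    }
  }
}
{code}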
[jira] [Updated] (HDFS-5910) Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text
[ https://issues.apache.org/jira/browse/HDFS-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-5910: --- Attachment: HDFS-5910.patch Thanks [~arpitagarwal] for the comments. I am attaching a new patch based on some of your comments. I request guidance on #1 and #4.
{quote}
1. isSecureOnClient may also want to use the peer's address to make a decision. e.g. intra-cluster transfer vs. distcp to remote cluster.
{quote}
The IP address of the namenode or datanode is not available at some of the client invocations. Please let me know if there is a way to get an IP address.
{quote}
2. Related to #1, isSecureOnClient and isSecureOnServer look awkward. How about replacing both with isTrustedChannel that takes the peer's IP address? We should probably avoid overloading the term secure in this context since there is a related concept of Peer#hasSecureChannel().
{quote}
I have renamed the class to TrustedChannelResolver and the function to isTrusted().
{quote}
3. Could you please update the documentation
{quote}
Done.
{quote}
4. Is the InetAddress.getByName call in DataXceiver#getClientAddress necessary? If it were necessary it would have been a security hole since DNS resolution may yield a different IP address than the one used by the client. It turns out for the kinds of Peers we are interested in this will be an IP address, so let's just remove the call.
{quote}
I wanted to use InetAddress as the argument to TrustedChannelResolver rather than a string IP address to maintain parity with _SaslPropertiesResolver_. To convert a string IP, I use InetAddress.getByName. From the documentation of InetAddress.getByName(String host): "The host name can either be a machine name, such as java.sun.com, or a textual representation of its IP address. If a literal IP address is supplied, only the validity of the address format is checked." So basically, if the argument is an IP address, getByName doesn't do a DNS check. If there is a different way to get the InetAddress, we can definitely use that. Another option is to not care about the parity with _SaslPropertiesResolver_ and pass the string IP address. Yet another option would be to pass the Peer itself to TrustedChannelResolver so that the custom implementation can take care of getting the IP address, etc. It would be great to get your opinion on this. Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text Key: HDFS-5910 URL: https://issues.apache.org/jira/browse/HDFS-5910 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.2.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-5910.patch, HDFS-5910.patch, HDFS-5910.patch It is possible to enable encryption of DataTransferProtocol. In some use cases, it is required to encrypt data transfer with some clients, but communicate in plain text with some other clients and data nodes. A sample use case is that any data transfer inside a firewall can be in plain text, whereas any data transfer from clients outside the firewall needs to be encrypted. -- This message was sent by Atlassian JIRA (v6.2#6252)
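For illustration, a sketch of a custom resolver in the spirit of the patch: peers inside a configured address prefix are treated as trusted (plain text) and everyone else as untrusted (encrypted). The class name follows the comment above, while the method signatures and the prefix-based policy are assumptions for this sketch:
{code}
import java.net.InetAddress;

public class SubnetTrustedChannelResolver {
  // In a real deployment this would come from configuration.
  private final String trustedPrefix = "10.1.";

  // Client-side call where no peer address is available (see #1 above):
  // fall back to a configuration-only decision.
  public boolean isTrusted() {
    return false;
  }

  // Server-side call with the peer's address available.
  public boolean isTrusted(InetAddress peer) {
    return peer.getHostAddress().startsWith(trustedPrefix);
  }
}
{code}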
[jira] [Updated] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5840: Fix Version/s: (was: 3.0.0) Target Version/s: 2.4.0 (was: 3.0.0) Affects Version/s: 2.4.0 Status: In Progress (was: Patch Available) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5840.001.patch, HDFS-5840.002.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reassigned HDFS-5840: --- Assignee: Jing Zhao (was: Aaron T. Myers) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Aaron T. Myers Assignee: Jing Zhao Priority: Blocker Attachments: HDFS-5840.001.patch, HDFS-5840.002.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5840: Status: Patch Available (was: In Progress) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Aaron T. Myers Assignee: Jing Zhao Priority: Blocker Attachments: HDFS-5840.001.patch, HDFS-5840.002.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5840: Component/s: journal-node ha Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: ha, journal-node, namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Aaron T. Myers Assignee: Jing Zhao Priority: Blocker Attachments: HDFS-5840.001.patch, HDFS-5840.002.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6124) Add final modifier to class members
[ https://issues.apache.org/jira/browse/HDFS-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6124: Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Target Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I committed the patch to trunk, branch-2 and branch-2.4. Thanks for the code cleanup, Suresh! Add final modifier to class members --- Key: HDFS-6124 URL: https://issues.apache.org/jira/browse/HDFS-6124 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 3.0.0, 2.4.0 Attachments: HDFS-6124.1.patch, HDFS-6124.patch Many of the member variable declarations in HDFS classes are missing the final modifier. This jira adds the final modifier where possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945899#comment-13945899 ] Hadoop QA commented on HDFS-5840: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636424/HDFS-5840.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestSafeMode org.apache.hadoop.hdfs.qjournal.TestNNWithQJM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6479//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6479//console This message is automatically generated. Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: ha, journal-node, namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Aaron T. Myers Assignee: Jing Zhao Priority: Blocker Attachments: HDFS-5840.001.patch, HDFS-5840.002.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6124) Add final modifier to class members
[ https://issues.apache.org/jira/browse/HDFS-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945912#comment-13945912 ] Hudson commented on HDFS-6124: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5395 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5395/]) HDFS-6124. Add final modifier to class members. (Contributed by Suresh Srinivas) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1581124) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockMissingException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocalLegacy.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/CorruptFileBlockIterator.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSHedgedReadMetrics.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DomainSocketFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/StorageType.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ShortCircuitCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/DomainPeerServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/TcpPeerServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockLocalPathInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/CorruptFileBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeLocalInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsFileStatus.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/RollingUpgradeStatus.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshotDiffReport.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshottableDirectoryStatus.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferEncryptor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PacketReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocol/RequestInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalMetrics.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeHttpServer.java *
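For readers skimming the commit, the change pattern is simple; the snippet below is illustrative only, not taken from the HDFS-6124 patch (the class and field names are invented):
{code}
// Illustrative only. Marking a member 'final' documents that it is
// assigned exactly once (here, in the constructor) and lets the
// compiler reject any accidental reassignment later.
class ExampleClient {
  private final String clientName;   // was: private String clientName;

  ExampleClient(String clientName) {
    this.clientName = clientName;    // the single permitted assignment
  }

  String getClientName() {
    return clientName;
  }
}
{code}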
[jira] [Commented] (HDFS-5910) Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text
[ https://issues.apache.org/jira/browse/HDFS-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945925#comment-13945925 ] Arpit Agarwal commented on HDFS-5910: - Thanks for making the changes, Benoy. {quote} The IP address of the namenode or datanode is not available at some of the client invocations. Please let me know if there is a way to get an IP address. {quote} Just for my understanding: lacking the peer's IP address, is it your intention to use configuration to decide the client's behavior? I looked through the usages of {{isTrusted}} and some of them already have the connected socket available, so it is fairly easy to query the remote end's socket address and pass it to {{isTrusted}}. For the usage in getDataEncryptionKey(), we can refactor to pass a functor that supplies the encryption key to, e.g., {{getFileChecksum}}. However, I am okay with doing the refactoring in a separate change. We can leave the parameter-less overload of {{isTrusted}} for now, use it only from {{getDataEncryptionKey}}, and file a separate JIRA to fix it. {quote} I wanted to use InetAddress as the argument to TrustedChannelResolver rather than a string IP address, to maintain parity with SaslPropertiesResolver. To convert a string IP, I use InetAddress.getByName. {quote} Thanks for the explanation. Will [InetAddresses#forString|http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/net/InetAddresses.html#forString%28java.lang.String%29] from Guava work for you? I just checked and it's available in our build. Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text Key: HDFS-5910 URL: https://issues.apache.org/jira/browse/HDFS-5910 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.2.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-5910.patch, HDFS-5910.patch, HDFS-5910.patch It is possible to enable encryption of DataTransferProtocol. In some use cases, it is required to encrypt data transfer with some clients, but to communicate in plain text with other clients and datanodes. A sample use case: data transfer inside a firewall can be in plain text, whereas data transfer from clients outside the firewall needs to be encrypted. -- This message was sent by Atlassian JIRA (v6.2#6252)
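To make the suggestion concrete, here is a small, self-contained sketch contrasting the two parsing routes; the class name is illustrative and not part of either patch:
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import com.google.common.net.InetAddresses;

public class IpParsingDemo {
  public static void main(String[] args) throws UnknownHostException {
    // JDK route: fine for an IP literal, but may fall back to a DNS
    // lookup when handed a hostname.
    InetAddress viaJdk = InetAddress.getByName("192.168.1.10");

    // Guava route: accepts only IPv4/IPv6 literals, never consults the
    // resolver, and throws IllegalArgumentException on anything else.
    InetAddress viaGuava = InetAddresses.forString("192.168.1.10");

    System.out.println(viaJdk.equals(viaGuava));  // true
  }
}
{code}
The never-touches-DNS behavior of {{InetAddresses.forString}} is usually what a policy check like {{TrustedChannelResolver}} wants, since an unexpected resolver round-trip on the hot path is both slow and a potential failure mode.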
[jira] [Commented] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945938#comment-13945938 ] Suresh Srinivas commented on HDFS-6150: --- The new patch fixes the test failures and addresses the comments. Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.1.patch, HDFS-6150.patch Inode information and path information are missing from the logs and exceptions. Adding them will help debug multi-threading issues related to the use of INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
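As an illustration of the kind of change being discussed (a hypothetical sketch, not the actual HDFS-6150 patch; the helper name is invented):
{code}
import java.io.FileNotFoundException;

public class InodeMessageDemo {
  // Hypothetical helper: include the inode id next to the path when
  // building an exception message, so log lines written by concurrent
  // threads can be correlated to a specific inode even after a rename.
  static FileNotFoundException fileNotFound(String path, long inodeId) {
    return new FileNotFoundException(
        "File does not exist: " + path + " (inode " + inodeId + ")");
  }

  public static void main(String[] args) {
    System.out.println(
        fileNotFound("/user/foo/data.txt", 16389).getMessage());
  }
}
{code}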
[jira] [Updated] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-6150: -- Attachment: HDFS-6150.1.patch Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.1.patch, HDFS-6150.patch Inode information and path information are missing from the logs and exceptions. Adding them will help debug multi-threading issues related to the use of INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-6150: -- Attachment: HDFS-6150.2.patch New patch to fix an NPE. Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.1.patch, HDFS-6150.2.patch, HDFS-6150.patch Inode information and path information are missing from the logs and exceptions. Adding them will help debug multi-threading issues related to the use of INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release
[ https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945943#comment-13945943 ] Fengdong Yu commented on HDFS-6130: --- Thanks [~szetszwo]! [~wheat9], do you want only the fsimage, or both the image and the edit log? I'll reproduce it today using 1.3.0 and the latest trunk, then keep the corresponding fsimage and edit logs. NPE during namenode upgrade from old release Key: HDFS-6130 URL: https://issues.apache.org/jira/browse/HDFS-6130 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Fengdong Yu I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance. I can upgrade successfully if I don't configure HA, but with HA enabled, there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
{code}
14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 60 millis
14/03/20 15:06:41 INFO util.GSet: Computing capacity for map NameNodeRetryCache
14/03/20 15:06:41 INFO util.GSet: VM type = 64-bit
14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 275.3 KB
14/03/20 15:06:41 INFO util.GSet: capacity = 2^15 = 32768 entries
14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
14/03/20 15:06:41 INFO common.Storage: Lock on /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176
14/03/20 15:06:42 INFO common.Storage: Lock on /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176
14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
************************************************************/
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6150: -- Priority: Minor (was: Major) Assignee: Suresh Srinivas Hadoop Flags: Reviewed +1, HDFS-6150.2.patch looks good. Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Assignee: Suresh Srinivas Priority: Minor Attachments: HDFS-6150.1.patch, HDFS-6150.2.patch, HDFS-6150.patch Inode information and path information are missing from the logs and exceptions. Adding them will help debug multi-threading issues related to the use of INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release
[ https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945988#comment-13945988 ] Haohui Mai commented on HDFS-6130: -- Can you create a checkpoint so that upgrading from the checkpointed fsimage will trigger the bug? NPE during namenode upgrade from old release Key: HDFS-6130 URL: https://issues.apache.org/jira/browse/HDFS-6130 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Fengdong Yu -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release
[ https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945992#comment-13945992 ] Fengdong Yu commented on HDFS-6130: --- OK, no problem. I can use rollingUpgrade -prepare to create a checkpoint. NPE during namenode upgrade from old release Key: HDFS-6130 URL: https://issues.apache.org/jira/browse/HDFS-6130 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Fengdong Yu -- This message was sent by Atlassian JIRA (v6.2#6252)
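As a rough sketch of that reproduction plan (hedged: the exact commands depend on the releases involved; these are the standard admin commands rather than anything specified in this report):
{code}
# On the old (pre-upgrade) namenode, force a checkpoint so the
# on-disk fsimage is current before the upgrade:
hadoop dfsadmin -safemode enter
hadoop dfsadmin -saveNamespace
hadoop dfsadmin -safemode leave

# After installing the new software, re-run the step reported to fail:
hdfs namenode -initializeSharedEdits
{code}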
[jira] [Updated] (HDFS-6152) distcp V2 doesn't preserve root dir's attributes when -p is specified
[ https://issues.apache.org/jira/browse/HDFS-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6152: Description: Two issues were observed with distcp V2. ISSUE 1. When copying a source dir to a target dir with the -pu option, using the command distcp -pu source-dir target-dir, the source dir's owner is not preserved at the target dir. Similarly, other attributes of the source dir are not preserved. They should be preserved when neither -update nor -overwrite is specified. There are two scenarios with the above command: a. when target-dir already exists. Issuing the above command will result in target-dir/source-dir (source-dir here refers to the last component of the source-dir path on the command line) at the target file system, with all contents of source-dir copied to under target-dir/source-dir. The issue in this case is that the attributes of source-dir are not preserved. b. when target-dir doesn't exist. It will result in target-dir with all contents of source-dir copied to under target-dir. The issue in this case is that the attributes of source-dir are not carried over to target-dir. For multiple-source cases, e.g., the command distcp -pu source-dir1 source-dir2 target-dir, the sources are copied to under the target dir whether or not target-dir exists (target-dir is created if it didn't exist), and their attributes are preserved. ISSUE 2. With the following command: distcp source-dir target-dir, when source-dir is an empty directory and target-dir doesn't exist, source-dir is not copied; the command behaves like a no-op. However, when source-dir is not empty, it is copied, resulting in target-dir at the target file system containing a copy of source-dir's children. To be consistent, an empty source dir should be copied too. Basically, the above distcp command should cause target-dir to be created at the target file system, with source-dir's attributes preserved at target-dir when -p is passed. was: Two issues were observed with distcp V2. ISSUE 1. When copying a source dir to a target dir with the -pu option, using the command distcp -pu source-dir target-dir, the source dir's owner is not preserved at the target dir. Similarly, other attributes of the source dir are not preserved. They should be preserved when neither -update nor -overwrite is specified. There are two scenarios with the above command: a. when target-dir already exists. Issuing the above command will result in target-dir/source-dir (source-dir here refers to the last component of the source-dir path on the command line) at the target file system, with all contents of source-dir copied to under target-dir/source-dir. The issue in this case is that the attributes of source-dir are not preserved. b. when target-dir doesn't exist. It will result in target-dir with all contents of source-dir copied to under target-dir. The issue in this case is that the attributes of source-dir are not carried over to target-dir. For multiple-source cases, e.g., the command distcp -pu source-dir1 source-dir2 target-dir, the sources are copied to under the target dir whether or not target-dir exists (target-dir is created if it didn't exist), and their attributes are preserved. ISSUE 2. With the following command: distcp source-dir target-dir, when source-dir is an empty directory and target-dir doesn't exist, source-dir is not copied; the command behaves like a no-op. However, when source-dir is not empty, it is copied, resulting in target-dir at the target file system containing a copy of source-dir's children. To be consistent, an empty source dir should be copied too. Basically, the above distcp command should cause target-dir to be created at the target file system, with source-dir's attributes preserved at target-dir when -p is passed. distcp V2 doesn't preserve root dir's attributes when -p is specified - Key: HDFS-6152 URL: https://issues.apache.org/jira/browse/HDFS-6152 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6152.001.patch Two issues were observed with distcp V2. ISSUE 1. When copying a source dir to a target dir with the -pu option, using the command distcp -pu source-dir target-dir, the source dir's owner is not preserved at the target dir. Similarly, other attributes of the source dir are not preserved. They should be preserved when neither -update nor -overwrite is specified. There are two scenarios with the above command: a. when target-dir already exists. Issuing the above command will result in target-dir/source-dir (source-dir here refers to the
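To make ISSUE 1 easy to reproduce, here is a minimal sketch; the paths and user name are placeholders, not from the report, and the chown step assumes superuser privileges:
{code}
# Create a source dir owned by a distinct user, then copy with -pu
# (preserve user) and inspect the root of the copy.
hdfs dfs -mkdir -p /tmp/source-dir
hdfs dfs -chown someuser /tmp/source-dir
hadoop distcp -pu /tmp/source-dir /tmp/target-dir

# Expected: the copied root is owned by 'someuser'; per the report,
# the owner of /tmp/target-dir (or /tmp/target-dir/source-dir) is not preserved.
hdfs dfs -ls -d /tmp/target-dir
{code}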
[jira] [Commented] (HDFS-6125) Cleanup unnecessary cast in HDFS code base
[ https://issues.apache.org/jira/browse/HDFS-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946015#comment-13946015 ] Hadoop QA commented on HDFS-6125: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636455/HDFS-6125.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 43 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6480//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6480//console This message is automatically generated. Cleanup unnecessary cast in HDFS code base -- Key: HDFS-6125 URL: https://issues.apache.org/jira/browse/HDFS-6125 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 2.4.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HDFS-6125.2.patch, HDFS-6125.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
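For context, the kind of cleanup HDFS-6125 performs looks like the following illustrative snippet (not taken from the patch; the class and variable names are invented):
{code}
import java.util.ArrayList;
import java.util.List;

public class CastCleanupDemo {
  public static void main(String[] args) {
    List<String> names = new ArrayList<>();
    names.add("datanode-1");

    // Before: the generic type already guarantees a String, so this
    // cast is redundant and the compiler flags it as such.
    String before = (String) names.get(0);

    // After: the cast is simply dropped; behavior is unchanged.
    String after = names.get(0);

    System.out.println(before.equals(after));  // true
  }
}
{code}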