[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739288#comment-13739288 ]

Vinay commented on HDFS-4504:
-----------------------------

bq. It seems to me like it would be better to call completeFile() or perhaps some new abortFile() RPC, which would first verify that the client name trying to abort the lease is the same as the current lease holder.

This looks good, but it seems this would take a lot of code changes and leave a lot of cases to handle. It may also be difficult to handle the case where two threads, T1 and T2, both have the same client name C: the lease-holder check cannot tell them apart, since the client name is the same.

DFSOutputStream#close doesn't always release resources (such as leases)
-----------------------------------------------------------------------

Key: HDFS-4504
URL: https://issues.apache.org/jira/browse/HDFS-4504
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, HDFS-4504.010.patch, HDFS-4504.011.patch

{{DFSOutputStream#close}} can throw an {{IOException}} in some cases. One example is if there is a pipeline error and then pipeline recovery fails. Unfortunately, in this case, some of the resources used by the {{DFSOutputStream}} are leaked. One particularly important resource is file leases. So it's possible for a long-lived HDFS client, such as Flume, to write many blocks to a file, but then fail to close it. Unfortunately, the {{LeaseRenewerThread}} inside the client will continue to renew the lease for the undead file. Future attempts to close the file will just rethrow the previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739293#comment-13739293 ]

Colin Patrick McCabe commented on HDFS-4504:
--------------------------------------------

I don't think adding a new RPC would be too bad. It would be very similar to recoverLease.

bq. It may also be difficult to handle the case where two threads, T1 and T2, both have the same client name C, since the client name is the same.

I think we should do this in HDFS-4688 rather than trying to solve it here.
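The close() semantics being discussed can be sketched as follows. This is an illustrative model of the desired behavior, not the actual DFSOutputStream code; the names (`LeakFreeOutputStream`, `flushInternal`, `leaseReleased`) are invented for the sketch. The point is that client-side state, in particular the lease-renewer registration, is released whether or not the final flush succeeds, and that a second close() becomes a no-op instead of rethrowing the old failure forever.

```java
import java.io.IOException;

class LeakFreeOutputStream {
    private boolean closed = false;
    boolean leaseReleased = false;     // stands in for LeaseRenewer removal
    private final boolean flushFails;  // simulate a pipeline-recovery failure

    LeakFreeOutputStream(boolean flushFails) { this.flushFails = flushFails; }

    private void flushInternal() throws IOException {
        if (flushFails) {
            throw new IOException("pipeline recovery failed");
        }
    }

    /** Releases resources on both the success and the failure path. */
    public void close() throws IOException {
        if (closed) {
            return;                    // idempotent: no endless rethrow
        }
        try {
            flushInternal();
        } finally {
            closed = true;
            leaseReleased = true;      // lease released even on IOException
        }
    }

    public static void main(String[] args) throws IOException {
        LeakFreeOutputStream bad = new LeakFreeOutputStream(true);
        try {
            bad.close();
        } catch (IOException e) {
            // expected: flush failed, but the lease was still released
        }
        assert bad.leaseReleased;
        bad.close();                   // second close is a harmless no-op
    }
}
```

An abortFile()-style RPC, as proposed above, would sit behind the `finally` branch so the NameNode can verify the caller is the current lease holder before dropping the lease.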
[jira] [Updated] (HDFS-3618) SSH fencing option may incorrectly succeed if nc (netcat) command not present
[ https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinay updated HDFS-3618:
------------------------

Attachment: HDFS-3618.patch

Updated test

SSH fencing option may incorrectly succeed if nc (netcat) command not present
-----------------------------------------------------------------------------

Key: HDFS-3618
URL: https://issues.apache.org/jira/browse/HDFS-3618
Project: Hadoop HDFS
Issue Type: Bug
Components: auto-failover
Affects Versions: 2.0.0-alpha
Reporter: Brahma Reddy Battula
Assignee: Vinay
Attachments: HDFS-3618.patch, HDFS-3618.patch, HDFS-3618.patch, zkfc_threaddump.out, zkfc.txt

Started NNs and zkfcs on SUSE 11. SUSE 11 has netcat installed, and "netcat -z" works, but "nc -z" won't. While executing the following command we got "command not found", so rc was nonzero and we assumed the server was down. We end up returning success without actually checking whether the service is down:

{code}
LOG.info("Indeterminate response from trying to kill service. " +
    "Verifying whether it is running using nc...");
rc = execCommand(session, "nc -z " + serviceAddr.getHostName() +
    " " + serviceAddr.getPort());
if (rc == 0) {
  // the service is still listening - we are unable to fence
  LOG.warn("Unable to fence - it is running but we cannot kill it");
  return false;
} else {
  LOG.info("Verified that the service is down.");
  return true;
}
{code}
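One way to fix the bug above can be sketched as follows. This is not the actual SshFenceByTcpPort code; the helper class is invented for illustration. The key observation is that POSIX-compatible shells report exit status 127 when the command itself cannot be found, so a 127 from "nc -z host port" proves nothing about the service and must be treated as "cannot verify", not "service is down".

```java
class NcExitCode {
    static final int SERVICE_UP = 0;
    static final int COMMAND_NOT_FOUND = 127; // POSIX sh: command not found

    /** True only when nc actually ran and found the port closed. */
    static boolean verifiedDown(int rc) {
        if (rc == COMMAND_NOT_FOUND) {
            throw new IllegalStateException(
                "nc not found on target host; cannot verify service state");
        }
        return rc != SERVICE_UP;
    }

    public static void main(String[] args) {
        assert !verifiedDown(0);   // port still open: fencing failed
        assert verifiedDown(1);    // nc ran and the port was closed: fenced
    }
}
```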
[jira] [Commented] (HDFS-3618) SSH fencing option may incorrectly succeed if nc (netcat) command not present
[ https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739326#comment-13739326 ]

Vinay commented on HDFS-3618:
-----------------------------

The findbugs and javadoc warnings are unrelated; I am seeing them on every patch submitted. Maybe some problem with QA?
[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode
[ https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739343#comment-13739343 ]

Colin Patrick McCabe commented on HDFS-5051:
--------------------------------------------

The random jitter code was taken from the block report code. The goal is the same: to avoid overloading the NameNode with too many reports arriving at the same time. I don't see any reason to take out the jitter code here, although it will not be as important as it was in the block report case.

As far as I can tell, genstamp and block length should not be included in the cache report. They aren't included in the regular block report in StorageBlockReportProto. When asking a DataNode to lock a block, the NameNode can specify the genstamp and minimum length it wants at that time, and the DataNode can fail the request if it doesn't have that genstamp / length.

This issue starts getting into the NN-to-DN communication (HDFS-5053). That's why I suggested discussing it there, although I'm happy to discuss it here as well.

Propagate cache status information from the DataNode to the NameNode
--------------------------------------------------------------------

Key: HDFS-5051
URL: https://issues.apache.org/jira/browse/HDFS-5051
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode, namenode
Reporter: Colin Patrick McCabe
Assignee: Andrew Wang
Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch

The DataNode needs to inform the NameNode of its current cache state. Let's wire up the RPCs and stub out the relevant methods on the DN and NN side.
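The jitter idea referred to above can be sketched as follows, assuming it mirrors the block-report delay (the class and method names are illustrative, not the actual DataNode code): delay each node's first report by a random offset within the report interval, so DataNodes restarted together do not all hit the NameNode at the same instant.

```java
import java.util.Random;

class CacheReportJitter {
    /** Uniform random initial delay in [0, intervalMs). */
    static long initialDelayMs(long intervalMs, Random rng) {
        // spreading first-report times flattens the load spike on the NN
        return (long) (rng.nextDouble() * intervalMs);
    }

    public static void main(String[] args) {
        long delay = initialDelayMs(10_000, new Random());
        assert delay >= 0 && delay < 10_000;
        System.out.println("first cache report delayed by " + delay + " ms");
    }
}
```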
[jira] [Commented] (HDFS-2882) DN continues to start up, even if block pool fails to initialize
[ https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739356#comment-13739356 ]

Vinay commented on HDFS-2882:
-----------------------------

bq. Did you reproduce the problem? If so, what were the steps to reproduce?

Please check the test. I just reproduced the cases mentioned by Todd.

bq. Also, your patch seems to make the DataNode loop endlessly trying to initialize any block pools that don't come up. I don't think that's what we want to do here.

No. In a nameservice with multiple namenodes, the retry for one namenode is infinite only if at least one other namenode can connect and its BPOS is initialized. Retrying BPOS initialization continues only in that case; if all namenodes fail to initialize, BPOS exits.

One more thing: {{BPServiceActor#retrieveNamespaceInfo()}} is in an infinite loop, and yes, this can cause initialization to loop forever if the namenode is down or not responding. But that is not changed by my patch.

DN continues to start up, even if block pool fails to initialize
----------------------------------------------------------------

Key: HDFS-2882
URL: https://issues.apache.org/jira/browse/HDFS-2882
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Colin Patrick McCabe
Attachments: HDFS-2882.patch, hdfs-2882.txt

I started a DN on a machine that was completely out of space on one of its drives. I saw the following:

2012-02-02 09:56:50,499 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id DS-507718931-172.29.5.194-11072-1297842002148) service to styx01.sf.cloudera.com/172.29.5.192:8021
java.io.IOException: Mkdirs failed to create /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp
    at org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.init(FSDataset.java:335)

but the DN continued to run, spewing NPEs when it tried to do block reports, etc. This was on the HDFS-1623 branch but may affect trunk as well.
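The retry rule Vinay describes can be sketched as follows. This is a hedged model of the policy, not the actual BPOfferService code; the class and flags are invented for illustration. A BPServiceActor that fails to initialize keeps retrying only while some other actor in the same nameservice initialized successfully; if none did, the block pool service exits instead of spinning forever.

```java
import java.util.List;

class BlockPoolRetryPolicy {
    /** One flag per namenode in the nameservice: did its actor initialize? */
    static boolean shouldRetryFailedActor(List<Boolean> actorInitialized) {
        // retry a failed actor only if at least one peer namenode is up;
        // otherwise the whole block pool should fail fast
        return actorInitialized.contains(Boolean.TRUE);
    }

    public static void main(String[] args) {
        // one NN up, one down: keep retrying the down one
        assert shouldRetryFailedActor(List.of(true, false));
        // both down: exit the block pool service
        assert !shouldRetryFailedActor(List.of(false, false));
    }
}
```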
[jira] [Created] (HDFS-5094) Add Metrics in DFSClient
LiuLei created HDFS-5094:
-------------------------

Summary: Add Metrics in DFSClient
Key: HDFS-5094
URL: https://issues.apache.org/jira/browse/HDFS-5094
Project: Hadoop HDFS
Issue Type: Task
Components: hdfs-client
Affects Versions: 2.0.5-alpha
Reporter: LiuLei

We need to add some metrics in DFSClient to help HBase monitor HDFS performance.
[jira] [Updated] (HDFS-5094) Add Metrics in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5094:
-------------------------

Attachment: DFSCLientMetrics.patch
[jira] [Commented] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739367#comment-13739367 ]

Tao Luo commented on HDFS-5079:
-------------------------------

Replacing NNHAStatusHeartbeat.State with HAServiceState.

Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
-------------------------------------------------------------

Key: HDFS-5079
URL: https://issues.apache.org/jira/browse/HDFS-5079
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Tao Luo
Attachments: HDFS-5079.patch

NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. The respective class should also be removed from DatanodeProtocolProtos.
[jira] [Updated] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Luo updated HDFS-5079:
--------------------------

Attachment: HDFS-5079.patch
[jira] [Assigned] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Luo reassigned HDFS-5079:
-----------------------------

Assignee: Tao Luo
[jira] [Created] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
LiuLei created HDFS-5095:
-------------------------

Summary: Using JournalNode IP as name of IPCLoggerChannel metrics record
Key: HDFS-5095
URL: https://issues.apache.org/jira/browse/HDFS-5095
Project: Hadoop HDFS
Issue Type: Task
Components: qjm
Affects Versions: 2.0.5-alpha
Reporter: LiuLei

I use QJM for HA. The IPCLoggerChannelMetrics class uses the NameNode as the metrics record name, so the metrics records of all JournalNodes are displayed together in ganglia. It would be better if every JournalNode displayed its metrics record under a different name; I think using the JournalNode IP as the name is better.
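A minimal sketch of the proposal, assuming the record name can simply be derived from the JournalNode's socket address. The naming scheme below is illustrative and the helper class is invented; the real IPCLoggerChannelMetrics API may differ. The point is that each JN gets a distinct record name, so ganglia displays each node's stats separately.

```java
import java.net.InetSocketAddress;

class JnMetricsRecordName {
    /** Per-JournalNode metrics record name, e.g. from its IP and port. */
    static String recordName(InetSocketAddress jnAddr) {
        return "IPCLoggerChannel-" + jnAddr.getAddress().getHostAddress()
            + "-" + jnAddr.getPort();
    }

    public static void main(String[] args) {
        InetSocketAddress jn = new InetSocketAddress("127.0.0.1", 8485);
        System.out.println(recordName(jn)); // IPCLoggerChannel-127.0.0.1-8485
    }
}
```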
[jira] [Updated] (HDFS-5094) Add Metrics in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5094:
-------------------------

Attachment: IPCLoggerChannelMetrics.java.patch
[jira] [Updated] (HDFS-5094) Add Metrics in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5094:
-------------------------

Attachment: (was: IPCLoggerChannelMetrics.java.patch)
[jira] [Updated] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
[ https://issues.apache.org/jira/browse/HDFS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5095:
-------------------------

Attachment: IPCLoggerChannelMetrics.java.patch
[jira] [Updated] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
[ https://issues.apache.org/jira/browse/HDFS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5095:
-------------------------

Attachment: metrics.jpg

Diagram of one JournalNode in ganglia
[jira] [Updated] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
[ https://issues.apache.org/jira/browse/HDFS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5095:
-------------------------

Attachment: metrics.jpg
[jira] [Updated] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
[ https://issues.apache.org/jira/browse/HDFS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5095:
-------------------------

Attachment: (was: metrics.jpg)
[jira] [Commented] (HDFS-3618) SSH fencing option may incorrectly succeed if nc (netcat) command not present
[ https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739442#comment-13739442 ]

Hadoop QA commented on HDFS-3618:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12597900/HDFS-3618.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4820//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/4820//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4820//console

This message is automatically generated.
[jira] [Assigned] (HDFS-2933) Datanode index page on debug port not useful
[ https://issues.apache.org/jira/browse/HDFS-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ganesan reassigned HDFS-2933:
-----------------------------------

Assignee: Vivek Ganesan

Datanode index page on debug port not useful
--------------------------------------------

Key: HDFS-2933
URL: https://issues.apache.org/jira/browse/HDFS-2933
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 1.0.0, 2.0.0-alpha
Reporter: Philip Zeyliger
Assignee: Vivek Ganesan
Labels: newbie

If you visit the root page of a datanode's web port, you get an index page with WEB-INF and robots.txt. More useful would be to include information about the datanode, like its version, and links to /browseDirectory, /jmx, /metrics, /conf, etc.
[jira] [Commented] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA
[ https://issues.apache.org/jira/browse/HDFS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739519#comment-13739519 ]

Hudson commented on HDFS-5091:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #301 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/301/])

HDFS-5091. Support for spnego keytab separate from the JournalNode keytab for secure HA. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513700)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeHttpServer.java

Support for spnego keytab separate from the JournalNode keytab for secure HA
----------------------------------------------------------------------------

Key: HDFS-5091
URL: https://issues.apache.org/jira/browse/HDFS-5091
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
Fix For: 2.1.1-beta
Attachments: HDFS-5091.001.patch

This is similar to HDFS-3466 and HDFS-4105: for JournalNode we should also use the web keytab file for the SPNEGO filter.
[jira] [Commented] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA
[ https://issues.apache.org/jira/browse/HDFS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739654#comment-13739654 ]

Hudson commented on HDFS-5091:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1491 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1491/])

HDFS-5091. Support for spnego keytab separate from the JournalNode keytab for secure HA. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513700)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeHttpServer.java
[jira] [Commented] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA
[ https://issues.apache.org/jira/browse/HDFS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739709#comment-13739709 ]

Hudson commented on HDFS-5091:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1518 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1518/])

HDFS-5091. Support for spnego keytab separate from the JournalNode keytab for secure HA. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513700)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeHttpServer.java
[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode
[ https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739707#comment-13739707 ]

Suresh Srinivas commented on HDFS-5051:
---------------------------------------

bq. The random jitter code was taken from the block report code. The goal is the same-- to avoid overloading the NameNode with too many reports at the same time. I don't see any reason to take out the jitter code here, although it will not be as important as it was in the block report case.

Quoting my own question:

bq. When a datanode starts, do we expect anything to be in the cache at all?

Hence the question: why is the jitter code important?

bq. They aren't included in the regular block report in StorageBlockReportProto.

That is not correct. Please see the code in BlockListAsLongs.

We need to decide the following (and I do not think, given the current summary of HDFS-5053, that that is the right place):
# Do we need to include generation stamp and length? My early thought is that it may not be necessary. The current code includes both generation stamp and length.
# When there are no cache entries in the datanode, my preference is not to send a cache report at all, including the first time the datanode starts up. I agree that we could have an incremental cache report.
[jira] [Comment Edited] (HDFS-3755) Creating an already-open-for-write file with overwrite=true fails
[ https://issues.apache.org/jira/browse/HDFS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739199#comment-13739199 ] Suresh Srinivas edited comment on HDFS-3755 at 8/14/13 2:33 PM: Given a regression from branch-1 was fixed in this Jira, why is it incompatible? was (Author: sureshms): Given a regression from branch-1 was fixed in this Jira, why is it incompatible? Creating an already-open-for-write file with overwrite=true fails - Key: HDFS-3755 URL: https://issues.apache.org/jira/browse/HDFS-3755 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 3.0.0, 2.0.2-alpha Attachments: hdfs-3755.txt, hdfs-3755.txt If a file is already open for write by one client, and another client calls {{fs.create()}} with {{overwrite=true}}, the file should be deleted and the new file successfully created. Instead, it is currently throwing AlreadyBeingCreatedException. This is a regression since branch-1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739734#comment-13739734 ] John George commented on HDFS-2832: --- To support Storage Types, each DataNode must be treated as a collection of storages. (excerpt from pdf) Consider a cluster with a set of DataNodes with high-end hardware (e.g. SSD), and another set of DataNodes with low-end hardware (e.g. HDD). Each datanode is homogeneous by itself, but the cluster itself is heterogeneous. Can the user still specify storage preference using StorageType and get expected results? Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: 20130813-HeterogeneousStorage.pdf HDFS currently supports a configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection* of storages. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2882) DN continues to start up, even if block pool fails to initialize
[ https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739738#comment-13739738 ] Vinay commented on HDFS-2882: - Hi [~tlipcon], could you take a look at the patch, as the patch is on top of your work. Thanks DN continues to start up, even if block pool fails to initialize Key: HDFS-2882 URL: https://issues.apache.org/jira/browse/HDFS-2882 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.2-alpha Reporter: Todd Lipcon Assignee: Colin Patrick McCabe Attachments: HDFS-2882.patch, hdfs-2882.txt I started a DN on a machine that was completely out of space on one of its drives. I saw the following: 2012-02-02 09:56:50,499 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id DS-507718931-172.29.5.194-11072-12978 42002148) service to styx01.sf.cloudera.com/172.29.5.192:8021 java.io.IOException: Mkdirs failed to create /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp at org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.init(FSDataset.java:335) but the DN continued to run, spewing NPEs when it tried to do block reports, etc. This was on the HDFS-1623 branch but may affect trunk as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
[ https://issues.apache.org/jira/browse/HDFS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739744#comment-13739744 ] Suresh Srinivas commented on HDFS-5095: --- Before this change the metrics name is NameNode as you noted. With the patch the metrics name is IPCLoggerChannel-address-port and is collected at the namenode. Instead of this, could we use a name such as NameNode-qjournal-Address-port? Also please add a unit test. Using JournalNode IP as name of IPCLoggerChannel metrics record --- Key: HDFS-5095 URL: https://issues.apache.org/jira/browse/HDFS-5095 Project: Hadoop HDFS Issue Type: Task Components: qjm Affects Versions: 2.0.5-alpha Reporter: LiuLei Attachments: IPCLoggerChannelMetrics.java.patch, metrics.jpg I use QJM for HA. The IPCLoggerChannelMetrics class uses NameNode as the metrics record name, so the metrics records of all JournalNodes are displayed together in ganglia. It would be better if every JournalNode displayed its metrics record under a different name. I think using the JournalNode IP as the name is better. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
[ https://issues.apache.org/jira/browse/HDFS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739744#comment-13739744 ] Suresh Srinivas edited comment on HDFS-5095 at 8/14/13 3:00 PM: Before this change the metrics name is NameNode as you noted. With the patch the metrics name is {noformat}IPCLoggerChannel-address-port{noformat} and is collected at the namenode. Instead of this, we could use a name that indicates both that these metrics are from the namenode and that they are related to the quorum journal, e.g. {noformat}NameNode-qjournal-Address-port{noformat} Also please add a unit test. was (Author: sureshms): Before this change the metrics name is NameNode as you noted. With the patch the metrics name is IPCLoggerChannel-address-port and is collected at the namenode. Instead of this, we could use a name such NameNode-qjournal-Address-port? Also please add a unit test. Using JournalNode IP as name of IPCLoggerChannel metrics record --- Key: HDFS-5095 URL: https://issues.apache.org/jira/browse/HDFS-5095 Project: Hadoop HDFS Issue Type: Task Components: qjm Affects Versions: 2.0.5-alpha Reporter: LiuLei Attachments: IPCLoggerChannelMetrics.java.patch, metrics.jpg I use QJM for HA. The IPCLoggerChannelMetrics class uses NameNode as the metrics record name, so the metrics records of all JournalNodes are displayed together in ganglia. It would be better if every JournalNode displayed its metrics record under a different name. I think using the JournalNode IP as the name is better. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5076) Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739767#comment-13739767 ] Suresh Srinivas commented on HDFS-5076: --- Jing, some comments: # In the journal status, should we also return the address and port of the JournalNode? # The Javadoc says "A string presenting status for each journal." Do we want another method which takes a journal ID/namespaceID for the journal related to a specific namenode? Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID Key: HDFS-5076 URL: https://issues.apache.org/jira/browse/HDFS-5076 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. It can be helpful to provide servlets to enable querying this information through http, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4946) Allow preferLocalNode in BlockPlacementPolicyDefault to be configurable
[ https://issues.apache.org/jira/browse/HDFS-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739790#comment-13739790 ] Harsh J commented on HDFS-4946: --- bq. Allow preferLocalNode in BlockPlacementPolicyDefault to be disabled in configuration to prevent *a client* from writing the first replica of every block (i.e. the entire file) to the local DataNode. The description reads as preventing specific clients, but the config toggle would shut off all clients from writing locally, which may not be desirable. Ideally we would like a client-sent hint that influences the selection. Allow preferLocalNode in BlockPlacementPolicyDefault to be configurable --- Key: HDFS-4946 URL: https://issues.apache.org/jira/browse/HDFS-4946 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: James Kinley Assignee: James Kinley Attachments: HDFS-4946-1.patch Allow preferLocalNode in BlockPlacementPolicyDefault to be disabled in configuration to prevent a client from writing the first replica of every block (i.e. the entire file) to the local DataNode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Luo updated HDFS-5079: -- Status: Patch Available (was: Open) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos. - Key: HDFS-5079 URL: https://issues.apache.org/jira/browse/HDFS-5079 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Attachments: HDFS-5079.patch NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. The respective class should also be removed from DatanodeProtocolProtos. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5079: - Hadoop Flags: Incompatible change Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos. - Key: HDFS-5079 URL: https://issues.apache.org/jira/browse/HDFS-5079 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Attachments: HDFS-5079.patch NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. The respective class should also be removed from DatanodeProtocolProtos. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739884#comment-13739884 ] Arpit Agarwal commented on HDFS-2832: - John, {quote}Can the user still specify storage preference using StorageType and get expected results? {quote} We don't make any assumptions about the cluster layout. The storages attached to a DataNode may be of the same or different types. Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: 20130813-HeterogeneousStorage.pdf HDFS currently supports a configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection* of storages. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5087) Allowing specific JAVA heap max setting for HDFS related services
[ https://issues.apache.org/jira/browse/HDFS-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739894#comment-13739894 ] Allen Wittenauer commented on HDFS-5087: Is there a reason this is preferred over just modifying hadoop-env.sh's service specific env opts? Allowing specific JAVA heap max setting for HDFS related services - Key: HDFS-5087 URL: https://issues.apache.org/jira/browse/HDFS-5087 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Reporter: Kai Zheng Priority: Minor Attachments: HDFS-5087.patch This allows specific JAVA heap max setting for HDFS related services as it does for YARN services, to be consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739961#comment-13739961 ] Hadoop QA commented on HDFS-5079: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597906/HDFS-5079.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4821//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4821//console This message is automatically generated. Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos. - Key: HDFS-5079 URL: https://issues.apache.org/jira/browse/HDFS-5079 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Attachments: HDFS-5079.patch NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. 
The respective class should also be removed from DatanodeProtocolProtos. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4898) BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack
[ https://issues.apache.org/jira/browse/HDFS-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740054#comment-13740054 ] Tsz Wo (Nicholas), SZE commented on HDFS-4898: -- The failure of TestBlocksWithNotEnoughRacks is not related. It does not use BlockPlacementPolicyWithNodeGroup at all. BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack - Key: HDFS-4898 URL: https://issues.apache.org/jira/browse/HDFS-4898 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.0.4-alpha Reporter: Eric Sirianni Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h4898_20130809.patch As currently implemented, {{BlockPlacementPolicyWithNodeGroup}} does not properly fall back to the local rack when no nodes are available in remote racks, resulting in an improper {{NotEnoughReplicasException}}.
{code:title=BlockPlacementPolicyWithNodeGroup.java}
@Override
protected void chooseRemoteRack(int numOfReplicas,
    DatanodeDescriptor localMachine, HashMap<Node, Node> excludedNodes,
    long blocksize, int maxReplicasPerRack, List<DatanodeDescriptor> results,
    boolean avoidStaleNodes) throws NotEnoughReplicasException {
  int oldNumOfReplicas = results.size();
  // randomly choose one node from remote racks
  try {
    chooseRandom(
        numOfReplicas,
        "~" + NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()),
        excludedNodes, blocksize, maxReplicasPerRack, results, avoidStaleNodes);
  } catch (NotEnoughReplicasException e) {
    chooseRandom(numOfReplicas - (results.size() - oldNumOfReplicas),
        localMachine.getNetworkLocation(), excludedNodes, blocksize,
        maxReplicasPerRack, results, avoidStaleNodes);
  }
}
{code}
As currently coded, the {{chooseRandom()}} call in the {{catch}} block will never succeed, as the set of nodes within the passed-in node path (e.g. {{/rack1/nodegroup1}}) is entirely contained within the set of excluded nodes (both are the set of nodes within the same nodegroup as the node chosen for the first replica). The bug is that the fallback {{chooseRandom()}} call in the catch block should be passing in the _complement_ of the node path used in the initial {{chooseRandom()}} call in the try block (e.g. {{/rack1}}) - namely: {code}NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()){code} This will yield the proper fallback behavior of choosing a random node from _within the same rack_, but still excluding those nodes _in the same nodegroup_. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
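To see why the proposed fix works, note that for a node at {{/rack1/nodegroup1/node}}, the network location is {{/rack1/nodegroup1}} and its first half is {{/rack1}}; choosing within {{/rack1}} while excluding the nodegroup's nodes yields a same-rack, different-nodegroup node. A standalone sketch of that path arithmetic (this helper is a simplified stand-in for {{NetworkTopology.getFirstHalf}}, not the actual Hadoop code):

```java
public class ScopeSketch {
    // Simplified stand-in: keep only the first path component, e.g.
    // "/rack1/nodegroup1" -> "/rack1".
    static String getFirstHalf(String networkLocation) {
        int second = networkLocation.indexOf('/', 1);
        return second < 0 ? networkLocation : networkLocation.substring(0, second);
    }

    public static void main(String[] args) {
        String nodeGroupPath = "/rack1/nodegroup1";
        // try block scope: everything OUTSIDE the rack (the "~" prefix means exclusion)
        System.out.println("try scope:      ~" + getFirstHalf(nodeGroupPath));
        // proposed catch block scope: the rack itself, the complement of the try scope
        System.out.println("fallback scope: " + getFirstHalf(nodeGroupPath));
    }
}
```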
[jira] [Commented] (HDFS-5087) Allowing specific JAVA heap max setting for HDFS related services
[ https://issues.apache.org/jira/browse/HDFS-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740103#comment-13740103 ] Kai Zheng commented on HDFS-5087: - In the hdfs script, with the following line {code} exec $JAVA -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS {code} if both JAVA_HEAP_MAX and a service-specific -Xmx (via the relevant *_OPTS) are set, then which one will be used? JAVA_HEAP_MAX is always defined in hadoop-config.sh. Even though the JVM has a clear definition for this case, IMO it would be better to avoid the ambiguity. The approach used in the patch to resolve the conflict is consistent with the YARN-related services. Allowing specific JAVA heap max setting for HDFS related services - Key: HDFS-5087 URL: https://issues.apache.org/jira/browse/HDFS-5087 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Reporter: Kai Zheng Priority: Minor Attachments: HDFS-5087.patch This allows a specific JAVA heap max setting for HDFS related services as it does for YARN services, to be consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740145#comment-13740145 ] Andrew Wang commented on HDFS-4949: --- Hi Arun, On the read path comments, it might be elucidating to check out the zero-copy read API that Colin's working on at HDFS-4953. The idea is that clients always use the zero copy cursor to do reads, which behind the scenes will do an mmap'd read if the block is cached, or a normal copying read if the block is on disk or remote. It allows an {{isCached}}-type check via not setting a fallback buffer for copying reads. This will cause the cursor to throw an exception on read if the block is not cached. Finally, there's also a parameter for enabling short reads, which comes into play when a read spans block files. On YARN integration, I'd like to revisit that a little ways down the road since we're focusing on getting a basic prototype out. If you want to get started on it now, it'd be helpful if you could review the current RM plan in the doc, and sketch out how a YARN-based architecture would look. Centralized cache management in HDFS Key: HDFS-4949 URL: https://issues.apache.org/jira/browse/HDFS-4949 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 3.0.0, 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: caching-design-doc-2013-07-02.pdf, caching-design-doc-2013-08-09.pdf HDFS currently has no support for managing or exposing in-memory caches at datanodes. This makes it harder for higher level application frameworks like Hive, Pig, and Impala to effectively use cluster memory, because they cannot explicitly cache important datasets or place their tasks for memory locality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5076) Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5076: Attachment: HDFS-5076.004.patch Update the patch to address Suresh's comments: add a new method which takes journal id as parameter and returns its status. bq. In journal status should we also return the address and port of the Journal node? Currently we put the MXBean to each JournalNode. Thus when querying the jmx the user should already have the knowledge about the corresponding JournalNode. So I think maybe here we do not need to return the address and port of the JN. Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID Key: HDFS-5076 URL: https://issues.apache.org/jira/browse/HDFS-5076 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch, HDFS-5076.004.patch Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. It can be helpful to provide servlets to enable querying these information through http, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5076) Add MXBean methods to query NN's transaction information and JournalNode's journal status
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5076: Description: Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. It can be helpful to provide support to enable querying this information through JMX, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary. Similarly we can add an MXBean interface for JournalNodes to query the status of journals (e.g., whether journals are formatted or not). (was: Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. It can be helpful to provide servlets to enable querying these information through http, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary.) Add MXBean methods to query NN's transaction information and JournalNode's journal status - Key: HDFS-5076 URL: https://issues.apache.org/jira/browse/HDFS-5076 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch, HDFS-5076.004.patch Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. It can be helpful to provide support to enable querying this information through JMX, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary. Similarly we can add an MXBean interface for JournalNodes to query the status of journals (e.g., whether journals are formatted or not). -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
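The MXBean approach described above uses the standard JMX platform APIs. A self-contained sketch of how such a bean could be defined and registered (the interface, attribute values, and ObjectName below are hypothetical illustrations, not the actual HDFS-5076 code):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class TxnInfoSketch {
    // The "MXBean" suffix on the interface name is what makes JMX expose it as an MXBean.
    public interface TxnInfoMXBean {
        long getLastAppliedTxId();
        long getMostRecentCheckpointTxId();
    }

    static class TxnInfo implements TxnInfoMXBean {
        public long getLastAppliedTxId() { return 42; }          // dummy value
        public long getMostRecentCheckpointTxId() { return 30; } // dummy value
    }

    public static void main(String[] args) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("Sketch:name=TxnInfo"); // hypothetical name
        mbs.registerMBean(new TxnInfo(), name);
        // A monitoring tool (or Ambari) can now read the attributes over JMX and
        // trigger saveNamespace when the gap between them grows too large.
        long gap = (Long) mbs.getAttribute(name, "LastAppliedTxId")
                 - (Long) mbs.getAttribute(name, "MostRecentCheckpointTxId");
        System.out.println("transactions since last checkpoint: " + gap);
    }
}
```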
[jira] [Updated] (HDFS-5076) Add MXBean methods to query NN's transaction information and JournalNode's journal status
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5076: Summary: Add MXBean methods to query NN's transaction information and JournalNode's journal status (was: Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID) Add MXBean methods to query NN's transaction information and JournalNode's journal status - Key: HDFS-5076 URL: https://issues.apache.org/jira/browse/HDFS-5076 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch, HDFS-5076.004.patch Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. It can be helpful to provide servlets to enable querying these information through http, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode
[ https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740166#comment-13740166 ] Andrew Wang commented on HDFS-5051: --- I included the gen stamp and length in the {{cacheReport}} to handle caching newly appended data. I guess the gen stamp is unnecessary, but the DN isn't going to automatically mlock newly appended data, so the NN needs to somehow realize that the cached length is shorter than the new length and ask the DN to recache at the new length. Alternatively, I guess the DN could automatically mlock appended data, but there are quota implications there. On startup, I agree that we can skip cache reports until the cache is populated. I also agree that jittering doesn't matter as much if it's ticking on such a short time scale. I guess I could have cleaned this up rather than just changing the default cache report period like Colin asked. However, since we want to eventually have both incremental and full reports, let's just ape how block reports work; don't jitter the incremental reports, but do jitter the start time for the full reports and afterwards tick at a regular interval. Let's clean up all these issues in the incremental cache report JIRA (HDFS-5092); if this sounds good, I'll edit the JIRA description with these todo items. Propagate cache status information from the DataNode to the NameNode Key: HDFS-5051 URL: https://issues.apache.org/jira/browse/HDFS-5051 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Andrew Wang Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch The DataNode needs to inform the NameNode of its current cache state. Let's wire up the RPCs and stub out the relevant methods on the DN and NN side. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
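The length-based recache decision sketched above could reduce to a simple comparison on the NameNode side when processing a cache report entry (names here are hypothetical, for illustration only):

```java
public class RecacheCheckSketch {
    // NN-side check: if the DN reports a cached replica shorter than the block's
    // current length (because data was appended after caching), the NN should
    // ask the DN to re-cache the block at the new length.
    static boolean needsRecache(long cachedLength, long currentBlockLength) {
        return cachedLength < currentBlockLength;
    }

    public static void main(String[] args) {
        System.out.println(needsRecache(512, 1024));  // appended data not yet cached
        System.out.println(needsRecache(1024, 1024)); // fully cached, nothing to do
    }
}
```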
[jira] [Commented] (HDFS-5055) nn-2nn ignores dfs.namenode.secondary.http-address
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740172#comment-13740172 ] Allen Wittenauer commented on HDFS-5055: new patch appears to be working for me as well. nn-2nn ignores dfs.namenode.secondary.http-address --- Key: HDFS-5055 URL: https://issues.apache.org/jira/browse/HDFS-5055 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.0-beta Reporter: Allen Wittenauer Assignee: Vinay Priority: Blocker Labels: regression Attachments: HDFS-5055.patch, HDFS-5055.patch The primary namenode attempts to connect back to (incoming hostname):port regardless of how dfs.namenode.secondary.http-address is configured. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode
[ https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740174#comment-13740174 ] Suresh Srinivas commented on HDFS-5051: --- bq. I included the gen stamp and length in the cacheReport to handle caching newly appended data. We need to specify what the cache behavior in this case is. My understanding was that for the first phase new data written will not be cached automatically. In fact any file that is being written to will not be cached until it is closed. Let's clearly define the behavior in these cases. Rest sounds good. Thank you [~andrew.wang] for the comprehensive look at the comments. Propagate cache status information from the DataNode to the NameNode Key: HDFS-5051 URL: https://issues.apache.org/jira/browse/HDFS-5051 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Andrew Wang Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch The DataNode needs to inform the NameNode of its current cache state. Let's wire up the RPCs and stub out the relevant methods on the DN and NN side.
[jira] [Updated] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-4985: Attachment: h4985.02.patch Updated patch per the design doc on HDFS-2832. Add storage type to the protocol and expose it in block report and block locations -- Key: HDFS-4985 URL: https://issues.apache.org/jira/browse/HDFS-4985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Arpit Agarwal Attachments: h4985.02.patch, HDFS-4985.001.patch With HDFS-2880 the datanode now supports storage abstraction. This is to add the storage type into the protocol. Datanodes currently report blocks per storage. Storage would include a storage type attribute. Namenode also exposes the storage type of a block in block locations.
[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode
[ https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740200#comment-13740200 ] Andrew Wang commented on HDFS-5051: --- Gotcha, makes sense. I definitely only wanted to address caching finalized blocks at first, but I was thinking about the case where an append+write+close would lead to a finalized block with a new longer length. Let's punt that out to an auto-caching subtask (will file). So, I'll remove the gen stamp and length in HDFS-5092; will edit it with this and the other todo items. Propagate cache status information from the DataNode to the NameNode Key: HDFS-5051 URL: https://issues.apache.org/jira/browse/HDFS-5051 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Andrew Wang Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch The DataNode needs to inform the NameNode of its current cache state. Let's wire up the RPCs and stub out the relevant methods on the DN and NN side.
[jira] [Created] (HDFS-5096) Automatically cache new data added to a cached path
Andrew Wang created HDFS-5096: - Summary: Automatically cache new data added to a cached path Key: HDFS-5096 URL: https://issues.apache.org/jira/browse/HDFS-5096 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Andrew Wang For some applications, it's convenient to specify a path to cache, and have HDFS automatically cache new data added to the path without sending a new caching request or a manual refresh command. One example is new data appended to a cached file. It would be nice to re-cache a block at the new appended length, and cache new blocks added to the file. Another example is a cached Hive partition directory, where a user can drop new files directly into the partition. It would be nice if these new files were cached. In both cases, this automatic caching would happen after the file is closed, i.e., after the block replica is finalized.
[jira] [Resolved] (HDFS-3656) ZKFC may write a null breadcrumb znode
[ https://issues.apache.org/jira/browse/HDFS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3656. --- Resolution: Duplicate Target Version/s: (was: ) Yep, I think you're right. Thanks. ZKFC may write a null breadcrumb znode Key: HDFS-3656 URL: https://issues.apache.org/jira/browse/HDFS-3656 Project: Hadoop HDFS Issue Type: Bug Components: auto-failover Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon A user [reported|https://issues.cloudera.org/browse/DISTRO-412] an NPE trying to read the breadcrumb znode in the failover controller. This happened repeatedly, implying that an earlier process set the znode to null - probably some race, though I don't see anything obvious in the code.
[jira] [Updated] (HDFS-5092) Add support for incremental cache reports
[ https://issues.apache.org/jira/browse/HDFS-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5092: -- Description: The initial {{cacheReport}} patch at HDFS-5051 does frequent full reports of DN cache state. Better would be a scheme similar to how block reports are currently done: send incremental cache reports on every heartbeat (seconds), and full reports on a longer time scale (minutes to hours). This should reduce network traffic and allow us to make incremental reports even faster. As per discussion on HDFS-5051, we should also roll up the following review comments:
- Remove gen stamp and length from {{cacheReport}}, unnecessary until we do auto-caching of appended data
- Only jitter full cache reports, similar to how full block reports are jittered
- On DN startup, skip all cache reports until the cache is populated. The NN can just assume the DN cache is empty in the meantime.
was: We should send incremental cache reports as part of DN heartbeats, similar to how we do incremental block reports. Then we would only need to send full cache reports rarely (again similar to full block reports). Assignee: Andrew Wang Summary: Add support for incremental cache reports (was: piggyback incremental cache reports on DN heartbeats) Add support for incremental cache reports - Key: HDFS-5092 URL: https://issues.apache.org/jira/browse/HDFS-5092 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Andrew Wang Priority: Minor The initial {{cacheReport}} patch at HDFS-5051 does frequent full reports of DN cache state. Better would be a scheme similar to how block reports are currently done: send incremental cache reports on every heartbeat (seconds), and full reports on a longer time scale (minutes to hours). This should reduce network traffic and allow us to make incremental reports even faster.
As per discussion on HDFS-5051, we should also roll up the following review comments:
- Remove gen stamp and length from {{cacheReport}}, unnecessary until we do auto-caching of appended data
- Only jitter full cache reports, similar to how full block reports are jittered
- On DN startup, skip all cache reports until the cache is populated. The NN can just assume the DN cache is empty in the meantime.
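The proposed schedule — unjittered incremental reports on each heartbeat, full reports jittered once at startup and then ticking at a regular interval, mirroring full block reports — can be sketched as below. All interval values and names here are illustrative assumptions, not taken from any patch.

```java
import java.util.Random;

public class CacheReportScheduleSketch {
    static final long HEARTBEAT_MS = 3_000;        // incremental report cadence (illustrative)
    static final long FULL_REPORT_MS = 3_600_000;  // full report cadence (illustrative)

    // Incremental reports: no jitter, fire on every heartbeat.
    static long nextIncremental(long lastSent) {
        return lastSent + HEARTBEAT_MS;
    }

    // Full reports: jitter only the first one, so that DNs restarted together
    // don't all send full reports at the same instant...
    static long firstFull(long startupTime, Random rng) {
        return startupTime + (long) (rng.nextDouble() * FULL_REPORT_MS);
    }

    // ...then tick at a regular interval afterwards.
    static long nextFull(long lastFull) {
        return lastFull + FULL_REPORT_MS;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        long full = firstFull(0, rng);
        System.out.println(full >= 0 && full < FULL_REPORT_MS);      // true: jittered start
        System.out.println(nextFull(full) - full == FULL_REPORT_MS); // true: regular cadence
        System.out.println(nextIncremental(0) == HEARTBEAT_MS);      // true: no jitter
    }
}
```

The point of jittering only the first full report is that a fixed interval thereafter keeps reports evenly spread without reintroducing synchronized bursts.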
[jira] [Commented] (HDFS-2994) If lease is recovered successfully inline with create, create can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740240#comment-13740240 ] Konstantin Shvachko commented on HDFS-2994: --- Liked the approach of the last patch, which updates the inode reference only when needed. I'd recommend reusing the myFile variable, making it non-final. myFile = INodeFile.valueOf(dir.getINode(src), src, true); This should make it easier to port to other versions. The comment is good, just don't use JavaDoc style. A regular // comment would do better. If lease is recovered successfully inline with create, create can fail -- Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: amith Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch I saw the following logs on my test cluster: {code}
2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1
2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6
2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed.
2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6
2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6
{code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail.
[jira] [Commented] (HDFS-4898) BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack
[ https://issues.apache.org/jira/browse/HDFS-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740247#comment-13740247 ] Suresh Srinivas commented on HDFS-4898: --- +1 for the patch. We should add a unit test for this. BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack - Key: HDFS-4898 URL: https://issues.apache.org/jira/browse/HDFS-4898 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.0.4-alpha Reporter: Eric Sirianni Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h4898_20130809.patch As currently implemented, {{BlockPlacementPolicyWithNodeGroup}} does not properly fall back to the local rack when no nodes are available in remote racks, resulting in an improper {{NotEnoughReplicasException}}.
{code:title=BlockPlacementPolicyWithNodeGroup.java}
@Override
protected void chooseRemoteRack(int numOfReplicas,
    DatanodeDescriptor localMachine, HashMap<Node, Node> excludedNodes,
    long blocksize, int maxReplicasPerRack,
    List<DatanodeDescriptor> results, boolean avoidStaleNodes)
    throws NotEnoughReplicasException {
  int oldNumOfReplicas = results.size();
  // randomly choose one node from remote racks
  try {
    chooseRandom(numOfReplicas,
        "~" + NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()),
        excludedNodes, blocksize, maxReplicasPerRack, results, avoidStaleNodes);
  } catch (NotEnoughReplicasException e) {
    chooseRandom(numOfReplicas - (results.size() - oldNumOfReplicas),
        localMachine.getNetworkLocation(), excludedNodes, blocksize,
        maxReplicasPerRack, results, avoidStaleNodes);
  }
}
{code}
As currently coded, the {{chooseRandom()}} call in the {{catch}} block will never succeed, as the set of nodes within the passed-in node path (e.g. {{/rack1/nodegroup1}}) is entirely contained within the set of excluded nodes (both are the set of nodes within the same nodegroup as the node chosen for the first replica).
The bug is that the fallback {{chooseRandom()}} call in the catch block should be passing in the _complement_ of the node path used in the initial {{chooseRandom()}} call in the try block (e.g. {{/rack1}}) - namely: {code} NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()) {code} This will yield the proper fallback behavior of choosing a random node from _within the same rack_, but still excluding those nodes _in the same nodegroup_.
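A small self-contained sketch makes the containment argument concrete. The `firstHalf` helper below is a hypothetical stand-in for `NetworkTopology.getFirstHalf`, and the node names and scope-matching logic are illustrative only; the point is that the nodegroup-scoped retry can only see already-excluded nodes, while the rack-scoped retry opens new candidates.

```java
import java.util.*;

public class FallbackScopeSketch {
    // Hypothetical stand-in for NetworkTopology.getFirstHalf: strips the
    // trailing nodegroup from a /rack/nodegroup network location.
    static String firstHalf(String networkLocation) {
        return networkLocation.substring(0, networkLocation.lastIndexOf('/'));
    }

    // Nodes in scope = nodes whose network location starts with the scope path.
    static List<String> inScope(Map<String, String> nodeLocations, String scope) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : nodeLocations.entrySet()) {
            if (e.getValue().startsWith(scope)) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("dn1", "/rack1/nodegroup1");  // holds the first replica
        nodes.put("dn2", "/rack1/nodegroup1");  // same nodegroup: excluded
        nodes.put("dn3", "/rack1/nodegroup2");  // same rack, other nodegroup
        Set<String> excluded = new HashSet<>(Arrays.asList("dn1", "dn2"));

        // Buggy fallback: retry scoped to the full location /rack1/nodegroup1.
        // Every candidate is already excluded, so it can never succeed.
        List<String> buggy = inScope(nodes, "/rack1/nodegroup1");
        buggy.removeAll(excluded);
        System.out.println(buggy.isEmpty());  // true

        // Fixed fallback: retry scoped to firstHalf(...) = /rack1, which still
        // excludes the first replica's nodegroup but admits dn3.
        List<String> fixed = inScope(nodes, firstHalf("/rack1/nodegroup1"));
        fixed.removeAll(excluded);
        System.out.println(fixed);  // [dn3]
    }
}
```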
[jira] [Updated] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-4985: Attachment: (was: HDFS-4985.001.patch) Add storage type to the protocol and expose it in block report and block locations -- Key: HDFS-4985 URL: https://issues.apache.org/jira/browse/HDFS-4985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Arpit Agarwal Attachments: h4985.02.patch With HDFS-2880 the datanode now supports storage abstraction. This is to add the storage type into the protocol. Datanodes currently report blocks per storage. Storage would include a storage type attribute. Namenode also exposes the storage type of a block in block locations.
[jira] [Commented] (HDFS-2994) If lease is recovered successfully inline with create, create can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740261#comment-13740261 ] Konstantin Shvachko commented on HDFS-2994: --- I checked your new test case. It works with patched code and fails with current implementation. But I see tabs, could you please revert to spaces. If lease is recovered successfully inline with create, create can fail -- Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: amith Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. 
[jira] [Commented] (HDFS-5055) nn-2nn ignores dfs.namenode.secondary.http-address
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740275#comment-13740275 ] Jing Zhao commented on HDFS-5055: - {code} String machine = imageListenAddress.getAddress().isAnyLocalAddress() ? null : imageListenAddress.getHostName(); {code} Looks like here if the http address in the configuration is wrong, the UnknownHostException will cause imageListenAddress.getAddress() to return null. We thus may need to add an extra check here. Other than that +1 for the patch. nn-2nn ignores dfs.namenode.secondary.http-address --- Key: HDFS-5055 URL: https://issues.apache.org/jira/browse/HDFS-5055 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.0-beta Reporter: Allen Wittenauer Assignee: Vinay Priority: Blocker Labels: regression Attachments: HDFS-5055.patch, HDFS-5055.patch The primary namenode attempts to connect back to (incoming hostname):port regardless of how dfs.namenode.secondary.http-address is configured.
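The failure mode Jing describes can be reproduced with plain `java.net.InetSocketAddress`: when the configured host does not resolve, `getAddress()` returns null, so the unguarded `getAddress().isAnyLocalAddress()` throws an NPE. The guarded helper below is a hypothetical illustration of the extra check, not the code from the patch.

```java
import java.net.InetSocketAddress;

public class UnresolvedAddressCheck {
    // Hypothetical guarded version of the reviewed expression: check for an
    // unresolved address before dereferencing getAddress().
    static String machine(InetSocketAddress addr) {
        if (addr.getAddress() == null) {
            return addr.getHostName();  // resolution failed: keep the configured name
        }
        return addr.getAddress().isAnyLocalAddress() ? null : addr.getHostName();
    }

    public static void main(String[] args) {
        // An address whose host was never resolved: getAddress() is null here.
        InetSocketAddress bad =
            InetSocketAddress.createUnresolved("no-such-host.invalid", 50090);
        System.out.println(bad.getAddress() == null);  // true
        System.out.println(machine(bad));              // no-such-host.invalid
    }
}
```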
[jira] [Assigned] (HDFS-5050) Add DataNode support for mlock and munlock
[ https://issues.apache.org/jira/browse/HDFS-5050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang reassigned HDFS-5050: - Assignee: Andrew Wang Add DataNode support for mlock and munlock -- Key: HDFS-5050 URL: https://issues.apache.org/jira/browse/HDFS-5050 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Andrew Wang Add DataNode support for mlock and munlock. The DataNodes should respond to RPCs telling them to mlock and munlock blocks. Blocks should be uncached when the NameNode asks for them to be moved or deleted. For now, we should cache only completed blocks.
[jira] [Commented] (HDFS-5076) Add MXBean methods to query NN's transaction information and JournalNode's journal status
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740297#comment-13740297 ] Hadoop QA commented on HDFS-5076: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598048/HDFS-5076.004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4822//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4822//console This message is automatically generated. Add MXBean methods to query NN's transaction information and JournalNode's journal status - Key: HDFS-5076 URL: https://issues.apache.org/jira/browse/HDFS-5076 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch, HDFS-5076.004.patch Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. 
It can be helpful to enable querying this information through JMX, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary. Similarly, we can add an MXBean interface for JournalNodes to query the status of journals (e.g., whether journals are formatted or not).
[jira] [Updated] (HDFS-2994) If lease is recovered successfully inline with create, create can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Luo updated HDFS-2994: -- Attachment: HDFS-2994_4.patch Thanks Konstantin. The newest patch addresses the above two comments. If lease is recovered successfully inline with create, create can fail -- Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: amith Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message is automatically generated by JIRA. 
[jira] [Updated] (HDFS-5055) nn-2nn ignores dfs.namenode.secondary.http-address
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-5055: -- Attachment: HDFS-5055.1.patch Thanks Jing. Here is the updated patch. nn-2nn ignores dfs.namenode.secondary.http-address --- Key: HDFS-5055 URL: https://issues.apache.org/jira/browse/HDFS-5055 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.0-beta Reporter: Allen Wittenauer Assignee: Vinay Priority: Blocker Labels: regression Attachments: HDFS-5055.1.patch, HDFS-5055.patch, HDFS-5055.patch The primary namenode attempts to connect back to (incoming hostname):port regardless of how dfs.namenode.secondary.http-address is configured.
[jira] [Commented] (HDFS-5087) Allowing specific JAVA heap max setting for HDFS related services
[ https://issues.apache.org/jira/browse/HDFS-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740337#comment-13740337 ] Allen Wittenauer commented on HDFS-5087: Rather than make something custom for heap, doesn't it make more sense to just process the command line parameters and strip duplicates? For example, right now I'm fighting with the NN logging because I want to set hadoop.root.logger differently per service. Is the expectation that we'll create one of these processing loops for all the values? That won't scale. Allowing specific JAVA heap max setting for HDFS related services - Key: HDFS-5087 URL: https://issues.apache.org/jira/browse/HDFS-5087 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Reporter: Kai Zheng Priority: Minor Attachments: HDFS-5087.patch This allows specific JAVA heap max setting for HDFS related services as it does for YARN services, to be consistent.
[jira] [Commented] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740377#comment-13740377 ] Hadoop QA commented on HDFS-4985: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598059/h4985.02.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4823//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4823//console This message is automatically generated. Add storage type to the protocol and expose it in block report and block locations -- Key: HDFS-4985 URL: https://issues.apache.org/jira/browse/HDFS-4985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Arpit Agarwal Attachments: h4985.02.patch With HDFS-2880 datanode now supports storage abstraction. This is to add storage type in to the protocol. Datanodes currently report blocks per storage. Storage would include storage type attribute. 
Namenode also exposes the storage type of a block in block locations.
[jira] [Commented] (HDFS-4994) Audit log getContentSummary() calls
[ https://issues.apache.org/jira/browse/HDFS-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740385#comment-13740385 ] Kihwal Lee commented on HDFS-4994: -- I know getListingInt() does it too, but it will be better if we do logAudit() outside of the FSNamespace lock. We could catch AccessControlException, record the failure, then rethrow. In the finally block, we can then call logAudit() with false if a failure was recorded, otherwise call it with true. Audit log getContentSummary() calls --- Key: HDFS-4994 URL: https://issues.apache.org/jira/browse/HDFS-4994 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Robert Parker Priority: Minor Labels: newbie Attachments: HDFS-4994_branch-0.23.patch, HDFS-4994.patch Currently getContentSummary() calls are not logged anywhere. They should be logged in the audit log.
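Kihwal's suggested pattern — record the access-control failure inside the locked region, rethrow, and emit the audit entry only after the lock is released — can be sketched as follows. A plain `ReentrantLock` stands in for the namespace lock, and all names, the placeholder return value, and the use of `SecurityException` are illustrative assumptions, not FSNamesystem's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class AuditOutsideLockSketch {
    static final ReentrantLock nsLock = new ReentrantLock();  // stand-in for the namespace lock
    static final List<String> auditLog = new ArrayList<>();

    static void logAudit(boolean success, String op) {
        // Must never run while nsLock is held.
        auditLog.add(op + (success ? ":allowed" : ":denied"));
    }

    static long getContentSummary(String src, boolean permitted) {
        boolean success = false;
        try {
            nsLock.lock();
            try {
                // Permission check and summary computation happen under the lock.
                if (!permitted) throw new SecurityException("Permission denied: " + src);
                success = true;
                return 42L;  // placeholder for the real summary computation
            } finally {
                nsLock.unlock();  // release before auditing
            }
        } finally {
            // Outer finally runs after the inner one, so the lock is free here;
            // a recorded failure is audited as false, otherwise as true.
            logAudit(success, "contentSummary");
        }
    }

    public static void main(String[] args) {
        System.out.println(getContentSummary("/data", true));  // 42
        try {
            getContentSummary("/secret", false);
        } catch (SecurityException e) {
            // the failure was rethrown after being recorded
        }
        System.out.println(auditLog);  // [contentSummary:allowed, contentSummary:denied]
    }
}
```

Ordering the two `finally` blocks this way is the whole trick: the inner one releases the lock, the outer one audits, so the audit write can never extend the lock's critical section.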
[jira] [Commented] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740388#comment-13740388 ] Arpit Agarwal commented on HDFS-4985: - Reattaching the correct patch file. Add storage type to the protocol and expose it in block report and block locations -- Key: HDFS-4985 URL: https://issues.apache.org/jira/browse/HDFS-4985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Arpit Agarwal Attachments: h4985.02.patch With HDFS-2880 the datanode now supports storage abstraction. This is to add the storage type into the protocol. Datanodes currently report blocks per storage. Storage would include a storage type attribute. Namenode also exposes the storage type of a block in block locations.
[jira] [Updated] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-4985: Attachment: h4985.02.patch Add storage type to the protocol and expose it in block report and block locations -- Key: HDFS-4985 URL: https://issues.apache.org/jira/browse/HDFS-4985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Arpit Agarwal Attachments: h4985.02.patch With HDFS-2880 the datanode now supports storage abstraction. This is to add the storage type into the protocol. Datanodes currently report blocks per storage. Storage would include a storage type attribute. Namenode also exposes the storage type of a block in block locations.
[jira] [Updated] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-4985:
--------------------------------

    Attachment: (was: h4985.02.patch)
[jira] [Commented] (HDFS-5087) Allowing specific JAVA heap max setting for HDFS related services
[ https://issues.apache.org/jira/browse/HDFS-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740414#comment-13740414 ]

Kai Zheng commented on HDFS-5087:
---------------------------------

bq. just process the command line parameters and strip duplicates?
Rather than applying a post-hoc fix like this, shouldn't we take a consistent approach? Why introduce JAVA_HEAP_MAX at all? I would think we should either always respect it or discard it.

bq. That won't scale.
I'm wondering if it's good practice to add many application options and parameters (like the logging settings) to the Java command line via -D.

Allowing specific JAVA heap max setting for HDFS related services
-----------------------------------------------------------------

                Key: HDFS-5087
                URL: https://issues.apache.org/jira/browse/HDFS-5087
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: scripts
           Reporter: Kai Zheng
           Priority: Minor
        Attachments: HDFS-5087.patch

This allows a specific Java heap max setting for HDFS-related services, as is already done for YARN services, for consistency.
[jira] [Updated] (HDFS-4816) transitionToActive blocks if the SBN is doing checkpoint image transfer
[ https://issues.apache.org/jira/browse/HDFS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-4816:
------------------------------

       Resolution: Fixed
    Fix Version/s: 2.3.0
           Status: Resolved (was: Patch Available)

Great, thanks atm. Committed to trunk and branch-2.

transitionToActive blocks if the SBN is doing checkpoint image transfer
-----------------------------------------------------------------------

                Key: HDFS-4816
                URL: https://issues.apache.org/jira/browse/HDFS-4816
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: namenode
   Affects Versions: 3.0.0, 2.0.4-alpha
           Reporter: Andrew Wang
           Assignee: Andrew Wang
            Fix For: 2.3.0
        Attachments: hdfs-4816-1.patch, hdfs-4816-2.patch, hdfs-4816-3.patch, hdfs-4816-4.patch, hdfs-4816-slow-shutdown.txt, stacks.out

The NN and SBN do this dance during checkpoint image transfer with nested HTTP GETs via {{HttpURLConnection}}. When an admin runs {{-transitionToActive}} during this transfer, part of that is interrupting the ongoing checkpoint so we can transition immediately. However, the {{thread.interrupt()}} in {{StandbyCheckpointer#stop}} gets swallowed by {{connection.getResponseCode()}} in {{TransferFsImage#doGetUrl}}. None of the methods in HttpURLConnection throw InterruptedException, so we need to do something else (perhaps HttpClient [1]).

[1]: http://hc.apache.org/httpclient-3.x/
[jira] [Commented] (HDFS-4816) transitionToActive blocks if the SBN is doing checkpoint image transfer
[ https://issues.apache.org/jira/browse/HDFS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740428#comment-13740428 ]

Hudson commented on HDFS-4816:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4261 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4261/])
HDFS-4816. transitionToActive blocks if the SBN is doing checkpoint image transfer. (Andrew Wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1514095)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java
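The HDFS-4816 discussion above hinges on the fact that {{Thread#interrupt}} does not wake a thread blocked in {{HttpURLConnection#getResponseCode}}. The general workaround (which the fix applies in spirit) is to have the stopping thread close the underlying resource so the blocked call fails fast with an IOException. A minimal JDK-only sketch of that pattern, using a thread blocked in {{ServerSocket#accept}} as a stand-in for the blocking HTTP call (this is an illustration, not the actual Hadoop patch):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: interrupt() alone would not wake a thread blocked in classic
// java.net I/O; closing the resource from another thread does.
public class BlockingCallCanceller {
    public static boolean cancelBlockedAccept() throws Exception {
        final ServerSocket server = new ServerSocket(0);
        final boolean[] unblocked = new boolean[1];
        Thread worker = new Thread(() -> {
            try {
                server.accept();     // blocks, ignoring Thread#interrupt
            } catch (IOException e) {
                unblocked[0] = true; // close() from another thread lands here
            }
        });
        worker.start();
        Thread.sleep(200);           // give the worker time to block in accept()
        server.close();              // "cancel" the blocking call
        worker.join(5000);
        return unblocked[0] && !worker.isAlive();
    }
}
```

The same idea applies to {{HttpURLConnection}} via {{disconnect()}}, or by switching to an HTTP client that exposes an abort operation, as the [1] link in the issue suggests.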
[jira] [Updated] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-4504:
---------------------------------------

    Attachment: HDFS-4504.014.patch
[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740434#comment-13740434 ]

Colin Patrick McCabe commented on HDFS-4504:
--------------------------------------------

The latest patch:
* When reaping zombie files, don't use recoverLease. Instead, add a force flag to completeFile.
* Add {{dfs.client.close.timeout.ms}} to specify how long we should wait inside close() before making the file a zombie. Previously we used {{ipc.ping.interval}} to determine how long to wait. Having a configuration option for this makes unit tests that exercise close() against an unresponsive namenode much simpler to write.
* {{FSNamesystem#completeFile}} should issue a different log message on failure than on success.
* {{TestHFlush#testHFlushInterrupted}}: Thread#interrupted is a static method; refer to it statically to avoid a Java warning. Clear interrupted status when appropriate.

Since this is a bigger change, I added small whitespace changes in hadoop-mapreduce-client, hadoop-yarn, and hadoop-tools to get a full test run, so that we can become aware of any issues.
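The Thread#interrupted point in the comment above is subtle enough to be worth spelling out: {{Thread.interrupted()}} is a static method that reports and clears the current thread's interrupt flag, while {{isInterrupted()}} only reports it; calling the static method through an instance reference compiles but draws a Java warning. A small self-contained illustration:

```java
// Demonstrates Thread.interrupted() (static, reports AND clears the flag)
// versus isInterrupted() (instance, reports without clearing).
public class InterruptFlagDemo {
    public static boolean[] probe() {
        Thread.currentThread().interrupt();          // set the flag
        boolean first  = Thread.interrupted();       // true, and clears the flag
        boolean second = Thread.interrupted();       // false: already cleared
        Thread.currentThread().interrupt();          // set it again
        boolean kept   = Thread.currentThread().isInterrupted(); // true, flag kept
        boolean last   = Thread.interrupted();       // true, and clears for cleanup
        return new boolean[] { first, second, kept, last };
    }
}
```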
[jira] [Commented] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740453#comment-13740453 ]

Konstantin Shvachko commented on HDFS-5079:
-------------------------------------------

+1 Looks good. No test needed for code removal.

Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
-------------------------------------------------------------

                Key: HDFS-5079
                URL: https://issues.apache.org/jira/browse/HDFS-5079
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: datanode, namenode
   Affects Versions: 3.0.0
           Reporter: Konstantin Shvachko
           Assignee: Tao Luo
        Attachments: HDFS-5079.patch

NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. The respective class should also be removed from DatanodeProtocolProtos.
[jira] [Commented] (HDFS-5055) nn-2nn ignores dfs.namenode.secondary.http-address
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740452#comment-13740452 ]

Jing Zhao commented on HDFS-5055:
---------------------------------

The new patch looks pretty good to me. +1.

nn-2nn ignores dfs.namenode.secondary.http-address
--------------------------------------------------

                Key: HDFS-5055
                URL: https://issues.apache.org/jira/browse/HDFS-5055
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: namenode
   Affects Versions: 2.1.0-beta
           Reporter: Allen Wittenauer
           Assignee: Vinay
           Priority: Blocker
             Labels: regression
        Attachments: HDFS-5055.1.patch, HDFS-5055.patch, HDFS-5055.patch

The primary namenode attempts to connect back to (incoming hostname):port regardless of how dfs.namenode.secondary.http-address is configured.
[jira] [Updated] (HDFS-5076) Add MXBean methods to query NN's transaction information and JournalNode's journal status
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-5076:
----------------------------

    Attachment: HDFS-5076.005.patch

After some offline discussion with Suresh, we think an MXBean method getJournalStatus(String jid) may not be a good idea: a bad jid would also cause the creation of a corresponding Journal object in the JN, and this may allow malicious users to attack the JN. Because Journal objects are created lazily, it is possible that a journal has been formatted but is not included in the journalsById list because of a JN restart. In the current patch we simply assume that if a directory has been created in the journal dir, the corresponding journal has been formatted. We can also call the analyzeStorage method to verify, if necessary.

Add MXBean methods to query NN's transaction information and JournalNode's journal status
-----------------------------------------------------------------------------------------

                Key: HDFS-5076
                URL: https://issues.apache.org/jira/browse/HDFS-5076
            Project: Hadoop HDFS
         Issue Type: New Feature
   Affects Versions: 3.0.0
           Reporter: Jing Zhao
           Assignee: Jing Zhao
           Priority: Minor
        Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch, HDFS-5076.004.patch, HDFS-5076.005.patch

Currently the NameNode already provides RPC calls to get its last applied transaction ID and the most recent checkpoint's transaction ID. It would be helpful to support querying this information through JMX, so that administrators and applications like Ambari can easily decide whether a forced checkpoint via saveNamespace is necessary. Similarly, we can add an MXBean interface for JournalNodes to query the status of journals (e.g., whether journals are formatted or not).
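The lazy-creation concern in the comment above generalizes: a status query that instantiates a server-side object for any identifier it is handed lets arbitrary (possibly malicious) identifiers allocate state. A hedged sketch of the safer pattern, with plain collections standing in for the JournalNode's internals (the class and method names here are illustrative, not the Hadoop API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch: validate a journal id against what actually exists on disk
// before (or instead of) lazily creating a Journal object for it.
public class JournalRegistrySketch {
    private final Map<String, Object> journalsById = new HashMap<>();
    private final Set<String> formattedOnDisk; // stand-in for existing storage dirs

    public JournalRegistrySketch(Set<String> formattedOnDisk) {
        this.formattedOnDisk = formattedOnDisk;
    }

    // Unsafe variant: allocates an object for ANY jid the caller supplies.
    public Object getOrCreate(String jid) {
        return journalsById.computeIfAbsent(jid, k -> new Object());
    }

    // Safer variant: answer status only for journals whose directory exists;
    // no server-side object is created for a bad jid.
    public String getStatus(String jid) {
        return formattedOnDisk.contains(jid) ? "FORMATTED" : "UNKNOWN";
    }

    public int cachedJournalCount() {
        return journalsById.size();
    }
}
```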
[jira] [Commented] (HDFS-2994) If lease is recovered successfully inline with create, create can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740494#comment-13740494 ]

Hadoop QA commented on HDFS-2994:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12598074/HDFS-2994_4.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4825//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/4825//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4825//console

This message is automatically generated.

If lease is recovered successfully inline with create, create can fail
----------------------------------------------------------------------

                Key: HDFS-2994
                URL: https://issues.apache.org/jira/browse/HDFS-2994
            Project: Hadoop HDFS
         Issue Type: Bug
   Affects Versions: 0.24.0
           Reporter: Todd Lipcon
           Assignee: amith
        Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch

I saw the following logs on my test cluster:
{code}
2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1
2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6
2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed.
2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6
2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6
{code}
It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail.
[jira] [Commented] (HDFS-5055) nn-2nn ignores dfs.namenode.secondary.http-address
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740493#comment-13740493 ]

Hadoop QA commented on HDFS-5055:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12598075/HDFS-5055.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4824//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4824//console

This message is automatically generated.
[jira] [Updated] (HDFS-5055) nn fails to download checkpointed image from snn in some setups
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HDFS-5055:
----------------------------------

    Summary: nn fails to download checkpointed image from snn in some setups (was: nn-2nn ignores dfs.namenode.secondary.http-address)
[jira] [Updated] (HDFS-5055) nn fails to download checkpointed image from snn in some setups
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HDFS-5055:
----------------------------------

       Resolution: Fixed
    Fix Version/s: 2.1.1-beta
           Status: Resolved (was: Patch Available)

Committed the patch to trunk, branch-2 and branch-2.1. Thanks Jing for the review and Allen for verifying it works. Thank you Vinay for the patch!
[jira] [Created] (HDFS-5097) TestDoAsEffectiveUser can fail on JDK 7
Aaron T. Myers created HDFS-5097:
------------------------------------

            Summary: TestDoAsEffectiveUser can fail on JDK 7
                Key: HDFS-5097
                URL: https://issues.apache.org/jira/browse/HDFS-5097
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: test
   Affects Versions: 2.1.0-beta
           Reporter: Aaron T. Myers
           Assignee: Aaron T. Myers
           Priority: Minor

Another issue with the test method execution order changing between JDK 6 and 7.
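For background on the "execution order" failures mentioned above: {{Class#getDeclaredMethods}} returns methods in no guaranteed order, and JDK 7 changed the de-facto order that JDK 6 happened to produce, so test frameworks that enumerate test methods reflectively started running them in a different sequence. Tests that silently depend on that order break; sorting by name (or removing the order dependence entirely) makes the order deterministic. A small sketch of the mechanism:

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Demonstrates that reflective method enumeration has no specified order,
// and that sorting by name yields a deterministic sequence regardless of JDK.
public class MethodOrderDemo {
    static class Sample {
        public void testB() {}
        public void testA() {}
        public void testC() {}
    }

    public static String[] sortedTestNames() {
        Method[] methods = Sample.class.getDeclaredMethods(); // unspecified order
        String[] names = new String[methods.length];
        for (int i = 0; i < methods.length; i++) {
            names[i] = methods[i].getName();
        }
        Arrays.sort(names);                                   // deterministic order
        return names;
    }
}
```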
[jira] [Commented] (HDFS-5077) NPE in FSNamesystem.commitBlockSynchronization()
[ https://issues.apache.org/jira/browse/HDFS-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740512#comment-13740512 ]

Konstantin Shvachko commented on HDFS-5077:
-------------------------------------------

Yes, this is pretty rare, but I hit it (while testing HA), so others could too. The reason is similar to yours. But whatever the reason, we should fix the NPE.

NPE in FSNamesystem.commitBlockSynchronization()
------------------------------------------------

                Key: HDFS-5077
                URL: https://issues.apache.org/jira/browse/HDFS-5077
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: namenode
   Affects Versions: 2.0.5-alpha
           Reporter: Konstantin Shvachko

The NN starts a block recovery, which will synchronize block replicas on different DNs. At the end, one of the DNs reports the list of nodes containing the consistent replicas to the NN via the commitBlockSynchronization() call. The NPE happens if, just before processing commitBlockSynchronization(), the NN removes from the active set one of the DNs that is then reported in the call.
[jira] [Created] (HDFS-5098) Enhance FileSystem.Statistics to have locality information
Bikas Saha created HDFS-5098:
--------------------------------

            Summary: Enhance FileSystem.Statistics to have locality information
                Key: HDFS-5098
                URL: https://issues.apache.org/jira/browse/HDFS-5098
            Project: Hadoop HDFS
         Issue Type: Improvement
           Reporter: Bikas Saha
            Fix For: 2.1.1-beta

Currently in MR/Tez we don't have a good, accurate means of detecting how much of the IO was actually done locally. Getting this information from the source of truth would be much better.
[jira] [Commented] (HDFS-5004) Add additional JMX bean for NameNode status data
[ https://issues.apache.org/jira/browse/HDFS-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740532#comment-13740532 ]

Konstantin Shvachko commented on HDFS-5004:
-------------------------------------------

Cos, you need to move the record about this jira in CHANGES.txt on trunk under the 2.3.0 section. It is inconsistent now.

Add additional JMX bean for NameNode status data
------------------------------------------------

                Key: HDFS-5004
                URL: https://issues.apache.org/jira/browse/HDFS-5004
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: namenode
   Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha
           Reporter: Trevor Lorimer
           Assignee: Trevor Lorimer
            Fix For: 3.0.0, 2.3.0
        Attachments: HDFS-5004.diff, HDFS-5004.diff, HDFS-5004.diff

Currently the JMX beans return much of the data contained on the HDFS Health webpage (dfsHealth.html). However, several other attributes that can only be accessed from within NameNode need to be added. For this reason a new JMX bean (NameNodeStatusMXBean) is required, which will expose the following attributes of NameNode: Role, State, HostAndPort. Also, a list of the corrupted files should be exposed by NameNodeMXBean.
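Mechanically, an MXBean like the proposed NameNodeStatusMXBean is just an interface whose name ends in "MXBean" plus a registration in the platform MBeanServer. The sketch below uses only the JDK's javax.management API; the attribute names (Role, State, HostAndPort) come from the issue description, while the bean class, ObjectName, and returned values are illustrative, not the actual Hadoop implementation:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Minimal MXBean: define an interface ending in "MXBean", implement it,
// register it, and read an attribute back through JMX.
public class StatusMXBeanDemo {
    public interface DemoStatusMXBean {
        String getRole();
        String getState();
        String getHostAndPort();
    }

    public static class DemoStatus implements DemoStatusMXBean {
        public String getRole() { return "NameNode"; }
        public String getState() { return "active"; }
        public String getHostAndPort() { return "nn1.example.com:8020"; }
    }

    public static String readRoleViaJmx() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("demo:type=DemoStatus");
        server.registerMBean(new DemoStatus(), name);
        // The getter getRole() surfaces as the JMX attribute "Role".
        return (String) server.getAttribute(name, "Role");
    }
}
```

Tools such as Ambari (or plain jconsole) can then read these attributes remotely without any RPC-client code.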
[jira] [Commented] (HDFS-5080) BootstrapStandby not working with QJM when the existing NN is active
[ https://issues.apache.org/jira/browse/HDFS-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740531#comment-13740531 ]

Suresh Srinivas commented on HDFS-5080:
---------------------------------------

+1 for the patch.

BootstrapStandby not working with QJM when the existing NN is active
--------------------------------------------------------------------

                Key: HDFS-5080
                URL: https://issues.apache.org/jira/browse/HDFS-5080
            Project: Hadoop HDFS
         Issue Type: Bug
   Affects Versions: 3.0.0
           Reporter: Jing Zhao
           Assignee: Jing Zhao
        Attachments: HDFS-5080.000.patch, HDFS-5080.001.patch, HDFS-5080.002.patch

Currently when QJM is used, running BootstrapStandby while the existing NN is active can produce the following exception:
{code}
FATAL ha.BootstrapStandby: Unable to read transaction ids 6175397-6175405 from the configured shared edits storage. Please copy these logs into the shared edits storage or call saveNamespace on the active node. Error: Gap in transactions. Expected to be able to read up until at least txid 6175405 but unable to find any edit logs containing txid 6175405
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 6175405 but unable to find any edit logs containing txid 6175405
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1300)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1258)
        at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.checkLogsAvailableForRead(BootstrapStandby.java:229)
{code}
It looks like the cause of the exception is that, when the active NN is queried by BootstrapStandby about the last written transaction ID, the in-progress edit log segment is included. However, when the journal nodes are asked about the last written transaction ID, the in-progress edit log is excluded. This causes BootstrapStandby#checkLogsAvailableForRead to complain about gaps.

To fix this, we can either let the journal nodes take the in-progress edit log into account, or let the active NN exclude the in-progress edit log segment.
[jira] [Updated] (HDFS-5068) Convert NNThroughputBenchmark to a Tool to allow generic options.
[ https://issues.apache.org/jira/browse/HDFS-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-5068:
--------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.3.0
           Status: Resolved (was: Patch Available)

Committed this to trunk and branch-2.3. Will move it further down if requested.

Convert NNThroughputBenchmark to a Tool to allow generic options.
-----------------------------------------------------------------

                Key: HDFS-5068
                URL: https://issues.apache.org/jira/browse/HDFS-5068
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: benchmarks
           Reporter: Konstantin Shvachko
           Assignee: Konstantin Shvachko
            Fix For: 2.3.0
        Attachments: NNThBenchTool.patch, NNThBenchTool.patch

Currently NNThroughputBenchmark does not recognize generic options like -conf, etc. A simple way to enable such functionality is to make it implement the Tool interface.
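What implementing Tool buys, concretely, is that ToolRunner peels generic options such as "-conf file" and "-D key=value" off the command line before the program sees its own arguments. The parser below is a simplified stand-in written to illustrate that split; it is not Hadoop's GenericOptionsParser API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of generic-option stripping: consume "-conf" and
// "-D" (with their values) and hand the remaining args to the tool itself.
public class GenericOptionsSketch {
    public static final Map<String, String> genericOpts = new HashMap<>();

    // Returns the remaining, tool-specific arguments.
    public static String[] stripGenericOptions(String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-conf") || args[i].equals("-D")) {
                genericOpts.put(args[i], args[++i]); // consume the option's value
            } else {
                remaining.add(args[i]);
            }
        }
        return remaining.toArray(new String[0]);
    }
}
```

With Tool, a command line like "NNThroughputBenchmark -conf my-site.xml -op create" would have "-conf my-site.xml" applied to the Configuration automatically, leaving "-op create" for the benchmark.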
[jira] [Commented] (HDFS-5080) BootstrapStandby not working with QJM when the existing NN is active
[ https://issues.apache.org/jira/browse/HDFS-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740542#comment-13740542 ]

Jing Zhao commented on HDFS-5080:
---------------------------------

Thanks for the review, Suresh! I plan to commit this early tomorrow morning if there are no more comments. We can open new jiras for any comments that come in after committing.
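The HDFS-5080 "Gap in transactions" failure above comes down to a coverage check over edit log segments: if the active NN reports a last txid that includes its in-progress segment while the journal nodes report only finalized segments, the check finds txids it cannot cover and reports a gap even though no edits were lost. A simplified stand-in for the idea behind FSEditLog#checkForGaps (not the actual Hadoop code):

```java
// Given sorted, non-overlapping edit log segments as {startTxId, endTxId}
// ranges, report whether any txid in [fromTxId, toTxId] is uncovered.
public class TxidGapCheck {
    public static boolean hasGap(long[][] segments, long fromTxId, long toTxId) {
        long next = fromTxId;                      // next txid we still need
        for (long[] seg : segments) {
            if (seg[0] > next) {
                return true;                       // txids missing before this segment
            }
            next = Math.max(next, seg[1] + 1);     // covered through seg[1]
        }
        return next <= toTxId;                     // true if coverage stops short
    }
}
```

With finalized segments [1,100] and [101,200], checking up to txid 200 finds no gap, but checking up to a txid taken from an in-progress segment the journal nodes excluded (say 205) reports one, which is exactly the mismatch the two proposed fixes would remove.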
[jira] [Updated] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-5079: -- Resolution: Fixed Fix Version/s: 3.0.0 Hadoop Flags: Incompatible change,Reviewed (was: Incompatible change) Status: Resolved (was: Patch Available) I just committed this to trunk. Thank you Tao. Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos. - Key: HDFS-5079 URL: https://issues.apache.org/jira/browse/HDFS-5079 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Fix For: 3.0.0 Attachments: HDFS-5079.patch NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. The respective class should also be removed from DatanodeProtocolProtos. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740546#comment-13740546 ] Hadoop QA commented on HDFS-4504: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598096/HDFS-4504.014.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-tools/hadoop-distcp hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.hdfs.server.namenode.TestSaveNamespace org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.TestHdfsClose org.apache.hadoop.hdfs.TestFileAppend4 org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-tools/hadoop-distcp hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.hdfs.TestFileAppend3 {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4827//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4827//console This message is automatically generated. DFSOutputStream#close doesn't always release resources (such as leases) --- Key: HDFS-4504 URL: https://issues.apache.org/jira/browse/HDFS-4504 Project: Hadoop HDFS Issue Type: Bug Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, HDFS-4504.010.patch, HDFS-4504.011.patch, HDFS-4504.014.patch {{DFSOutputStream#close}} can throw an {{IOException}} in some cases. One example is if there is a pipeline error and then pipeline recovery fails. Unfortunately, in this case, some of the resources used by the {{DFSOutputStream}} are leaked. One particularly important resource is file leases. So it's possible for a long-lived HDFS client, such as Flume, to write many blocks to a file, but then fail to close it. Unfortunately, the {{LeaseRenewerThread}} inside the client will continue to renew the lease for the undead file. Future attempts to close the file will just rethrow the previous exception, and no progress can be made by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4953) enable HDFS local reads via mmap
[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740551#comment-13740551 ] Colin Patrick McCabe commented on HDFS-4953: bq. brandon wrote: 1. DFSClient: looks like all the DFSClient instances share the same ClientMmapManager instance. If this is the case, why not have one static ClientMmapManager with a refcount to it, and remove ClientMmapManagerFactory class and variable mmapManager? I think it's better to have the refcount and manager instance encapsulated as private data inside an object, rather than floating around in the DFSClient class, because it prevents errors where someone might access the field without updating the reference count properly. bq. 2. HdfsZeroCopyCursor: might want to also initialize allowShortReads in the constructor. Users can't create instances of HdfsZeroCopyCursor directly (it's package-private). {{DFSInputStream#createZeroCopyConstructor}} creates them. We could start adding booleans to this function, but it seems clearer for people to just use setAllowShortReads. The kind of mess we have with FileSystem#create where there are a dozen different overloads and nobody can keep them straight is an antipattern. bq. ... Not sure which case is more expected by the users, shortReads allowed or disallowed. That's a good question. My experience has been that many developers don't handle short reads very well (sometimes including me). It's just another corner case that they have to remember to handle, and if they're not FS developers they often don't even realize that it can happen. So I have defaulted it to off unless it's explicitly requested. bq. 4. DFSInputStream: remove unused import and add debug level check for DFSClient.LOG.Debug(). OK bq. 5. TestBlockReader: Assume.assumeTrue(SystemUtils.IS_OS_UNIX), guess you meant IS_OS_LINUX mmap is present and supported on other UNIXes besides Linux bq. 6. test_libhdfs_zerocopy.c: remove repeated fixed bq. 7. 
TestBlockReaderLocal.java: remove unused import ok bq. 8. please add javadoc to some classes, e.g., ClientMap,ClientMapManager ok bq. andrew wrote: [hdfs-default.xml] Has some extra lines of java pasted in. fixed bq. Let's beef up the [zerocopycursor] javadoc I added an example. bq. read() javadoc: EOF here refers to an EOF when reading a block, not EOF of the HDFS file. Would prefer to see end of block. EOF is only thrown at end-of-file, as described in the JavaDoc. bq. Would like to see explicit setting of allowShortReads to false in the constructor for clarity. done bq. serialVersionUID should be private ok bq. Maybe rename put to unref or close? It's not actually putting in the data structure sense, which is confusing. renamed to unref bq. let's not call people bad programmers, just say accidentally leaked references. I changed this to code which leaks references accidentally to make it more boring bq. unmap: add to javadoc that it should only be called if the manager has been closed, or by the manager with the lock held. I added Should be called with the ClientMmapManager lock held bq. Need a space before the =. ok bq. Let's add some javadoc on... why it's important to cache [mmaps] added to ClientMmapManager bq. I think fromConf-style factory methods are more normally called get, e.g. FileSystem.get. FileSystem#get uses a cache, whereas ClientMmapManager#fromConf does not. I think it would be confusing to name them similarly... bq. Why is the CacheCleaner executor using half the timeout for the delay and period? Half the timeout period is the minimum period for which we can ensure that we time out mmaps on time. Think about if we used the timeout itself as the period. In that case, we might be 1 second away from the 15-minute (or whatever) expiration period when the cleaner thread runs. Then we have to wait another 15 minutes, effectively doubling the timeout. bq. We might in fact want to key off of System.nanoTime for fewer collisions Good point; changed. 
bq. I think evictOne would be clearer if you used TreeSet#pollFirst rather than an iterator. yeah, changed bq. This has 10 spaces, where elsewhere in the file you use a double-indent of 4. ok, I'll make it 4 bq. Remaining TODO for blocks bigger than 2GB, want to file a follow-on JIRA for this? filed bq. readZeroCopy catches and re-sets the interrupted status, does something else check this later? No. It would only happen if some third-party software delivered an InterruptedException to us. In that case the client is responsible for checking and doing something with the InterruptedException (or not). This all happens in the client thread. bq. Is it worth re-trying the mmap after a CacheCleaner period in case some space has been freed up in the cache? BlockReader objects get destroyed and re-created a lot. For example, a long seek
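The half-timeout argument for the CacheCleaner period above is simple arithmetic: an mmap that expires just after a sweep waits one full sweep period before the cleaner sees it again, so the period bounds how far past the timeout an entry can live. A tiny sketch (method and class names are illustrative, not from the patch):

```java
public class CacheSweepSketch {
    // Worst case: an entry reaches its timeout immediately after a sweep
    // finishes, so it survives one extra full period before eviction.
    static long worstCaseLifetimeMs(long timeoutMs, long sweepPeriodMs) {
        return timeoutMs + sweepPeriodMs;
    }

    public static void main(String[] args) {
        long timeout = 15 * 60_000L; // 15-minute mmap timeout
        // Sweeping every `timeout` ms can double the effective timeout:
        System.out.println(worstCaseLifetimeMs(timeout, timeout));     // 30 min
        // Sweeping every timeout/2 ms caps the overshoot at 50%:
        System.out.println(worstCaseLifetimeMs(timeout, timeout / 2)); // 22.5 min
    }
}
```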
[jira] [Commented] (HDFS-5068) Convert NNThroughputBenchmark to a Tool to allow generic options.
[ https://issues.apache.org/jira/browse/HDFS-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740554#comment-13740554 ] Hudson commented on HDFS-5068: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4262 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4262/]) HDFS-5068. Convert NNThroughputBenchmark to a Tool to allow generic options. Contributed by Konstantin Shvachko. (shv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1514114) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java Convert NNThroughputBenchmark to a Tool to allow generic options. - Key: HDFS-5068 URL: https://issues.apache.org/jira/browse/HDFS-5068 Project: Hadoop HDFS Issue Type: Improvement Components: benchmarks Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.3.0 Attachments: NNThBenchTool.patch, NNThBenchTool.patch Currently NNThroughputBenchmark does not recognize generic options like -conf, etc. A simple way to enable such functionality is to make it implement Tool interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode
[ https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740555#comment-13740555 ] Hudson commented on HDFS-5051: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4262 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4262/]) HDFS-5051. nn fails to download checkpointed image from snn in some setups. Contributed by Vinay and Suresh Srinivas. (suresh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1514110) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java Propagate cache status information from the DataNode to the NameNode Key: HDFS-5051 URL: https://issues.apache.org/jira/browse/HDFS-5051 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Andrew Wang Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch The DataNode needs to inform the NameNode of its current cache state. Let's wire up the RPCs and stub out the relevant methods on the DN and NN side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740556#comment-13740556 ] Hudson commented on HDFS-5079: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4262 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4262/]) HDFS-5079. Cleaning up NNHAStatusHeartbeat.State from DatanodeProtocolProtos. Contributed by Tao Luo. (shv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1514118) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos. - Key: HDFS-5079 URL: https://issues.apache.org/jira/browse/HDFS-5079 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Fix For: 3.0.0 Attachments: HDFS-5079.patch NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. The respective class should also be removed from DatanodeProtocolProtos. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740564#comment-13740564 ] Hadoop QA commented on HDFS-4985: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598084/h4985.02.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4826//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4826//console This message is automatically generated. Add storage type to the protocol and expose it in block report and block locations -- Key: HDFS-4985 URL: https://issues.apache.org/jira/browse/HDFS-4985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Arpit Agarwal Attachments: h4985.02.patch With HDFS-2880 datanode now supports storage abstraction. This is to add storage type in to the protocol. Datanodes currently report blocks per storage. 
Storage would include storage type attribute. Namenode also exposes the storage type of a block in block locations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5099) Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
Chuan Liu created HDFS-5099: --- Summary: Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing Key: HDFS-5099 URL: https://issues.apache.org/jira/browse/HDFS-5099 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu In the {{Namenode#copyEditLogSegmentsToSharedDir()}} method, we open a collection of EditLogInputStreams to read edits and apply them to the shared edit log. The {{readOpt()}} method opens the underlying log file on disk. Currently, after applying all the ops, we do not close the streams. This leads to a file handle leak on Windows, as we later fail to delete those files. This happens in the TestInitializeSharedEdits test case, where we explicitly call {{Namenode#initializeSharedEdits()}}, which uses {{copyEditLogSegmentsToSharedDir()}}. Later we fail to create a new MiniDFSCluster with the following exception. {noformat} java.io.IOException: Could not fully delete C:\hdc\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\name1 at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:759) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:644) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:334) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:316) at org.apache.hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits.setupCluster(TestInitializeSharedEdits.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) … {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
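The fix described later in this thread is the standard close-in-finally idiom. A minimal, self-contained sketch follows; {{TrackingStream}} is a hypothetical stand-in for EditLogInputStream, and the real patch would close the actual streams (e.g. via Hadoop's IOUtils) rather than this toy class:

```java
import java.io.Closeable;
import java.util.List;

public class CloseStreamsSketch {
    // Hypothetical stand-in for EditLogInputStream, recording whether
    // close() was called so the cleanup can be verified.
    public static class TrackingStream implements Closeable {
        public boolean closed = false;
        @Override public void close() { closed = true; }
    }

    public static void copySegments(List<TrackingStream> streams) {
        try {
            // ... read ops from each stream and apply them to the
            // shared edit log ...
        } finally {
            // Close every stream even if reading fails, releasing the
            // underlying file handles (the Windows "Could not fully
            // delete" failure came from leaked handles).
            for (TrackingStream s : streams) {
                try {
                    s.close();
                } catch (Exception e) {
                    // best-effort cleanup; keep closing the rest
                }
            }
        }
    }
}
```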
[jira] [Updated] (HDFS-5099) Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
[ https://issues.apache.org/jira/browse/HDFS-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated HDFS-5099: Status: Patch Available (was: Open) Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing --- Key: HDFS-5099 URL: https://issues.apache.org/jira/browse/HDFS-5099 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Attachments: HDFS-5099-trunk.patch In {{Namenode#copyEditLogSegmentsToSharedDir()}} method, we open a collection of EditLogInputStreams to read and apply to shareEditlog. In {{readOpt()}} method, we will open the underlying log file on disk. After applying all the opts, we do not close the collection of streams currently. This lead to a file handle leak on Windows as later we would fail to delete those files. This happens in TestInitializeSharedEdits test case, where we explicitly called {{Namenode# initializeSharedEdits()}}, where {{copyEditLogSegmentsToSharedDir()}} is used. Later we fail to create new MiniDFSCluster with the following exception. {noformat} java.io.IOException: Could not fully delete C:\hdc\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\name1 at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:759) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:644) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:334) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:316) at org.apache.hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits.setupCluster(TestInitializeSharedEdits.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) … {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5099) Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
[ https://issues.apache.org/jira/browse/HDFS-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated HDFS-5099: Attachment: HDFS-5099-trunk.patch Attaching the patch that closes all the streams in the finally clause. All other code changes are just indentation for the new try clause. Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing --- Key: HDFS-5099 URL: https://issues.apache.org/jira/browse/HDFS-5099 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Attachments: HDFS-5099-trunk.patch In {{Namenode#copyEditLogSegmentsToSharedDir()}} method, we open a collection of EditLogInputStreams to read and apply to shareEditlog. In {{readOpt()}} method, we will open the underlying log file on disk. After applying all the opts, we do not close the collection of streams currently. This lead to a file handle leak on Windows as later we would fail to delete those files. This happens in TestInitializeSharedEdits test case, where we explicitly called {{Namenode# initializeSharedEdits()}}, where {{copyEditLogSegmentsToSharedDir()}} is used. Later we fail to create new MiniDFSCluster with the following exception. {noformat} java.io.IOException: Could not fully delete C:\hdc\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\name1 at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:759) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:644) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:334) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:316) at org.apache.hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits.setupCluster(TestInitializeSharedEdits.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) … {noformat} -- This message is automatically generated by JIRA. 
[jira] [Updated] (HDFS-2994) If lease is recovered successfully inline with create, create can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Luo updated HDFS-2994: -- Attachment: HDFS-2994_4.patch If lease is recovered successfully inline with create, create can fail -- Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: amith Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message is automatically generated by JIRA. 
[jira] [Updated] (HDFS-2994) If lease is recovered successfully inline with create, create can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Luo updated HDFS-2994: -- Attachment: (was: HDFS-2994_4.patch) If lease is recovered successfully inline with create, create can fail -- Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: amith Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-5076) Add MXBean methods to query NN's transaction information and JournalNode's journal status
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740602#comment-13740602 ] Hadoop QA commented on HDFS-5076: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598112/HDFS-5076.005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4828//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4828//console This message is automatically generated. Add MXBean methods to query NN's transaction information and JournalNode's journal status - Key: HDFS-5076 URL: https://issues.apache.org/jira/browse/HDFS-5076 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch, HDFS-5076.004.patch, HDFS-5076.005.patch Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. 
It would be helpful to support querying this information through JMX, so that administrators and applications like Ambari can easily decide whether a forced checkpoint via saveNamespace is necessary. Similarly, we can add an MXBean interface for JournalNodes to query the status of journals (e.g., whether they are formatted or not). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
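As an illustration of the Ambari-style decision the comment describes, here is a hedged sketch using a locally registered MXBean. The interface, attribute names, and threshold are hypothetical and will not match the NameNode's actual bean; it only shows the JMX read-and-compare pattern:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class NNTxJmxSketch {
    // Hypothetical MXBean shape; the real NameNode bean exposes
    // different names and lives under a Hadoop JMX domain.
    public interface TxInfoMXBean {
        long getLastAppliedTransactionId();
        long getMostRecentCheckpointTxId();
    }

    // Fake bean standing in for a NameNode with uncheckpointed edits.
    public static class TxInfo implements TxInfoMXBean {
        public long getLastAppliedTransactionId() { return 6175405L; }
        public long getMostRecentCheckpointTxId() { return 6100000L; }
    }

    // An admin tool would read the two attributes over JMX and force a
    // checkpoint once the uncheckpointed transaction count grows large.
    public static boolean needsCheckpoint(MBeanServer mbs, ObjectName name,
                                          long threshold) throws Exception {
        long last = (Long) mbs.getAttribute(name, "LastAppliedTransactionId");
        long ckpt = (Long) mbs.getAttribute(name, "MostRecentCheckpointTxId");
        return last - ckpt > threshold;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("sketch:type=TxInfo");
        mbs.registerMBean(new TxInfo(), name);
        // 6175405 - 6100000 = 75405 pending txns, above the threshold:
        System.out.println(needsCheckpoint(mbs, name, 50_000L)); // true
    }
}
```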
[jira] [Updated] (HDFS-4953) enable HDFS local reads via mmap
[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-4953: --- Attachment: HDFS-4953.007.patch new patch version. I added the tests for the cache, and a java test for no backing buffer. enable HDFS local reads via mmap Key: HDFS-4953 URL: https://issues.apache.org/jira/browse/HDFS-4953 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.3.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, HDFS-4953.006.patch, HDFS-4953.007.patch Currently, the short-circuit local read pathway allows HDFS clients to access files directly without going through the DataNode. However, all of these reads involve a copy at the operating system level, since they rely on the read() / pread() / etc family of kernel interfaces. We would like to enable HDFS to read local files via mmap. This would enable truly zero-copy reads. In the initial implementation, zero-copy reads will only be performed when checksums were disabled. Later, we can use the DataNode's cache awareness to only perform zero-copy reads when we know that checksum has already been verified. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5099) Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
[ https://issues.apache.org/jira/browse/HDFS-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740644#comment-13740644 ] Hadoop QA commented on HDFS-5099: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598132/HDFS-5099-trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4829//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4829//console This message is automatically generated. 
Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
---------------------------------------------------------------------------------------

                 Key: HDFS-5099
                 URL: https://issues.apache.org/jira/browse/HDFS-5099
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 3.0.0, 2.3.0
            Reporter: Chuan Liu
            Assignee: Chuan Liu
         Attachments: HDFS-5099-trunk.patch

In the {{Namenode#copyEditLogSegmentsToSharedDir()}} method, we open a collection of EditLogInputStreams to read edit ops and apply them to the shared edit log. The {{readOp()}} method opens the underlying log file on disk. After applying all the ops, we currently do not close the collection of streams. This leads to a file handle leak on Windows, as we later fail to delete those files. It happens in the TestInitializeSharedEdits test case, where we explicitly call {{Namenode#initializeSharedEdits()}}, which uses {{copyEditLogSegmentsToSharedDir()}}. Later we fail to create a new MiniDFSCluster with the following exception.

{noformat}
java.io.IOException: Could not fully delete C:\hdc\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\name1
	at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:759)
	at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:644)
	at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:334)
	at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:316)
	at org.apache.hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits.setupCluster(TestInitializeSharedEdits.java:68)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	…
{noformat}
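The fix described above amounts to closing every opened stream in a finally block once the ops have been applied, so the file handles are released before anything tries to delete the files. A minimal stand-alone sketch under that assumption (the class name and temp file are hypothetical, not the actual Namenode code):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class CloseStreamsSketch {
    public static void main(String[] args) throws IOException {
        // Stand-ins for the EditLogInputStreams opened over edit log segments.
        List<InputStream> streams = new ArrayList<>();
        File segment = File.createTempFile("edits_", ".log");
        try {
            streams.add(new FileInputStream(segment));
            // ... read ops from each stream and apply them ...
        } finally {
            // Close every stream before returning; on Windows, deleting a
            // file fails while any handle to it remains open.
            for (InputStream s : streams) {
                try {
                    s.close();
                } catch (IOException ignored) {
                    // best-effort cleanup
                }
            }
        }
        // Deletion succeeds once all handles are closed.
        System.out.println(segment.delete());
    }
}
```

In Hadoop itself the same pattern is commonly expressed with a cleanup utility that swallows close-time exceptions, which is what the per-stream try/catch above imitates.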
[jira] [Commented] (HDFS-5099) Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
[ https://issues.apache.org/jira/browse/HDFS-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740649#comment-13740649 ]

Chuan Liu commented on HDFS-5099:
---------------------------------

bq. -1 tests included. The patch doesn't appear to include any new or modified tests.

The existing unit test TestInitializeSharedEdits should be able to verify this behavior.