[jira] [Created] (HDFS-3644) OEV should recognize and deal with 0.20.20x opcode versions
Todd Lipcon created HDFS-3644: - Summary: OEV should recognize and deal with 0.20.20x opcode versions Key: HDFS-3644 URL: https://issues.apache.org/jira/browse/HDFS-3644 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Todd Lipcon Priority: Minor We have some opcode conflicts for edit logs between 0.20.20x (LV -19, -31) vs newer versions. For edit log loading, we dealt with this by forcing users to save namespace on an earlier version before upgrading. But, using a trunk OEV on an older version is useful since the OEV has had so many improvements. It would be nice to be able to specify a flag to the OEV to be able to run on older edit logs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed
[ https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412541#comment-13412541 ] Hadoop QA commented on HDFS-799: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12536171/HDFS-799.005.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDatanodeBlockScanner +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2804//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2804//console This message is automatically generated. libhdfs must call DetachCurrentThread when a thread is destroyed Key: HDFS-799 URL: https://issues.apache.org/jira/browse/HDFS-799 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Christian Kunz Assignee: Colin Patrick McCabe Attachments: HDFS-799.001.patch, HDFS-799.003.patch, HDFS-799.004.patch, HDFS-799.005.patch Threads that call AttachCurrentThread in libhdfs and disappear without calling DetachCurrentThread cause a memory leak. Libhdfs should detach the current thread when this thread exits. 
[jira] [Commented] (HDFS-3606) libhdfs: create self-contained unit test
[ https://issues.apache.org/jira/browse/HDFS-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412544#comment-13412544 ] Hadoop QA commented on HDFS-3606: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12536173/HDFS-3606.004.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestBackupNode org.apache.hadoop.hdfs.server.common.TestJspHelper +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2805//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2805//console This message is automatically generated. libhdfs: create self-contained unit test Key: HDFS-3606 URL: https://issues.apache.org/jira/browse/HDFS-3606 Project: Hadoop HDFS Issue Type: Test Components: libhdfs Affects Versions: 2.0.1-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3606.001.patch, HDFS-3606.003.patch, HDFS-3606.004.patch We should have a self-contained unit test for libhdfs and also for FUSE. We do have hdfs_test, but it is not self-contained (it requires a cluster to already be running before it can be used.) -- This message is automatically generated by JIRA. 
[jira] [Updated] (HDFS-3497) Update Balancer for performance optimization with Node Group- choose the target and source node on the same node group for balancing as the first priority
[ https://issues.apache.org/jira/browse/HDFS-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-3497: - Attachment: HDFS-3497.patch Update Balancer for performance optimization with Node Group- choose the target and source node on the same node group for balancing as the first priority --- Key: HDFS-3497 URL: https://issues.apache.org/jira/browse/HDFS-3497 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer Affects Versions: 1.0.0, 2.0.0-alpha Reporter: Junping Du Assignee: Junping Du Attachments: HDFS-3497.patch
[jira] [Updated] (HDFS-3497) Update Balancer for performance optimization with Node Group- choose the target and source node on the same node group for balancing as the first priority
[ https://issues.apache.org/jira/browse/HDFS-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-3497: - Status: Patch Available (was: Open) Update Balancer for performance optimization with Node Group- choose the target and source node on the same node group for balancing as the first priority --- Key: HDFS-3497 URL: https://issues.apache.org/jira/browse/HDFS-3497 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer Affects Versions: 2.0.0-alpha, 1.0.0 Reporter: Junping Du Assignee: Junping Du Attachments: HDFS-3497.patch
[jira] [Commented] (HDFS-3497) Update Balancer for performance optimization with Node Group- choose the target and source node on the same node group for balancing as the first priority
[ https://issues.apache.org/jira/browse/HDFS-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412570#comment-13412570 ] Junping Du commented on HDFS-3497: -- This patch adds an additional nodegroup layer to the Balancer. It is only one part (the major part); the other part is tracked in HDFS-3496. Update Balancer for performance optimization with Node Group- choose the target and source node on the same node group for balancing as the first priority --- Key: HDFS-3497 URL: https://issues.apache.org/jira/browse/HDFS-3497 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer Affects Versions: 1.0.0, 2.0.0-alpha Reporter: Junping Du Assignee: Junping Du Attachments: HDFS-3497.patch
[jira] [Updated] (HDFS-3497) Update Balancer policy with NodeGroup layer
[ https://issues.apache.org/jira/browse/HDFS-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-3497: - Description: 1. Make sure the Network Topology and BlockPlacementPolicy checks in the balancer are compatible with the new ones that add a NodeGroup layer. 2. Update the balancer policy for performance optimization with Node Group - choose the target and source node on the same node group for balancing as the first priority. 3. Make sure the balancing policy will not reduce reliability in environments with node groups (virtualization) by verifying good targets based on the NodeGroup relationship. (This part of the work is separated out and tracked in HDFS-3496) Summary: Update Balancer policy with NodeGroup layer (was: Update Balancer for performance optimization with Node Group- choose the target and source node on the same node group for balancing as the first priority) Update Balancer policy with NodeGroup layer --- Key: HDFS-3497 URL: https://issues.apache.org/jira/browse/HDFS-3497 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer Affects Versions: 1.0.0, 2.0.0-alpha Reporter: Junping Du Assignee: Junping Du Attachments: HDFS-3497.patch
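The "same node group first" priority described above can be sketched as a tiered target selection. This is a minimal illustrative sketch only, not the actual Balancer code; the `Node` class and `chooseTarget` method are hypothetical names:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the pairing priority from HDFS-3497: when matching
// an over-utilized source with an under-utilized target, prefer a target in
// the same node group, then the same rack, then any remaining candidate.
public class NodeGroupPairing {
    public static class Node {
        public final String name, rack, nodeGroup;
        public Node(String name, String rack, String nodeGroup) {
            this.name = name; this.rack = rack; this.nodeGroup = nodeGroup;
        }
    }

    public static Node chooseTarget(Node source, List<Node> candidates) {
        for (Node t : candidates)            // 1st priority: same node group
            if (t.nodeGroup.equals(source.nodeGroup)) return t;
        for (Node t : candidates)            // 2nd priority: same rack
            if (t.rack.equals(source.rack)) return t;
        return candidates.isEmpty() ? null : candidates.get(0); // fallback: any
    }

    public static void main(String[] args) {
        Node src = new Node("dn1", "/rack1", "/rack1/ng1");
        List<Node> cands = Arrays.asList(
            new Node("dn2", "/rack2", "/rack2/ng3"),
            new Node("dn3", "/rack1", "/rack1/ng1"));
        // dn3 shares src's node group, so it wins even though dn2 is listed first.
        System.out.println(chooseTarget(src, cands).name);
    }
}
```

Moving blocks within a node group avoids cross-rack (or cross-hypervisor) traffic, which is why it is the first priority for balancing performance.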
[jira] [Commented] (HDFS-3497) Update Balancer policy with NodeGroup layer
[ https://issues.apache.org/jira/browse/HDFS-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412613#comment-13412613 ] Hadoop QA commented on HDFS-3497: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12536183/HDFS-3497.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestBackupNode org.apache.hadoop.hdfs.server.common.TestJspHelper +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2806//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2806//console This message is automatically generated. Update Balancer policy with NodeGroup layer --- Key: HDFS-3497 URL: https://issues.apache.org/jira/browse/HDFS-3497 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer Affects Versions: 1.0.0, 2.0.0-alpha Reporter: Junping Du Assignee: Junping Du Attachments: HDFS-3497.patch 1. Make sure Network Topology and BlockPlacementPolicy check in balancer is compatible with new one with adding NodeGroup layer. 2. Update balancer policy for performance optimization with Node Group - choose the target and source node on the same node group for balancing as the first priority. 3. 
Make sure the balancing policy will not reduce reliability in environments with node groups (virtualization) by verifying good targets based on the NodeGroup relationship. (This part of the work is separated out and tracked in HDFS-3496)
[jira] [Commented] (HDFS-3477) FormatZK and ZKFC startup can fail due to zkclient connection establishment delay
[ https://issues.apache.org/jira/browse/HDFS-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412675#comment-13412675 ] Rakesh R commented on HDFS-3477: Added links to HDFS-3635 as I feel the cause is same and failing after timeout: {code}java.lang.Exception: test timed out after 3 milliseconds at java.lang.Object.wait(Native Method) at org.apache.hadoop.ha.ZKFailoverController.waitForActiveAttempt(ZKFailoverController.java:457) at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:645) at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:58) at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:590) at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:587){code} FormatZK and ZKFC startup can fail due to zkclient connection establishment delay - Key: HDFS-3477 URL: https://issues.apache.org/jira/browse/HDFS-3477 Project: Hadoop HDFS Issue Type: Sub-task Components: auto-failover Affects Versions: 2.0.1-alpha Reporter: suja s Assignee: Rakesh R Attachments: HDFS-3477.1.patch, HDFS-3477.2.patch, HDFS-3477.3.patch, HDFS-3477.3.patch, HDFS-3477.patch Format and ZKFC startup flows continue further after creation of zkclient connection without waiting to check whether the connection is completely established. This leads to failure at the subsequent point if connection was not complete by then. 
Exception trace for format {noformat} 12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Socket connection established to HOST-xx-xx-xx-55/xx.xx.xx.55:2182, initiating session 12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Session establishment complete on server HOST-xx-xx-xx-55/xx.xx.xx.55:2182, sessionid = 0x1379da4660c0014, negotiated timeout = 5000 12/05/30 19:48:24 WARN ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x1379da4660c0014 12/05/30 19:48:24 INFO zookeeper.ZooKeeper: Session: 0x1379da4660c0014 closed 12/05/30 19:48:24 INFO zookeeper.ClientCnxn: EventThread shut down Exception in thread main java.io.IOException: Couldn't determine existence of znode '/hadoop-ha/hacluster' at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:263) at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:257) at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:195) at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58) at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:163) at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:159) at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438) at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:159) at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:171) Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hadoop-ha/hacluster at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1049) at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:261) ... 
8 more {noformat}
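The failure mode above is proceeding to use a ZooKeeper session before the connection is fully established. The general fix direction is to block on the connection-established event (with a timeout) before issuing any requests. A minimal self-contained sketch of that wait pattern, not the actual ZKFC code (class and method names here are illustrative):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: gate all ZK operations behind a latch that is released
// from the client's event thread once the session is established (e.g. a
// ZooKeeper Watcher observing KeeperState.SyncConnected), and fail fast with
// a clear error if the connection does not come up within the timeout.
public class ConnectionGate {
    private final CountDownLatch connected = new CountDownLatch(1);

    // Called from the event thread when the session is established.
    public void onConnected() { connected.countDown(); }

    // Called before the first ZK operation (e.g. formatZK's exists() check).
    public void awaitConnection(long timeoutMs) throws InterruptedException {
        if (!connected.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException(
                "ZK connection not established within " + timeoutMs + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        ConnectionGate gate = new ConnectionGate();
        new Thread(gate::onConnected).start(); // simulated event thread
        gate.awaitConnection(5000);            // proceed only once connected
        System.out.println("connected");
    }
}
```

Without such a gate, the `exists()` call races the session handshake and surfaces as the `ConnectionLossException` in the trace above.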
[jira] [Created] (HDFS-3645) Improve the way we do detection of a busy DN in the cluster, when choosing it for a block write
Harsh J created HDFS-3645: - Summary: Improve the way we do detection of a busy DN in the cluster, when choosing it for a block write Key: HDFS-3645 URL: https://issues.apache.org/jira/browse/HDFS-3645 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Minor Right now, I think we do too naive a computation for detecting if a chosen DN target is busy by itself. We currently do {{node.getXceiverCount() > (2.0 * avgLoad)}}. We should improve on this computation with a more realistic measure of whether a DN is really busy by itself or not (rather than checking against the cluster average, where there's a good chance the value can be wrong to compare with, for some cases).
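The check quoted above reduces to a single comparison against twice the cluster-wide average. A simplified sketch (not the actual NameNode block placement code) of what that load test computes:

```java
// Sketch of the load check described in HDFS-3645 (simplified; the real
// logic lives in the NameNode's block placement code): a DataNode is
// considered too busy to receive a block write when its active transceiver
// (xceiver) count exceeds twice the cluster-wide average per node.
public class BusyNodeCheck {
    public static boolean isOverloaded(int xceiverCount, double avgXceiversPerNode) {
        return xceiverCount > 2.0 * avgXceiversPerNode;
    }

    public static void main(String[] args) {
        // With a cluster average of 10 xceivers per DN:
        System.out.println(isOverloaded(25, 10.0)); // true  -> skipped as a target
        System.out.println(isOverloaded(15, 10.0)); // false -> eligible
    }
}
```

The issue's point is that this average is a weak baseline: in small or unevenly loaded clusters, a node can be flagged busy (or not) purely because the average itself is unrepresentative.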
[jira] [Commented] (HDFS-3497) Update Balancer policy with NodeGroup layer
[ https://issues.apache.org/jira/browse/HDFS-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412703#comment-13412703 ] Junping Du commented on HDFS-3497: -- The test failures are tracked by HDFS-3625 and are not related to this patch. Update Balancer policy with NodeGroup layer --- Key: HDFS-3497 URL: https://issues.apache.org/jira/browse/HDFS-3497 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer Affects Versions: 1.0.0, 2.0.0-alpha Reporter: Junping Du Assignee: Junping Du Attachments: HDFS-3497.patch 1. Make sure the Network Topology and BlockPlacementPolicy checks in the balancer are compatible with the new ones that add a NodeGroup layer. 2. Update the balancer policy for performance optimization with Node Group - choose the target and source node on the same node group for balancing as the first priority. 3. Make sure the balancing policy will not reduce reliability in environments with node groups (virtualization) by verifying good targets based on the NodeGroup relationship. (This part of the work is separated out and tracked in HDFS-3496)
[jira] [Commented] (HDFS-3615) Two BlockTokenSecretManager findbugs warnings
[ https://issues.apache.org/jira/browse/HDFS-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412714#comment-13412714 ] Hudson commented on HDFS-3615: -- Integrated in Hadoop-Hdfs-trunk #1101 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1101/]) HDFS-3615. Two BlockTokenSecretManager findbugs warnings. Contributed by Aaron T. Myers. (Revision 1360255) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1360255 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockTokenSecretManager.java Two BlockTokenSecretManager findbugs warnings - Key: HDFS-3615 URL: https://issues.apache.org/jira/browse/HDFS-3615 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Aaron T. Myers Fix For: 2.0.1-alpha Attachments: HDFS-3615.patch Looks like two findbugs warnings were introduced recently (seen across a couple of recent patches). Unclear what change introduced them, as the file hasn't been modified and recently committed changes pass the findbugs check. IS: Inconsistent synchronization of org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.keyUpdateInterval; locked 75% of time IS: Inconsistent synchronization of org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.serialNo; locked 75% of time
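The "inconsistent synchronization; locked 75% of time" warning means a field is accessed both with and without the same lock held. A generic illustration of the pattern and the usual fix (this is not the actual HDFS-3615 patch; the class and field names just mirror the warning):

```java
// Findbugs flags a field when, say, 3 of 4 accesses hold the object's lock
// and one does not ("locked 75% of time"). The standard fix is to guard
// every access with the same lock, as below, or declare the field volatile.
public class GuardedField {
    private long keyUpdateInterval;  // guarded by "this"

    public synchronized void setKeyUpdateInterval(long v) {
        keyUpdateInterval = v;
    }

    // If this getter were unsynchronized, reads could race the writer and
    // the IS warning would fire; synchronizing it makes locking consistent.
    public synchronized long getKeyUpdateInterval() {
        return keyUpdateInterval;
    }

    public static void main(String[] args) {
        GuardedField g = new GuardedField();
        g.setKeyUpdateInterval(600_000L);
        System.out.println(g.getKeyUpdateInterval());
    }
}
```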
[jira] [Commented] (HDFS-3582) Hook daemon process exit for testing
[ https://issues.apache.org/jira/browse/HDFS-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412713#comment-13412713 ] Hudson commented on HDFS-3582: -- Integrated in Hadoop-Hdfs-trunk #1101 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1101/]) HDFS-3582. Hook daemon process exit for testing. Contributed by Eli Collins (Revision 1360329) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1360329 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ExitUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/TestBookKeeperAsHASharedDir.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestClusterId.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogJournalFailures.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestFailureOfSharedDir.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestFailureToReadEdits.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHASafeMode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyIsHot.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStateTransitionFailure.java Hook daemon process exit for testing - Key: HDFS-3582 URL: https://issues.apache.org/jira/browse/HDFS-3582 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Fix For: 2.0.1-alpha Attachments: hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt Occasionally the tests fail with java.util.concurrent.ExecutionException: org.apache.maven.surefire.booter.SurefireBooterForkException: Error occurred in starting fork, check output in log because the NN is exit'ing (via System#exit or Runtime#exit). Unfortunately Surefire doesn't retain the log output (see SUREFIRE-871) so the test log is empty, we don't know which part of the test triggered which exit in HDFS. To make this easier to debug let's hook all daemon process exits when running the tests. -- This message is automatically generated by JIRA. 
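The "hook daemon process exit" idea above can be sketched in a few lines. This is a self-contained approximation modeled on the pattern of Hadoop's `ExitUtil` (the class below is illustrative, not the committed code): daemons call `terminate()` instead of `System.exit()`, and tests disable the real exit so the call surfaces as a catchable exception carrying the status and message.

```java
// Sketch of an exit hook for testability: in production terminate() exits
// the JVM; in tests it throws, so the test log records which code path
// tried to exit and why, instead of the forked JVM silently dying.
public class ExitHook {
    public static class ExitException extends RuntimeException {
        public final int status;
        ExitException(int status, String msg) { super(msg); this.status = status; }
    }

    private static volatile boolean systemExitDisabled = false;

    public static void disableSystemExit() { systemExitDisabled = true; }

    public static void terminate(int status, String msg) {
        if (systemExitDisabled) {
            throw new ExitException(status, msg); // tests can catch and assert
        }
        System.exit(status);
    }

    public static void main(String[] args) {
        disableSystemExit();
        try {
            terminate(1, "simulated fatal error in NameNode");
        } catch (ExitException e) {
            System.out.println("caught exit: status=" + e.status
                + " msg=" + e.getMessage());
        }
    }
}
```

This directly addresses the Surefire symptom described above: with the hook, the exit reason lands in the test output rather than being lost when the fork dies.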
[jira] [Commented] (HDFS-3639) JspHelper#getUGI should always verify the token if security is enabled
[ https://issues.apache.org/jira/browse/HDFS-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412716#comment-13412716 ] Hudson commented on HDFS-3639: -- Integrated in Hadoop-Hdfs-trunk #1101 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1101/]) HDFS-3639. JspHelper#getUGI should always verify the token if security is enabled. Contributed by Eli Collins (Revision 1360485) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1360485 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java JspHelper#getUGI should always verify the token if security is enabled -- Key: HDFS-3639 URL: https://issues.apache.org/jira/browse/HDFS-3639 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 1.0.0, 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Fix For: 1.2.0, 2.0.1-alpha Attachments: hdfs-3639-b1.txt, hdfs-3639.txt JspHelper#getUGI only verifies the given token if the context and nn are set (added in HDFS-2416). We should unconditionally verify the token, i.e. a bug where name.node is not set in the context object should not result in the token not being verified. In practice this shouldn't be an issue, as per HDFS-3434 the context and NN should never be null.
[jira] [Created] (HDFS-3646) LeaseRenewer can hold reference to inactive DFSClient instances
Kihwal Lee created HDFS-3646: Summary: LeaseRenewer can hold reference to inactive DFSClient instances Key: HDFS-3646 URL: https://issues.apache.org/jira/browse/HDFS-3646 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 2.0.0-alpha, 0.23.3 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 0.23.3, 2.0.1-alpha, 3.0.0 If {{LeaseRenewer#closeClient()}} is not called, {{LeaseRenewer}} keeps the reference to a {{DFSClient}} instance in {{dfsclients}} forever. This prevents {{DFSClient}}, {{LeaseRenewer}}, conf, etc. from being garbage collected, leading to a memory leak. {{LeaseRenewer}} should remove the reference after some delay if a {{DFSClient}} instance no longer has active streams.
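The fix direction stated above (drop the reference after some delay once a client has no active streams) can be sketched as an idle-expiry sweep. This is a hypothetical illustration, not the actual LeaseRenewer code; the class, method names, and grace period are invented:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the HDFS-3646 fix direction: remember when each
// client last had open streams, and on each sweep drop any client that has
// been idle longer than a grace period, so the renewer no longer pins
// inactive DFSClient instances (and their conf, etc.) in memory forever.
public class ClientRegistry {
    static final long GRACE_PERIOD_MS = 30_000; // illustrative value

    // client name -> time (ms) at which it last had active streams
    private final Map<String, Long> lastActive = new HashMap<>();

    public synchronized void recordActivity(String client, long nowMs) {
        lastActive.put(client, nowMs);
    }

    /** Drop clients idle longer than the grace period; return how many remain. */
    public synchronized int expireIdle(long nowMs) {
        lastActive.values().removeIf(t -> nowMs - t > GRACE_PERIOD_MS);
        return lastActive.size();
    }

    public static void main(String[] args) {
        ClientRegistry r = new ClientRegistry();
        r.recordActivity("client-a", 0);
        r.recordActivity("client-b", 40_000);
        // At t=50s, client-a has been idle 50s (> grace) and is dropped;
        // client-b has been idle 10s and survives.
        System.out.println(r.expireIdle(50_000));
    }
}
```

Holding strong references keyed only by registration, with no idle expiry, is exactly the leak pattern the issue describes.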
[jira] [Updated] (HDFS-3646) LeaseRenewer can hold reference to inactive DFSClient instances forever
[ https://issues.apache.org/jira/browse/HDFS-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-3646: - Summary: LeaseRenewer can hold reference to inactive DFSClient instances forever (was: LeaseRenewer can hold reference to inactive DFSClient instances) LeaseRenewer can hold reference to inactive DFSClient instances forever --- Key: HDFS-3646 URL: https://issues.apache.org/jira/browse/HDFS-3646 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 0.23.3, 2.0.1-alpha, 3.0.0 If {{LeaseRenewer#closeClient()}} is not called, {{LeaseRenewer}} keeps the reference to a {{DFSClient}} instance in {{dfsclients}} forever. This prevents {{DFSClient}}, {{LeaseRenewer}}, conf, etc. from being garbage collected, leading to a memory leak. {{LeaseRenewer}} should remove the reference after some delay if a {{DFSClient}} instance no longer has active streams.
[jira] [Commented] (HDFS-3615) Two BlockTokenSecretManager findbugs warnings
[ https://issues.apache.org/jira/browse/HDFS-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412805#comment-13412805 ] Hudson commented on HDFS-3615: -- Integrated in Hadoop-Mapreduce-trunk #1134 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1134/]) HDFS-3615. Two BlockTokenSecretManager findbugs warnings. Contributed by Aaron T. Myers. (Revision 1360255) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1360255 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockTokenSecretManager.java Two BlockTokenSecretManager findbugs warnings - Key: HDFS-3615 URL: https://issues.apache.org/jira/browse/HDFS-3615 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Aaron T. Myers Fix For: 2.0.1-alpha Attachments: HDFS-3615.patch Looks like two findbugs warnings were introduced recently (seen across a couple of recent patches). Unclear what change introduced them, as the file hasn't been modified and recently committed changes pass the findbugs check. IS: Inconsistent synchronization of org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.keyUpdateInterval; locked 75% of time IS: Inconsistent synchronization of org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.serialNo; locked 75% of time
[jira] [Commented] (HDFS-3582) Hook daemon process exit for testing
[ https://issues.apache.org/jira/browse/HDFS-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412804#comment-13412804 ] Hudson commented on HDFS-3582: -- Integrated in Hadoop-Mapreduce-trunk #1134 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1134/]) HDFS-3582. Hook daemon process exit for testing. Contributed by Eli Collins (Revision 1360329) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1360329 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ExitUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/TestBookKeeperAsHASharedDir.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestClusterId.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogJournalFailures.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestFailureOfSharedDir.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestFailureToReadEdits.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHASafeMode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyIsHot.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStateTransitionFailure.java Hook daemon process exit for testing - Key: HDFS-3582 URL: https://issues.apache.org/jira/browse/HDFS-3582 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Fix For: 2.0.1-alpha Attachments: hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt Occasionally the tests fail with java.util.concurrent.ExecutionException: org.apache.maven.surefire.booter.SurefireBooterForkException: Error occurred in starting fork, check output in log because the NN is exit'ing (via System#exit or Runtime#exit). Unfortunately Surefire doesn't retain the log output (see SUREFIRE-871), so the test log is empty and we don't know which part of the test triggered which exit in HDFS. To make this easier to debug, let's hook all daemon process exits when running the tests.
[jira] [Commented] (HDFS-3639) JspHelper#getUGI should always verify the token if security is enabled
[ https://issues.apache.org/jira/browse/HDFS-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412807#comment-13412807 ] Hudson commented on HDFS-3639: -- Integrated in Hadoop-Mapreduce-trunk #1134 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1134/]) HDFS-3639. JspHelper#getUGI should always verify the token if security is enabled. Contributed by Eli Collins (Revision 1360485) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1360485 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java JspHelper#getUGI should always verify the token if security is enabled -- Key: HDFS-3639 URL: https://issues.apache.org/jira/browse/HDFS-3639 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 1.0.0, 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Fix For: 1.2.0, 2.0.1-alpha Attachments: hdfs-3639-b1.txt, hdfs-3639.txt JspHelper#getUGI only verifies the given token if the context and nn are set (added in HDFS-2416). We should unconditionally verify the token, i.e. a bug where name.node is not set in the context object should not result in not verifying the token. In practice this shouldn't be an issue, as per HDFS-3434 the context and NN should never be null.
[jira] [Commented] (HDFS-3646) LeaseRenewer can hold reference to inactive DFSClient instances forever
[ https://issues.apache.org/jira/browse/HDFS-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412815#comment-13412815 ] Uma Maheswara Rao G commented on HDFS-3646: --- Kihwal, thanks for filing the JIRA. I have seen this too. One possible option to fix this issue: the lease renewer is only required for open files, so opening a file can add the client to the renewer if that client is not already in the renewer's list of clients, and closing a file can remove the DFSClient instance completely if that client has no filesBeingWritten left. That means that if a DFSClient has no open files, it will not be tracked by the renewer; if the same DFSClient opens a new file, that will take care of re-adding the client to the renewer. How does this sound to you?
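The bookkeeping Uma proposes can be sketched as follows. This is a hypothetical, heavily simplified model for illustration only; the class and method names (RenewerModel, fileOpened, fileClosed) are not the actual DFSClient/LeaseRenewer API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified model: the renewer tracks a client only while that client
// has files being written. Register on first open, drop on last close.
class RenewerModel {
    private final Set<String> trackedClients = new HashSet<>();
    private final Map<String, Integer> filesBeingWritten = new HashMap<>();

    // Called when a client opens a file for write: register the client
    // with the renewer if it is not already tracked.
    void fileOpened(String client) {
        filesBeingWritten.merge(client, 1, Integer::sum);
        trackedClients.add(client);
    }

    // Called when a file is closed: if the client has no files being
    // written left, drop it so it becomes eligible for garbage collection.
    void fileClosed(String client) {
        int remaining = filesBeingWritten.merge(client, -1, Integer::sum);
        if (remaining <= 0) {
            filesBeingWritten.remove(client);
            trackedClients.remove(client);
        }
    }

    boolean isTracked(String client) {
        return trackedClients.contains(client);
    }
}

public class RenewerSketch {
    public static void main(String[] args) {
        RenewerModel r = new RenewerModel();
        r.fileOpened("client-1");
        r.fileOpened("client-1");
        r.fileClosed("client-1");
        System.out.println(r.isTracked("client-1")); // true: one file still open
        r.fileClosed("client-1");
        System.out.println(r.isTracked("client-1")); // false: client can be collected
    }
}
```

Under this scheme the renewer never holds a reference to a client with no open files, which is exactly the leak described in the issue.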
[jira] [Commented] (HDFS-3646) LeaseRenewer can hold reference to inactive DFSClient instances forever
[ https://issues.apache.org/jira/browse/HDFS-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412820#comment-13412820 ] Kihwal Lee commented on HDFS-3646: -- Thanks Uma. That makes sense.
[jira] [Commented] (HDFS-3646) LeaseRenewer can hold reference to inactive DFSClient instances forever
[ https://issues.apache.org/jira/browse/HDFS-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412821#comment-13412821 ] Daryn Sharp commented on HDFS-3646: --- There will be caveats; for example, the leak will still occur if client code doesn't explicitly close all streams. I'm not sure how you can tell that there are no more references, since {{DFSClient}} holds references to all open streams. Maybe weak references to the streams could be used?
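The weak-reference idea can be illustrated with a small sketch. This is an assumption-laden illustration, not the real DFSClient code: if the client tracked its streams only through java.lang.ref.WeakReference, a stream the application loses would eventually be collected, after which the client observes that it has no live streams and the renewer could drop it:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical tracker: holds streams weakly so a lost stream does not
// pin the client forever. Names are illustrative, not the real API.
public class WeakStreamTracker {
    private final List<WeakReference<AutoCloseable>> streams = new ArrayList<>();

    void register(AutoCloseable stream) {
        streams.add(new WeakReference<>(stream));
    }

    // Prune cleared references and report whether any stream is still live.
    boolean hasLiveStreams() {
        for (Iterator<WeakReference<AutoCloseable>> it = streams.iterator(); it.hasNext();) {
            if (it.next().get() == null) {
                it.remove(); // the application dropped this stream
            }
        }
        return !streams.isEmpty();
    }

    public static void main(String[] args) {
        WeakStreamTracker tracker = new WeakStreamTracker();
        AutoCloseable stream = () -> { }; // stands in for a DFSOutputStream
        tracker.register(stream);
        System.out.println(tracker.hasLiveStreams()); // true while strongly reachable
        stream = null;   // the application "loses" the stream
        System.gc();     // a hint only; collection is not guaranteed
        // Once the reference is cleared, hasLiveStreams() returns false and
        // the lease renewer could safely remove the client.
    }
}
```

The caveat with weak stream references is the one Daryn raises implicitly: a lease must keep being renewed for a lost-but-unclosed stream, so clearing it silently trades a memory leak for a possibly abandoned file; logging loudly at that point would make the application bug visible.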
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412824#comment-13412824 ] Daryn Sharp commented on HDFS-3513: --- The Hive jira has highlighted complexities and possible issues with trying to cache ugis/filesystems. Out of curiosity, have you benchmarked whether the ugi cache provides a significant benefit? HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem cache is not used, as it has no expiration/timeout and filesystem instances in it live forever; for a long-running service like HttpFS this is not a good thing, as it would keep connections to the NN open.
[jira] [Commented] (HDFS-3646) LeaseRenewer can hold reference to inactive DFSClient instances forever
[ https://issues.apache.org/jira/browse/HDFS-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412863#comment-13412863 ] Uma Maheswara Rao G commented on HDFS-3646: --- {quote} There will be caveats, such as the leak will still occur if client code doesn't explicitly close all streams. {quote} If client code doesn't close the file, the DFSClient object should still be there and lease renewal should happen, since the file is in the open state. At that point, keeping the reference in the LeaseRenewer is not a leak. Please correct me if I have understood your point wrongly.
[jira] [Commented] (HDFS-3646) LeaseRenewer can hold reference to inactive DFSClient instances forever
[ https://issues.apache.org/jira/browse/HDFS-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412885#comment-13412885 ] Daryn Sharp commented on HDFS-3646: --- I agree with everything you said if client code is still holding a reference to the stream. Unfortunately accidents do happen and streams don't always get closed. Since {{DFSClient}} has a hard reference to the stream, the lost stream will remain open as long as the client is open. In turn, the lost stream will prevent the lease renewer from removing the client when all other streams are closed.
[jira] [Commented] (HDFS-3646) LeaseRenewer can hold reference to inactive DFSClient instances forever
[ https://issues.apache.org/jira/browse/HDFS-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412893#comment-13412893 ] Uma Maheswara Rao G commented on HDFS-3646: --- {quote} Unfortunately accidents do happen and streams don't always get closed. Since {{DFSClient}} has a hard reference to the stream, the lost stream will remain open as long as the client is open. {quote} IMO, this would be a leak on the application side, since the bug is in the application not closing its streams.
[jira] [Commented] (HDFS-3646) LeaseRenewer can hold reference to inactive DFSClient instances forever
[ https://issues.apache.org/jira/browse/HDFS-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412901#comment-13412901 ] Kihwal Lee commented on HDFS-3646: -- bq. the lost stream will remain open as long as the client is open. I think Daryn is bringing up the issue because its solution also takes care of this jira. If we had a finalizer for FileSystem, we could have it call close(), and then everything would go away. But short of such automatic cleaning, this issue still remains: currently DFSClient won't get garbage collected even if lost streams are automatically closed. I think we should still fix it, even if we eventually implement automatic clean-up.
[jira] [Commented] (HDFS-3646) LeaseRenewer can hold reference to inactive DFSClient instances forever
[ https://issues.apache.org/jira/browse/HDFS-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412916#comment-13412916 ] Daryn Sharp commented on HDFS-3646: --- bq. IMO, this will a leak from application side, since there is a bug in closing the streams from app. Agreed, but it can have pretty severe consequences that aren't easily detected unless explicitly hunting for leaks. It makes me uneasy that an out-of-scope fs stream can cause a massive leak of heavy objects and threads, and tie up sockets that may exhaust fds and/or memory for long-running processes. Emitting an angry log error for lost unclosed streams may be more beneficial. I don't think a finalizer on the fs will work. If I do {{in = path.getFileSystem(conf).open(...)}}, the fs might get garbage collected, but we certainly don't want its finalizer to shoot the dfs client that is still holding open a stream. Maybe a finalizer on the dfs client, but in any case, the circular hard references need to be broken somehow.
[jira] [Resolved] (HDFS-3644) OEV should recognize and deal with 0.20.20x opcode versions
[ https://issues.apache.org/jira/browse/HDFS-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas resolved HDFS-3644. --- Resolution: Won't Fix This jira is not necessary. The conflict code was only in 0.20.203. Post upgrade to later releases the conflicting opcode is not used. I am closing this as Won't Fix. Reopen if you disagree. OEV should recognize and deal with 0.20.20x opcode versions --- Key: HDFS-3644 URL: https://issues.apache.org/jira/browse/HDFS-3644 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Todd Lipcon Priority: Minor We have some opcode conflicts for edit logs between 0.20.20x (LV -19, -31) vs newer versions. For edit log loading, we dealt with this by forcing users to save the namespace on an earlier version before upgrading. But using a trunk OEV on an older version is useful, since the OEV has had so many improvements. It would be nice to be able to specify a flag to the OEV to be able to run on older edit logs.
[jira] [Commented] (HDFS-3645) Improve the way we do detection of a busy DN in the cluster, when choosing it for a block write
[ https://issues.apache.org/jira/browse/HDFS-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412929#comment-13412929 ] Suresh Srinivas commented on HDFS-3645: --- bq. a more realistic measure of if a DN is really busy by itself Can you elaborate on what this means? Without comparing it with the other DNs available in the cluster, the local state of a DN is incomplete, no? Improve the way we do detection of a busy DN in the cluster, when choosing it for a block write --- Key: HDFS-3645 URL: https://issues.apache.org/jira/browse/HDFS-3645 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Minor Right now, I think we do too naive a computation for detecting if a chosen DN target is busy by itself. We currently do {{node.getXceiverCount() > (2.0 * avgLoad)}}. We should improve on this computation with a more realistic measure of if a DN is really busy by itself or not (rather than checking against the cluster average, where there's a good chance the value can be wrong to compare with, for some cases)
[jira] [Commented] (HDFS-3644) OEV should recognize and deal with 0.20.20x opcode versions
[ https://issues.apache.org/jira/browse/HDFS-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412946#comment-13412946 ] Suresh Srinivas commented on HDFS-3644: --- BTW, a comment relevant to my previous comment - https://issues.apache.org/jira/browse/HDFS-1842?focusedCommentId=13021839&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13021839
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412969#comment-13412969 ] Hari Mankude commented on HDFS-2802: A quick user's guide:

hadoop dfsadmin -createsnap <snapname> <path where snap is to be taken> <ro/rw>
will create a snap with snapname at the location mentioned

hadoop dfsadmin -removesnap <snapname>
will remove the snapshot

hadoop dfsadmin -listsnap /
will list all snaps that have been taken under /

Support for RW/RO snapshots in HDFS --- Key: HDFS-2802 URL: https://issues.apache.org/jira/browse/HDFS-2802 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, name-node Affects Versions: 0.24.0 Reporter: Hari Mankude Assignee: Hari Mankude Attachments: snap.patch, snapshot-one-pager.pdf Snapshots are point-in-time images of parts of the filesystem or the entire filesystem. Snapshots can be a read-only or a read-write point-in-time copy of the filesystem. There are several use cases for snapshots in HDFS. I will post a detailed write-up soon with more information.
[jira] [Reopened] (HDFS-3644) OEV should recognize and deal with 0.20.20x opcode versions
[ https://issues.apache.org/jira/browse/HDFS-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reopened HDFS-3644: --- I disagree. There are people running systems with LV -19, which has the conflicted opcodes. Currently if you run the OEV on these logs, you end up getting errors because it reads delegation token ops as, e.g., symlink ops. If we don't support OEVing a given LV, we should raise an error.
[jira] [Commented] (HDFS-3645) Improve the way we do detection of a busy DN in the cluster, when choosing it for a block write
[ https://issues.apache.org/jira/browse/HDFS-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413034#comment-13413034 ] Harsh J commented on HDFS-3645: --- Hi Suresh, Thank you, your question got me thinking some more. I filed this JIRA as a thought dump from some thoughts I was having while going through the current policy impl. Sorry for the lack of clarification. Let me explain the case I imagine may exist with this specific check: # node.getXceiverCount() is a total 'socket' count. It includes writes, _and_ reads. # Consider a cluster situation such as this when computing the average (it may sound a little hypothetical in this explanation, but a near enough case is possible in some situations): 100 DNs are present. The average is about 250, but there are possibly some (very few) nodes with much higher xceiver counts, at about 600-800. A likely possibility for such a state is that these nodes are serving a very hot, local-block region (a bad HBase case, but quite plausible). # Now consider that this DN wanted to get a block allocated to it. We computed the xceiver average and found it to be 250, and then we checked the node's count; it was 700. 700 > 250 leads to it not getting selected, due to us ignoring the fact that most of the 700 were actually reads and not writes. Perhaps it may have been OK to do a write in this case, if we knew the ratio of reads:writes aside from the count(reads+writes) on the DN? I've not seen any major issues with this way of write selection at all, but it does seem to expose a certain edge case. Do you think we should account for such a scenario, or let it be as-is and continue to keep the load count aggregated? If not, let us close this out.
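The edge case above can be made concrete with a small sketch. This is not the actual BlockPlacementPolicy code; the tooBusyWritesOnly variant is a hypothetical alternative that only illustrates the read/write split Harsh describes:

```java
// Contrast the current "total xceivers vs. 2x cluster average" check with
// a hypothetical write-only variant. Method names are illustrative.
public class LoadCheckSketch {
    // Current policy: a node is "too busy" when its total xceiver count
    // (reads + writes combined) exceeds twice the cluster average.
    static boolean tooBusyCurrent(int xceiverCount, double avgLoad) {
        return xceiverCount > 2.0 * avgLoad;
    }

    // Hypothetical variant: compare only the write load, so a node serving
    // a hot, read-mostly region is not excluded from block allocation.
    static boolean tooBusyWritesOnly(int writeCount, double avgWriteLoad) {
        return writeCount > 2.0 * avgWriteLoad;
    }

    public static void main(String[] args) {
        // Harsh's example: 700 total xceivers vs. a cluster average of 250.
        System.out.println(tooBusyCurrent(700, 250));   // true: 700 > 500, node skipped
        // If, say, 650 of those 700 are reads, the write load may be modest
        // (50 writes vs. a hypothetical average write load of 100).
        System.out.println(tooBusyWritesOnly(50, 100)); // false: node usable for writes
    }
}
```

The two checks disagree on exactly the hot-read node in the example, which is the scenario the comment asks whether the policy should account for.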
[jira] [Commented] (HDFS-3644) OEV should recognize and deal with 0.20.20x opcode versions
[ https://issues.apache.org/jira/browse/HDFS-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413036#comment-13413036 ] Suresh Srinivas commented on HDFS-3644: --- Todd, can you tell me which Apache release LV -19 is from? It saves me time, since you have already done this analysis.
[jira] [Commented] (HDFS-3644) OEV should recognize and deal with 0.20.20x opcode versions
[ https://issues.apache.org/jira/browse/HDFS-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413053#comment-13413053 ] Eli Collins commented on HDFS-3644: --- Suresh, {code} hadoop-branch-1 $ grep -r LAYOUT_VERSIONS_203 src/ src/hdfs/org/apache/hadoop/hdfs/server/common/Storage.java: public static final int[] LAYOUT_VERSIONS_203 = {-19, -31}; {code}
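A flag-driven check in the OEV could reuse the array Eli quotes; a minimal sketch, where the method name is hypothetical rather than actual OEV code:

```java
// Sketch only: decide whether an edit log's layout version is one of the
// conflicting 0.20.20x versions, using the LAYOUT_VERSIONS_203 constant
// quoted above from branch-1's Storage.java.
class LayoutCheck {
    static final int[] LAYOUT_VERSIONS_203 = {-19, -31};

    static boolean is203LayoutVersion(int layoutVersion) {
        for (int v : LAYOUT_VERSIONS_203) {
            if (v == layoutVersion) {
                return true;
            }
        }
        return false;
    }
}
```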
[jira] [Commented] (HDFS-3644) OEV should recognize and deal with 0.20.20x opcode versions
[ https://issues.apache.org/jira/browse/HDFS-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413057#comment-13413057 ] Suresh Srinivas commented on HDFS-3644: --- @Eli, not sure if you saw my previous comment: bq. The conflict code was only in 0.20.203. Post upgrade to later releases the conflicting opcode is not used. Given that, a tool that handles the conflicting opcodes seems unnecessary, since the problem exists in 0.20.203 alone. Even the editlog code does not handle these conflicts in 0.20.204; we make users save the namespace to work around it.
[jira] [Created] (HDFS-3647) Expose dfs.datanode.max.xcievers as metric
Steve Hoffman created HDFS-3647: --- Summary: Expose dfs.datanode.max.xcievers as metric Key: HDFS-3647 URL: https://issues.apache.org/jira/browse/HDFS-3647 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, performance Affects Versions: 0.20.2 Reporter: Steve Hoffman Not sure if this is in a newer version of Hadoop, but in CDH3u3 it isn't there. There is a lot of mystery surrounding how large to set dfs.datanode.max.xcievers. Most people say to just up it to 4096, but given that exceeding it will cause an HBase RegionServer shutdown (see Lars' blog post here: http://www.larsgeorge.com/2012/03/hadoop-hbase-and-xceivers.html), it would be nice if we could expose the current count via the built-in metrics framework (most likely under dfs). That way we could watch it to see if we have it set too high or too low, whether it's time to bump it up, etc. Thoughts?
[jira] [Updated] (HDFS-3647) Expose current xcievers count as metric
[ https://issues.apache.org/jira/browse/HDFS-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Hoffman updated HDFS-3647: Summary: Expose current xcievers count as metric (was: Expose dfs.datanode.max.xcievers as metric)
[jira] [Commented] (HDFS-3563) Fix findbug warnings in raid
[ https://issues.apache.org/jira/browse/HDFS-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413065#comment-13413065 ] Eli Collins commented on HDFS-3563: --- Hi Weiyan, What's the ETA on this? These warnings are causing jenkins to -1 other changes, like HDFS-3641, that update the raid code. Also, per Jason's comment on MAPREDUCE-3868, TestRaidNode consistently fails; I filed HDFS-3648 for this. Thanks, Eli Fix findbug warnings in raid Key: HDFS-3563 URL: https://issues.apache.org/jira/browse/HDFS-3563 Project: Hadoop HDFS Issue Type: Bug Components: contrib/raid Affects Versions: 3.0.0 Reporter: Jason Lowe Assignee: Weiyan Wang MAPREDUCE-3868 re-enabled raid but introduced 31 new findbugs warnings. Those warnings should be fixed or appropriate entries placed in an exclude file.
[jira] [Commented] (HDFS-3647) Expose current xcievers count as metric
[ https://issues.apache.org/jira/browse/HDFS-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413069#comment-13413069 ] Harsh J commented on HDFS-3647: --- I believe I've already done this for Hadoop 0.23+ (now 2.x), via HDFS-2868. Perhaps we can backport that onto 1.x as well, for which we can re-purpose this JIRA. For CDH requests, though, this is the wrong place; the right open channel to use is https://issues.cloudera.org/browse/DISTRO or its mailing lists.
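The idea behind exposing the live count, as opposed to only configuring the dfs.datanode.max.xcievers ceiling, can be sketched as below; the class and method names are illustrative, not the HDFS-2868 implementation:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch: keep a thread-safe counter of active transfer threads
// that a metrics gauge or JMX attribute can read at any time.
class XceiverGauge {
    private final AtomicInteger active = new AtomicInteger();

    // Called when a transfer (xceiver) thread starts.
    int onThreadStart() { return active.incrementAndGet(); }

    // Called when a transfer thread exits.
    int onThreadExit() { return active.decrementAndGet(); }

    // Read by the metrics framework to report the current load.
    int getXceiverCount() { return active.get(); }
}
```

Graphing a gauge like this against the configured maximum is what would let operators see whether the limit is set too high or too low.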
[jira] [Commented] (HDFS-3641) Move server Util time methods to common and use now instead of System#currentTimeMillis
[ https://issues.apache.org/jira/browse/HDFS-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413068#comment-13413068 ] Eli Collins commented on HDFS-3641: --- The findbugs failures are from hdfs-raid; HDFS-3563 tracks those. Per MAPREDUCE-3868 this has been failing since hdfs-raid was re-introduced; I filed HDFS-3648 to track it. Move server Util time methods to common and use now instead of System#currentTimeMillis --- Key: HDFS-3641 URL: https://issues.apache.org/jira/browse/HDFS-3641 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Attachments: hdfs-3641.txt To help HDFS-3640, let's move the time methods from the HDFS server Util class to common and use now instead of System#currentTimeMillis.
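The refactoring the issue describes amounts to a tiny shared utility along these lines; this is a plausible sketch of such a wrapper, not the committed org.apache.hadoop.util.Time source:

```java
// Sketch of a shared time utility: callers use now() instead of calling
// System.currentTimeMillis() directly, giving one place to later swap in
// a monotonic or fake clock for tests.
final class Time {
    private Time() {} // static utility, no instances

    static long now() {
        return System.currentTimeMillis();
    }
}
```

Centralizing the call is what makes follow-up work like HDFS-3640 practical: only one method body has to change to alter how every caller reads the clock.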
[jira] [Updated] (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS
[ https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-385: -- Attachment: blockplacementpolicy-branch-1-win.patch blockplacementpolicy-branch-1.patch Patches to port the pluggable interface to branch-1 and branch-1-win. Design a pluggable interface to place replicas of blocks in HDFS Key: HDFS-385 URL: https://issues.apache.org/jira/browse/HDFS-385 Project: Hadoop HDFS Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.21.0 Attachments: BlockPlacementPluggable.txt, BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, BlockPlacementPluggable5.txt, BlockPlacementPluggable6.txt, BlockPlacementPluggable7.txt, blockplacementpolicy-branch-1-win.patch, blockplacementpolicy-branch-1.patch The current HDFS code typically places one replica on the local rack, the second replica on a random remote rack, and the third replica on a random node of that remote rack. This algorithm is baked into the NameNode's code. It would be nice to make the block placement algorithm a pluggable interface. This would allow experimentation with different placement algorithms based on workloads, availability guarantees and failure models.
[jira] [Commented] (HDFS-3647) Expose current xcievers count as metric
[ https://issues.apache.org/jira/browse/HDFS-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413075#comment-13413075 ] Steve Hoffman commented on HDFS-3647: - https://issues.cloudera.org/browse/DISTRO-414 opened with Cloudera. Thx. I'll leave it to you guys to decide whether to use this to track a 1.x Apache backport.
[jira] [Commented] (HDFS-3641) Move server Util time methods to common and use now instead of System#currentTimeMillis
[ https://issues.apache.org/jira/browse/HDFS-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413095#comment-13413095 ] Hudson commented on HDFS-3641: -- Integrated in Hadoop-Hdfs-trunk-Commit #2523 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2523/]) HDFS-3641. Move server Util time methods to common and use now instead of System#currentTimeMillis. Contributed by Eli Collins (Revision 1360858) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1360858 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DelegationTokenRenewer.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFs.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RPC.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/WritableRpcEngine.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsRecordBuilderImpl.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSinkAdapter.java * 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSourceAdapter.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketIOWithTimeout.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/AsyncDiskService.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ReflectionUtils.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ThreadUtil.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Time.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestReconfiguration.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestTrash.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/loadGenerator/LoadGenerator.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/s3native/InMemoryNativeFileSystemStore.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java * 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestHealthMonitor.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverControllerStress.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/file/tfile/TestTFileSeqFileComparison.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/file/tfile/Timer.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/nativeio/TestNativeIO.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/MiniRPCBenchmark.java *
[jira] [Commented] (HDFS-3641) Move server Util time methods to common and use now instead of System#currentTimeMillis
[ https://issues.apache.org/jira/browse/HDFS-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413097#comment-13413097 ] Hudson commented on HDFS-3641: -- Integrated in Hadoop-Common-trunk-Commit #2457 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2457/]) HDFS-3641. Move server Util time methods to common and use now instead of System#currentTimeMillis. Contributed by Eli Collins (Revision 1360858) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1360858
[jira] [Resolved] (HDFS-3648) TestRaidNode.testDistRaid fails
[ https://issues.apache.org/jira/browse/HDFS-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved HDFS-3648. -- Resolution: Duplicate TestRaidNode.testDistRaid fails --- Key: HDFS-3648 URL: https://issues.apache.org/jira/browse/HDFS-3648 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Eli Collins Per MAPREDUCE-3868 TestRaidNode fails consistently; here's a recent example from HDFS-3641. Error Message expected:<0> but was:<2> Stacktrace junit.framework.AssertionFailedError: expected:<0> but was:<2> at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.failNotEquals(Assert.java:283) at junit.framework.Assert.assertEquals(Assert.java:64) at junit.framework.Assert.assertEquals(Assert.java:130) at junit.framework.Assert.assertEquals(Assert.java:136) at org.apache.hadoop.raid.TestRaidNode.testDistRaid(TestRaidNode.java:583) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[jira] [Commented] (HDFS-3641) Move server Util time methods to common and use now instead of System#currentTimeMillis
[ https://issues.apache.org/jira/browse/HDFS-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413137#comment-13413137 ] Hudson commented on HDFS-3641: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2476 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2476/]) HDFS-3641. Move server Util time methods to common and use now instead of System#currentTimeMillis. Contributed by Eli Collins (Revision 1360858) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1360858
[jira] [Created] (HDFS-3649) Port HDFS-385 to branch-1-win
Sumadhur Reddy Bolli created HDFS-3649: -- Summary: Port HDFS-385 to branch-1-win Key: HDFS-3649 URL: https://issues.apache.org/jira/browse/HDFS-3649 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 1-win Reporter: Sumadhur Reddy Bolli Added a patch to HDFS-385 to port the existing pluggable placement policy to branch-1-win.
[jira] [Updated] (HDFS-3564) Make the replication policy pluggable to allow custom replication policies
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3564: -- Target Version/s: (was: 1.1.0) Hi Sumadhur, I'm unsetting the target version from 1.1.0 since that release is already under way. Btw, branch-1 is our sustaining branch, so we'll need to make sure this is compatible and well tested. Make the replication policy pluggable to allow custom replication policies -- Key: HDFS-3564 URL: https://issues.apache.org/jira/browse/HDFS-3564 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Sumadhur Reddy Bolli Original Estimate: 24h Remaining Estimate: 24h ReplicationTargetChooser currently determines the placement of replicas in Hadoop. Making the replication policy pluggable would help in having custom replication policies that suit the environment. Eg1: Enabling placing replicas across different datacenters (not just racks). Eg2: Enabling placing replicas across multiple (more than 2) racks. Eg3: Cloud environments like Azure have logical concepts like fault and upgrade domains. Each fault domain spans multiple upgrade domains and each upgrade domain spans multiple fault domains. Machines are typically spread evenly across both fault and upgrade domains. Fault domain failures are typically catastrophic/unplanned, and the possibility of data loss is high. An upgrade domain can be taken down by Azure for maintenance periodically. Each time an upgrade domain is taken down, a small percentage of machines in it (typically 1-2%) are replaced due to disk failures, thus losing data. Assuming the default replication factor of 3, any 3 data nodes going down at the same time would mean potential data loss. So it is important to have a policy that spreads replicas across both fault and upgrade domains to ensure practically no data loss. The problem here is two-dimensional, and the default policy in Hadoop is one-dimensional. 
Custom policies to address issues like these can be written if we make the policy pluggable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
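As a rough illustration of what such a pluggable, two-dimensional policy could look like, here is a minimal sketch. The `PlacementPolicy` interface, `Node`, and the domain fields are invented for this example; they are not Hadoop's actual `BlockPlacementPolicy` API.

```java
import java.util.*;

// Hypothetical pluggable placement interface (illustrative only, not the
// real Hadoop BlockPlacementPolicy API).
interface PlacementPolicy {
    List<Node> chooseTargets(int replicas, List<Node> candidates);
}

class Node {
    final String name;
    final int faultDomain;   // catastrophic/unplanned failure unit
    final int upgradeDomain; // planned-maintenance unit
    Node(String name, int fd, int ud) {
        this.name = name; this.faultDomain = fd; this.upgradeDomain = ud;
    }
}

// Greedy two-dimensional spread: prefer candidates whose fault AND upgrade
// domains are not already used by a chosen replica, so no single fault
// domain or upgrade domain outage can take out all copies.
class DomainSpreadPolicy implements PlacementPolicy {
    public List<Node> chooseTargets(int replicas, List<Node> candidates) {
        List<Node> chosen = new ArrayList<>();
        Set<Integer> usedFd = new HashSet<>(), usedUd = new HashSet<>();
        for (Node n : candidates) {
            if (chosen.size() == replicas) break;
            if (!usedFd.contains(n.faultDomain) && !usedUd.contains(n.upgradeDomain)) {
                chosen.add(n);
                usedFd.add(n.faultDomain);
                usedUd.add(n.upgradeDomain);
            }
        }
        // If the domains are exhausted before we have enough replicas,
        // fall back to any remaining candidates.
        for (Node n : candidates) {
            if (chosen.size() == replicas) break;
            if (!chosen.contains(n)) chosen.add(n);
        }
        return chosen;
    }
}
```

The point of the sketch is only that the choice logic lives behind an interface the NameNode calls, so an Azure-specific (or datacenter-aware, or multi-rack) policy can be dropped in without touching core code.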
[jira] [Updated] (HDFS-3647) Backport HDFS-2868 (Add number of active transfer threads to the DataNode status) to branch-1
[ https://issues.apache.org/jira/browse/HDFS-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3647: -- Target Version/s: 1.2.0 Summary: Backport HDFS-2868 (Add number of active transfer threads to the DataNode status) to branch-1 (was: Expose current xcievers count as metric) Backport HDFS-2868 (Add number of active transfer threads to the DataNode status) to branch-1 - Key: HDFS-3647 URL: https://issues.apache.org/jira/browse/HDFS-3647 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, performance Affects Versions: 0.20.2 Reporter: Steve Hoffman Not sure if this is in a newer version of Hadoop, but in CDH3u3 it isn't there. There is a lot of mystery surrounding how large to set dfs.datanode.max.xcievers. Most people say to just up it to 4096, but given that exceeding this will cause an HBase RegionServer shutdown (see Lars' blog post here: http://www.larsgeorge.com/2012/03/hadoop-hbase-and-xceivers.html), it would be nice if we could expose the current count via the built-in metrics framework (most likely under dfs). In this way we could watch it to see if we have it set too high, too low, time to bump it up, etc. Thoughts?
[jira] [Updated] (HDFS-3566) Custom Replication Policy for Azure
[ https://issues.apache.org/jira/browse/HDFS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-3566: --- Target Version/s: 1-win (was: 1.1.0) Custom Replication Policy for Azure --- Key: HDFS-3566 URL: https://issues.apache.org/jira/browse/HDFS-3566 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Sumadhur Reddy Bolli Azure has logical concepts like fault and upgrade domains. Each fault domain spans multiple upgrade domains and each upgrade domain spans multiple fault domains. Machines are typically spread evenly across both fault and upgrade domains. Fault domain failures are typically catastrophic/unplanned failures, and the possibility of data loss is high. An upgrade domain can be taken down by Azure for maintenance periodically. Each time an upgrade domain is taken down, a small percentage of machines in it (typically 1-2%) are replaced due to disk failures, thus losing data. Assuming the default replication factor of 3, any 3 data nodes going down at the same time would mean potential data loss. So, it is important to have a policy that spreads replicas across both fault and upgrade domains to ensure practically no data loss. The problem here is two-dimensional and the default policy in Hadoop is one-dimensional. This policy would spread the datanodes across at least two fault domains and three upgrade domains to prevent data loss.
[jira] [Commented] (HDFS-3564) Make the replication policy pluggable to allow custom replication policies
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413155#comment-13413155 ] Harsh J commented on HDFS-3564: --- bq. I will re-purpose this JIRA to suggest enhancements to the existing abstraction. Given that HDFS-3649 was just opened for backport work, can you at least re-title the JIRA to fit this re-purpose goal? Avoids confusion for some of us. Thanks! :) Make the replication policy pluggable to allow custom replication policies -- Key: HDFS-3564 URL: https://issues.apache.org/jira/browse/HDFS-3564 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Sumadhur Reddy Bolli Original Estimate: 24h Remaining Estimate: 24h
[jira] [Assigned] (HDFS-3647) Backport HDFS-2868 (Add number of active transfer threads to the DataNode status) to branch-1
[ https://issues.apache.org/jira/browse/HDFS-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned HDFS-3647: - Assignee: Harsh J Backport HDFS-2868 (Add number of active transfer threads to the DataNode status) to branch-1 - Key: HDFS-3647 URL: https://issues.apache.org/jira/browse/HDFS-3647 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, performance Affects Versions: 0.20.2 Reporter: Steve Hoffman Assignee: Harsh J
[jira] [Commented] (HDFS-3644) OEV should recognize and deal with 0.20.20x opcode versions
[ https://issues.apache.org/jira/browse/HDFS-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413179#comment-13413179 ] Suresh Srinivas commented on HDFS-3644: --- bq. That's the same proposal here Sorry, that was not clear to me by the title or description. Perhaps we could change them for better clarity. OEV should recognize and deal with 0.20.20x opcode versions --- Key: HDFS-3644 URL: https://issues.apache.org/jira/browse/HDFS-3644 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Todd Lipcon Priority: Minor
[jira] [Commented] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413183#comment-13413183 ] Sumadhur Reddy Bolli commented on HDFS-3564: I apologize for the inconvenience. Changed the title. I will update the description or attach a doc with the proposed changes once the 3649 port is complete. Thanks! Design enhancements to the pluggable blockplacementpolicy - Key: HDFS-3564 URL: https://issues.apache.org/jira/browse/HDFS-3564 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Sumadhur Reddy Bolli
[jira] [Commented] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413184#comment-13413184 ] Suresh Srinivas commented on HDFS-3564: --- bq. will need to be sure to make sure this is compatible / well tested Eli, not sure about the compatibility requirements. I think the block placement policy was made InterfaceAudience.Private some time back. It referred to internal classes that were not public. That said, I agree, any enhancement should try to preserve compatibility. Design enhancements to the pluggable blockplacementpolicy - Key: HDFS-3564 URL: https://issues.apache.org/jira/browse/HDFS-3564 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Sumadhur Reddy Bolli
[jira] [Commented] (HDFS-2465) Add HDFS support for fadvise readahead and drop-behind
[ https://issues.apache.org/jira/browse/HDFS-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413190#comment-13413190 ] Suresh Srinivas commented on HDFS-2465: --- This time, going from +! to +1 :-) Add HDFS support for fadvise readahead and drop-behind -- Key: HDFS-2465 URL: https://issues.apache.org/jira/browse/HDFS-2465 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, performance Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2465.branch-1.patch, hdfs-2465.txt, hdfs-2465.txt, hdfs-2465.txt, hdfs-2465.txt This is the HDFS side of HADOOP-7714. The initial implementation is heuristic based and should be considered experimental, as discussed in the parent JIRA. It should be off by default until better heuristics, APIs, and tuning experience is developed.
[jira] [Updated] (HDFS-2465) Add HDFS support for fadvise readahead and drop-behind
[ https://issues.apache.org/jira/browse/HDFS-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-2465: -- Fix Version/s: 1.2.0 I committed the patch to branch-1 Add HDFS support for fadvise readahead and drop-behind -- Key: HDFS-2465 URL: https://issues.apache.org/jira/browse/HDFS-2465 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, performance Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0, 1.2.0 Attachments: HDFS-2465.branch-1.patch, hdfs-2465.txt, hdfs-2465.txt, hdfs-2465.txt, hdfs-2465.txt
[jira] [Updated] (HDFS-3583) Convert remaining tests to Junit4
[ https://issues.apache.org/jira/browse/HDFS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-3583: -- Attachment: hdfs-3583.patch So I think this is getting about done. The findbugs warnings are all in hdfs-raid, which I didn't touch. The huge diff is because the last Jenkins job didn't run findbugs on hdfs-raid. I blacklisted TestNameNodeMXBean, which should fix it. TestBackupNode and TestRaidNode failed for me on trunk. TestDirectoryScanner worked for me locally. I also manually verified that the number of tests between the last Jenkins run and another recent PreCommit job was the same. Convert remaining tests to Junit4 - Key: HDFS-3583 URL: https://issues.apache.org/jira/browse/HDFS-3583 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Andrew Wang Labels: newbie Attachments: hdfs-3583.patch, hdfs-3583.patch, junit3to4.sh JUnit4 style tests are easier to debug (eg can use timeouts etc), let's convert the remaining tests over to JUnit4 style.
[jira] [Commented] (HDFS-3645) Improve the way we do detection of a busy DN in the cluster, when choosing it for a block write
[ https://issues.apache.org/jira/browse/HDFS-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413251#comment-13413251 ] Suresh Srinivas commented on HDFS-3645: --- I was just trying to understand the proposal, especially the part where you say rather than checking against cluster average. The current code is trying to distribute the load among datanodes. It considers both reads and writes as the same cost to datanodes. Perhaps this is not good enough and may need further improvements. Given that block placement is pluggable, other policies could be tried out. In order to try out other policies, one may also add more granular stats - such as the number of readers and writers, the number of readers or writers per disk, etc. Given that, I am not sure the title or the description is clear enough. But we could keep the jira around for such discussions. Improve the way we do detection of a busy DN in the cluster, when choosing it for a block write --- Key: HDFS-3645 URL: https://issues.apache.org/jira/browse/HDFS-3645 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Minor Right now, I think we do too naive a computation for detecting if a chosen DN target is busy by itself. We currently do {{node.getXceiverCount() > (2.0 * avgLoad)}}. We should improve on this computation with a more realistic measure of whether a DN is really busy by itself or not (rather than checking against the cluster average, where there's a good chance the value can be wrong to compare with, for some cases)
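The heuristic being discussed (with the comparison operator restored) can be modeled in a few lines. This is a toy sketch of the described check, not the NameNode's actual code; the class and method names are invented.

```java
// Toy model of the "busy DN" heuristic described above: a node is
// considered overloaded if its active transfer-thread (xceiver) count
// exceeds twice the cluster-wide average. Names are illustrative only.
class LoadCheck {
    static boolean isOverloaded(int xceiverCount, int[] clusterXceivers) {
        double total = 0;
        for (int x : clusterXceivers) total += x;
        double avg = clusterXceivers.length == 0 ? 0 : total / clusterXceivers.length;
        // The check from the issue description, with the elided '>' restored.
        return xceiverCount > 2.0 * avg;
    }
}
```

The weakness Harsh points at is visible here: on a mostly idle cluster the average is tiny, so a node doing modest but normal work can trip the 2x threshold even though it is not actually busy.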
[jira] [Updated] (HDFS-3610) fuse_dfs: Provide a way to use the default (configured) NN URI
[ https://issues.apache.org/jira/browse/HDFS-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3610: --- Attachment: HDFS-3610.001.patch This patch depends on HDFS-3609. It makes it possible to mount arbitrary URI strings in fuse_dfs. All the existing arguments that worked previously are still supported. Notably, the quirky 'dfs://' as a synonym for 'hdfs://' behavior is still preserved, and you can specify the port via -o server=hdfs://hostname:port or -o server=hdfs://hostname -oport=port fuse_dfs: Provide a way to use the default (configured) NN URI -- Key: HDFS-3610 URL: https://issues.apache.org/jira/browse/HDFS-3610 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3610.001.patch It shouldn't be necessary to explicitly spell out the NameNode you want to connect to when launching fuse_dfs. libhdfs can read the configuration files and use the default URI. However, we don't have a command-line option for this in fuse_dfs.
[jira] [Updated] (HDFS-3610) fuse_dfs: Provide a way to use the default (configured) NN URI
[ https://issues.apache.org/jira/browse/HDFS-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3610: --- Status: Patch Available (was: Open) fuse_dfs: Provide a way to use the default (configured) NN URI -- Key: HDFS-3610 URL: https://issues.apache.org/jira/browse/HDFS-3610 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3610.001.patch
[jira] [Created] (HDFS-3650) Use MutableQuantiles to provide latency histograms for various operations
Andrew Wang created HDFS-3650: - Summary: Use MutableQuantiles to provide latency histograms for various operations Key: HDFS-3650 URL: https://issues.apache.org/jira/browse/HDFS-3650 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Andrew Wang Assignee: Andrew Wang Fix For: 2.0.1-alpha MutableQuantiles provide accurate estimation of various percentiles for a stream of data. Many existing metrics reported by a MutableRate would also benefit from having these percentiles; let's add MutableQuantiles where we think it'd be useful.
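To make concrete what a quantile metric reports beyond a MutableRate-style mean, here is a self-contained sketch that computes exact percentiles over a finished window of latency samples. This is purely illustrative: the real MutableQuantiles uses a streaming estimator with bounded error rather than sorting a complete sample list, and `LatencyWindow` is an invented name, not the Hadoop metrics2 API.

```java
import java.util.*;

// Illustration of the kind of data a latency-percentile metric exposes.
// Exact computation over a closed window; a production metric would use
// a streaming estimator to bound memory.
class LatencyWindow {
    private final List<Long> samples = new ArrayList<>();

    void add(long micros) { samples.add(micros); }

    // p in (0, 100]; returns the smallest sample >= the p-th percentile rank.
    long percentile(double p) {
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int idx = (int) Math.ceil(p / 100.0 * sorted.size()) - 1;
        return sorted.get(Math.max(idx, 0));
    }
}
```

A p99 of, say, 40ms against a mean of 2ms is exactly the long-tail signal a MutableRate average hides, which is the motivation stated in the issue.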
[jira] [Updated] (HDFS-3630) Modify TestPersistBlocks to use both flush and hflush
[ https://issues.apache.org/jira/browse/HDFS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-3630: --- Resolution: Fixed Target Version/s: 3.0.0 Status: Resolved (was: Patch Available) Modify TestPersistBlocks to use both flush and hflush - Key: HDFS-3630 URL: https://issues.apache.org/jira/browse/HDFS-3630 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Sanjay Radia Assignee: Sanjay Radia Attachments: hdfs3630.patch
[jira] [Updated] (HDFS-3650) Use MutableQuantiles to provide latency histograms for various operations
[ https://issues.apache.org/jira/browse/HDFS-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3650: - Target Version/s: 2.0.1-alpha Fix Version/s: (was: 2.0.1-alpha) Setting the target version instead of the fix version. Please only set the fix version once it's been committed. Use MutableQuantiles to provide latency histograms for various operations - Key: HDFS-3650 URL: https://issues.apache.org/jira/browse/HDFS-3650 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Andrew Wang Assignee: Andrew Wang
[jira] [Created] (HDFS-3651) optionally, the NameNode should invoke saveNamespace after getting a SIGTERM
Colin Patrick McCabe created HDFS-3651: -- Summary: optionally, the NameNode should invoke saveNamespace after getting a SIGTERM Key: HDFS-3651 URL: https://issues.apache.org/jira/browse/HDFS-3651 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor It would be nice if the NameNode could be configured so that it did a saveNamespace and then shut down cleanly after receiving a SIGTERM signal. In general, it is a good practice to call saveNamespace when doing an orderly shutdown, to ensure that all of the information in the namespace is on disk. Of course, this should not be necessary if the SecondaryNameNode or StandbyNameNode is operating correctly. However, when there are bugs in these daemons, a saveNamespace can prevent disaster. Currently, we don't catch SIGTERM, but just shut down immediately, without doing any cleanup. Of course, it will always be possible to shut down the NameNode without doing a saveNamespace, simply by sending SIGKILL, which is un-catchable.
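In plain Java, the mechanics described above map onto a JVM shutdown hook: the JVM runs registered hooks on SIGTERM (`kill <pid>`) but cannot on SIGKILL (`kill -9`), matching the catchable/un-catchable distinction in the issue. A minimal sketch follows; `saveNamespace()` here is a hypothetical stand-in, not the NameNode's actual wiring.

```java
// Sketch: run cleanup on orderly JVM shutdown. Shutdown hooks fire on
// SIGTERM but never on SIGKILL, which is why SIGKILL remains an escape
// hatch for skipping the save. saveNamespace() is a placeholder.
class OrderlyShutdown {
    static volatile boolean namespaceSaved = false;

    // Stand-in for persisting the in-memory namespace to disk.
    static void saveNamespace() { namespaceSaved = true; }

    // Register the hook; a config flag would control saveOnExit.
    static Thread install(boolean saveOnExit) {
        Thread hook = new Thread(() -> {
            if (saveOnExit) saveNamespace();
        });
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

One design caveat worth noting: a shutdown hook must finish quickly or the process looks hung to init scripts, and a large namespace save can take a while, so a real implementation would want this off by default, exactly as the issue proposes.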
[jira] [Commented] (HDFS-3630) Modify TestPersistBlocks to use both flush and hflush
[ https://issues.apache.org/jira/browse/HDFS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413313#comment-13413313 ] Hudson commented on HDFS-3630: -- Integrated in Hadoop-Hdfs-trunk-Commit #2524 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2524/]) HDFS-3630 Modify TestPersistBlocks to use both flush and hflush (sanjay) (Revision 1360991) Result = SUCCESS sradia : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1360991 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPersistBlocks.java Modify TestPersistBlocks to use both flush and hflush - Key: HDFS-3630 URL: https://issues.apache.org/jira/browse/HDFS-3630 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Sanjay Radia Assignee: Sanjay Radia Attachments: hdfs3630.patch
[jira] [Commented] (HDFS-3630) Modify TestPersistBlocks to use both flush and hflush
[ https://issues.apache.org/jira/browse/HDFS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413315#comment-13413315 ] Hudson commented on HDFS-3630: -- Integrated in Hadoop-Common-trunk-Commit #2458 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2458/]) HDFS-3630 Modify TestPersistBlocks to use both flush and hflush (sanjay) (Revision 1360991) Result = SUCCESS sradia : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1360991 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPersistBlocks.java Modify TestPersistBlocks to use both flush and hflush - Key: HDFS-3630 URL: https://issues.apache.org/jira/browse/HDFS-3630 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Sanjay Radia Assignee: Sanjay Radia Attachments: hdfs3630.patch
[jira] [Created] (HDFS-3652) 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name
Todd Lipcon created HDFS-3652: - Summary: 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name Key: HDFS-3652 URL: https://issues.apache.org/jira/browse/HDFS-3652 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.3, 1.1.0, 1.2.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker In {{FSEditLog.removeEditsForStorageDir}}, we iterate over the edits streams trying to find the stream corresponding to a given dir. To check equality, we currently use the following condition: {code} File parentDir = getStorageDirForStream(idx); if (parentDir.getName().equals(sd.getRoot().getName())) { {code} ... which is horribly incorrect. If two or more storage dirs happen to have the same terminal path component (eg /data/1/nn and /data/2/nn) then it will pick the wrong stream(s) to remove.
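The flaw is easy to demonstrate with `java.io.File` alone: distinct storage roots can share a terminal path component, so a `getName()`-based comparison conflates them, while comparing full paths does not. The sketch below restates the buggy check next to a safer one; `DirCompare` is an illustrative wrapper, not the actual FSEditLog code or its eventual fix.

```java
import java.io.File;

// Demonstration of the comparison bug described above.
class DirCompare {
    // The buggy check: two dirs compare equal whenever their LAST path
    // components match, e.g. /data/1/nn vs /data/2/nn.
    static boolean sameDirByName(File a, File b) {
        return a.getName().equals(b.getName());
    }

    // A safer comparison over the whole path, which distinguishes
    // /data/1/nn from /data/2/nn.
    static boolean sameDirByPath(File a, File b) {
        return a.getAbsolutePath().equals(b.getAbsolutePath());
    }
}
```

With the name-based check, iterating the streams for `/data/2/nn` matches `/data/1/nn` first and removes the wrong stream, which is the failure mode reproduced in the comments below on this issue.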
[jira] [Commented] (HDFS-3652) 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name
[ https://issues.apache.org/jira/browse/HDFS-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413320#comment-13413320 ] Matt Foley commented on HDFS-3652: -- Urk! Quite a catch. When patch available, please commit to branch-1.0 as well as branch-1.1 and branch-1. 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name - Key: HDFS-3652 URL: https://issues.apache.org/jira/browse/HDFS-3652 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.3, 1.1.0, 1.2.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker
[jira] [Commented] (HDFS-3652) 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name
[ https://issues.apache.org/jira/browse/HDFS-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413326#comment-13413326 ] Todd Lipcon commented on HDFS-3652: --- This has data-loss implications as well. I am able to reproduce the following: - NN is writing to three dirs: /data/1/nn, /data/2/nn, and /data/3/nn - I modified the NN to inject an IOException when creating edits.new in /data/3/nn, which causes removeEditsForStorageDir to get called inside {{rollEditLog}} - Upon triggering a checkpoint: -- all three logs are closed successfully -- /data/1/nn and /data/2/nn are successfully opened for edits.new -- /data/3/nn throws an IOE which gets caught. This calls {{removeEditsForStorageDir}}, which removes the wrong stream (augmented logging): {code} 12/07/12 16:23:54 INFO namenode.FSNamesystem: Roll Edit Log from 127.0.0.1 12/07/12 16:23:54 INFO namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 0 0 12/07/12 16:23:54 WARN namenode.FSNamesystem: Removing edits stream /tmp/name1/nn/current/edits.new 12/07/12 16:23:54 WARN common.Storage: Removing storage dir /tmp/name3/nn java.io.IOException: Injected fault for /tmp/name3/nn/current/edits.new at org.apache.hadoop.hdfs.server.namenode.FSEditLog$EditLogFileOutputStream.init(FSEditLog.java:146) {code} - The NN is now _only_ writing to /tmp/name2/nn/current/edits.new, but considers both name1 and name2 to be good from a storage-directory standpoint. However, {{/tmp/name1/nn/current/edits.new}} exists as an empty edit log file (just the header and preallocated region of 0xffs) - When {{rollFSImage}} is called, it successfully calls {{close}} only on the name2 log - which truncates it to the correct transaction boundary. 
Then it renames both {{name2/.../edits.new}} and {{name1/.../edits.new}} to {{edits}}, and opens them both for append (assuming they've been truncated to a transaction boundary). - The NN is now writing to name1 and name2, but name1's log looks like this: {code} <valid header> <preallocated bytes of 0xff> <transactions> {code} - Upon the next checkpoint, the 2NN will likely download this log, since it's listed first in the name directory list. Upon doing so, it will see the 0xff at the head of the log and not read any of the edits (which come after all of the 0xffs) - The 2NN then uploads the merged image back to the NN, which blows away the edits file. Thus, its in-memory data has gotten out of sync with the disk data, and the next time a checkpoint occurs or the NN restarts, it will fail. This is not an issue in trunk since the code was largely rewritten by HDFS-1073. The workaround for existing users is simple: rename the directories to eg /data/1/nn1 and /data/2/nn2. The fix is also simple. I will upload the fix this afternoon. 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name - Key: HDFS-3652 URL: https://issues.apache.org/jira/browse/HDFS-3652 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.3, 1.1.0, 1.2.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker
[jira] [Commented] (HDFS-3610) fuse_dfs: Provide a way to use the default (configured) NN URI
[ https://issues.apache.org/jira/browse/HDFS-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413331#comment-13413331 ] Hadoop QA commented on HDFS-3610: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12536293/HDFS-3610.001.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestBackupNode org.apache.hadoop.hdfs.server.common.TestJspHelper +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2808//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2808//console This message is automatically generated. fuse_dfs: Provide a way to use the default (configured) NN URI -- Key: HDFS-3610 URL: https://issues.apache.org/jira/browse/HDFS-3610 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3610.001.patch It shouldn't be necessary to explicitly spell out the NameNode you want to connect to when launching fuse_dfs. libhdfs can read the configuration files and use the default URI.
However, we don't have a command-line option for this in fuse_dfs.
[jira] [Commented] (HDFS-3630) Modify TestPersistBlocks to use both flush and hflush
[ https://issues.apache.org/jira/browse/HDFS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1341#comment-1341 ] Hudson commented on HDFS-3630: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2477 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2477/]) HDFS-3630 Modify TestPersistBlocks to use both flush and hflush (sanjay) (Revision 1360991) Result = FAILURE sradia : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1360991 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPersistBlocks.java Modify TestPersistBlocks to use both flush and hflush - Key: HDFS-3630 URL: https://issues.apache.org/jira/browse/HDFS-3630 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Sanjay Radia Assignee: Sanjay Radia Attachments: hdfs3630.patch
[jira] [Commented] (HDFS-3609) libhdfs: don't force the URI to look like hdfs://hostname:port
[ https://issues.apache.org/jira/browse/HDFS-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413334#comment-13413334 ] Eli Collins commented on HDFS-3609: --- Patch looks good. Nit: what you're calling URI prefix/location/protocol type in the comments is called the scheme in URI lingo. Testing? Eg confirmed you can run libhdfs against an HA config (ie one w/o a port) now? The test failures here are obviously unrelated. libhdfs: don't force the URI to look like hdfs://hostname:port -- Key: HDFS-3609 URL: https://issues.apache.org/jira/browse/HDFS-3609 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.0.0-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3609.001.patch Currently, libhdfs forces the URI to look like hdfs://hostname:port. For configurations like HA or federation this is not ideal.
[jira] [Updated] (HDFS-3633) libhdfs: hdfsDelete should pass JNI_FALSE or JNI_TRUE
[ https://issues.apache.org/jira/browse/HDFS-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3633: -- Resolution: Fixed Fix Version/s: 2.0.1-alpha Target Version/s: (was: 2.0.1-alpha) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 findbugs is unrelated. I've committed this, thanks Colin. libhdfs: hdfsDelete should pass JNI_FALSE or JNI_TRUE - Key: HDFS-3633 URL: https://issues.apache.org/jira/browse/HDFS-3633 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.0.1-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.0.1-alpha Attachments: HDFS-3633.001.patch In libhdfs in hdfsDelete, the header file says any non-zero argument to hdfsDelete will be interpreted as true. However, the hdfsDelete function does not translate these non-zero values to JNI_FALSE and JNI_TRUE, potentially leading to undefined or JVM-specific behavior.
[jira] [Updated] (HDFS-3652) 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name
[ https://issues.apache.org/jira/browse/HDFS-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3652: -- Attachment: hdfs-3652.txt Attached patch is for branch-1. I modified the existing storage dir failure test so that all of the name dirs have the same name, and it started to fail. After fixing the bug, it passes. 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name - Key: HDFS-3652 URL: https://issues.apache.org/jira/browse/HDFS-3652 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.3, 1.1.0, 1.2.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Attachments: hdfs-3652.txt In {{FSEditLog.removeEditsForStorageDir}}, we iterate over the edits streams trying to find the stream corresponding to a given dir. To check equality, we currently use the following condition: {code} File parentDir = getStorageDirForStream(idx); if (parentDir.getName().equals(sd.getRoot().getName())) { {code} ... which is horribly incorrect. If two or more storage dirs happen to have the same terminal path component (eg /data/1/nn and /data/2/nn) then it will pick the wrong stream(s) to remove.
[jira] [Commented] (HDFS-3652) 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name
[ https://issues.apache.org/jira/browse/HDFS-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413346#comment-13413346 ] Aaron T. Myers commented on HDFS-3652: -- +1, the patch looks good to me. Great find/fix, Todd. 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name - Key: HDFS-3652 URL: https://issues.apache.org/jira/browse/HDFS-3652 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.3, 1.1.0, 1.2.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Attachments: hdfs-3652.txt In {{FSEditLog.removeEditsForStorageDir}}, we iterate over the edits streams trying to find the stream corresponding to a given dir. To check equality, we currently use the following condition: {code} File parentDir = getStorageDirForStream(idx); if (parentDir.getName().equals(sd.getRoot().getName())) { {code} ... which is horribly incorrect. If two or more storage dirs happen to have the same terminal path component (eg /data/1/nn and /data/2/nn) then it will pick the wrong stream(s) to remove.
[jira] [Commented] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed
[ https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413350#comment-13413350 ] Eli Collins commented on HDFS-799: -- +1 (test failure is unrelated). libhdfs must call DetachCurrentThread when a thread is destroyed Key: HDFS-799 URL: https://issues.apache.org/jira/browse/HDFS-799 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Christian Kunz Assignee: Colin Patrick McCabe Attachments: HDFS-799.001.patch, HDFS-799.003.patch, HDFS-799.004.patch, HDFS-799.005.patch Threads that call AttachCurrentThread in libhdfs and disappear without calling DetachCurrentThread cause a memory leak. Libhdfs should detach the current thread when this thread exits.
[jira] [Updated] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed
[ https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-799: - Resolution: Fixed Fix Version/s: 2.0.1-alpha Target Version/s: (was: 2.0.1-alpha) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this and merged to branch-2. Thanks Colin. libhdfs must call DetachCurrentThread when a thread is destroyed Key: HDFS-799 URL: https://issues.apache.org/jira/browse/HDFS-799 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Christian Kunz Assignee: Colin Patrick McCabe Fix For: 2.0.1-alpha Attachments: HDFS-799.001.patch, HDFS-799.003.patch, HDFS-799.004.patch, HDFS-799.005.patch Threads that call AttachCurrentThread in libhdfs and disappear without calling DetachCurrentThread cause a memory leak. Libhdfs should detach the current thread when this thread exits.
[jira] [Updated] (HDFS-3492) fix some misuses of InputStream#skip
[ https://issues.apache.org/jira/browse/HDFS-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3492: -- Attachment: hdfs-3492.txt Patch rebased on trunk. fix some misuses of InputStream#skip Key: HDFS-3492 URL: https://issues.apache.org/jira/browse/HDFS-3492 Project: Hadoop HDFS Issue Type: Bug Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3492.001.patch, HDFS-3492.002.patch, hdfs-3492.txt It seems that we have a few cases where programmers are calling InputStream#skip and not handling short skips. Unfortunately, the skip method is documented and implemented so that it doesn't actually skip the requested number of bytes, but simply tries to skip at most that amount of bytes. A better name probably would have been trySkip or similar. It seems like most of the time when the argument to skip is small enough, we'll succeed almost all of the time. This is no doubt an implementation artifact of some of the popular stream implementations. This tends to hide the bug-- however, it is still waiting to emerge at some point if those implementations ever change or if buffer sizes are adjusted, etc. All of these cases can be fixed by calling IOUtils#skipFully to get the behavior that the programmer expects-- i.e., skipping by the specified amount.
[jira] [Commented] (HDFS-3583) Convert remaining tests to Junit4
[ https://issues.apache.org/jira/browse/HDFS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413358#comment-13413358 ] Hadoop QA commented on HDFS-3583: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12536289/hdfs-3583.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 259 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 31 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-httpfs hadoop-hdfs-project/hadoop-hdfs-raid: org.apache.hadoop.hdfs.server.namenode.TestBackupNode org.apache.hadoop.hdfs.server.common.TestJspHelper org.apache.hadoop.raid.TestRaidNode +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2807//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2807//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-raid.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2807//console This message is automatically generated.
Convert remaining tests to Junit4 - Key: HDFS-3583 URL: https://issues.apache.org/jira/browse/HDFS-3583 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Andrew Wang Labels: newbie Attachments: hdfs-3583.patch, hdfs-3583.patch, junit3to4.sh JUnit4 style tests are easier to debug (eg can use @Timeout etc), let's convert the remaining tests over to Junit4 style.
[jira] [Created] (HDFS-3653) 1.x: Add a retention period for purged edit logs
Todd Lipcon created HDFS-3653: - Summary: 1.x: Add a retention period for purged edit logs Key: HDFS-3653 URL: https://issues.apache.org/jira/browse/HDFS-3653 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 1.1.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Occasionally we have a bug which causes something to go wrong with edits files. Even more occasionally the bug is such that the namenode mistakenly deletes an {{edits}} file without merging it into {{fsimage}} properly -- e.g. if the bug mistakenly writes an OP_INVALID at the top of the log. In trunk/2.0 we retain many edit log segments going back in time to be more robust to this kind of error. I'd like to implement something similar (but much simpler) in 1.x, which would be used only by HDFS developers in root-causing or repairing from these rare scenarios: the NN should never directly delete an edit log file. Instead, it should rename the file into some kind of trash directory inside the name dir, and associate it with a timestamp. Then, periodically a separate thread should scan the trash dirs and delete any logs older than a configurable time.
[jira] [Commented] (HDFS-3606) libhdfs: create self-contained unit test
[ https://issues.apache.org/jira/browse/HDFS-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413359#comment-13413359 ] Eli Collins commented on HDFS-3606: --- The following should be updated now that HDFS-3633 is in right?
{code}
// TODO: Non-recursive delete should fail?
//EXPECT_NONZERO(hdfsDelete(fs, prefix, 0));
{code}
Otherwise looks great. Agree with all of Andy's (excellent) feedback. libhdfs: create self-contained unit test Key: HDFS-3606 URL: https://issues.apache.org/jira/browse/HDFS-3606 Project: Hadoop HDFS Issue Type: Test Components: libhdfs Affects Versions: 2.0.1-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3606.001.patch, HDFS-3606.003.patch, HDFS-3606.004.patch We should have a self-contained unit test for libhdfs and also for FUSE. We do have hdfs_test, but it is not self-contained (it requires a cluster to already be running before it can be used.)
[jira] [Commented] (HDFS-3633) libhdfs: hdfsDelete should pass JNI_FALSE or JNI_TRUE
[ https://issues.apache.org/jira/browse/HDFS-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413363#comment-13413363 ] Hudson commented on HDFS-3633: -- Integrated in Hadoop-Common-trunk-Commit #2459 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2459/]) HDFS-3633. libhdfs: hdfsDelete should pass JNI_FALSE or JNI_TRUE. Contributed by Colin Patrick McCabe (Revision 1361005) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1361005 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c libhdfs: hdfsDelete should pass JNI_FALSE or JNI_TRUE - Key: HDFS-3633 URL: https://issues.apache.org/jira/browse/HDFS-3633 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.0.1-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.0.1-alpha Attachments: HDFS-3633.001.patch In libhdfs in hdfsDelete, the header file says any non-zero argument to hdfsDelete will be interpreted as true. However, the hdfsDelete function does not translate these non-zero values to JNI_FALSE and JNI_TRUE, potentially leading to undefined or JVM-specific behavior.
[jira] [Commented] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed
[ https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413362#comment-13413362 ] Hudson commented on HDFS-799: - Integrated in Hadoop-Common-trunk-Commit #2459 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2459/]) HDFS-799. libhdfs must call DetachCurrentThread when a thread is destroyed. Contributed by Colin Patrick McCabe (Revision 1361008) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1361008 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/config.h.cmake * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfsJniHelper.c libhdfs must call DetachCurrentThread when a thread is destroyed Key: HDFS-799 URL: https://issues.apache.org/jira/browse/HDFS-799 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Christian Kunz Assignee: Colin Patrick McCabe Fix For: 2.0.1-alpha Attachments: HDFS-799.001.patch, HDFS-799.003.patch, HDFS-799.004.patch, HDFS-799.005.patch Threads that call AttachCurrentThread in libhdfs and disappear without calling DetachCurrentThread cause a memory leak. Libhdfs should detach the current thread when this thread exits.
[jira] [Commented] (HDFS-3633) libhdfs: hdfsDelete should pass JNI_FALSE or JNI_TRUE
[ https://issues.apache.org/jira/browse/HDFS-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413367#comment-13413367 ] Hudson commented on HDFS-3633: -- Integrated in Hadoop-Hdfs-trunk-Commit #2525 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2525/]) HDFS-3633. libhdfs: hdfsDelete should pass JNI_FALSE or JNI_TRUE. Contributed by Colin Patrick McCabe (Revision 1361005) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1361005 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c libhdfs: hdfsDelete should pass JNI_FALSE or JNI_TRUE - Key: HDFS-3633 URL: https://issues.apache.org/jira/browse/HDFS-3633 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.0.1-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.0.1-alpha Attachments: HDFS-3633.001.patch In libhdfs in hdfsDelete, the header file says any non-zero argument to hdfsDelete will be interpreted as true. However, the hdfsDelete function does not translate these non-zero values to JNI_FALSE and JNI_TRUE, potentially leading to undefined or JVM-specific behavior.
[jira] [Commented] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed
[ https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413366#comment-13413366 ] Hudson commented on HDFS-799: - Integrated in Hadoop-Hdfs-trunk-Commit #2525 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2525/]) HDFS-799. libhdfs must call DetachCurrentThread when a thread is destroyed. Contributed by Colin Patrick McCabe (Revision 1361008) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1361008 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/config.h.cmake * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfsJniHelper.c libhdfs must call DetachCurrentThread when a thread is destroyed Key: HDFS-799 URL: https://issues.apache.org/jira/browse/HDFS-799 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Christian Kunz Assignee: Colin Patrick McCabe Fix For: 2.0.1-alpha Attachments: HDFS-799.001.patch, HDFS-799.003.patch, HDFS-799.004.patch, HDFS-799.005.patch Threads that call AttachCurrentThread in libhdfs and disappear without calling DetachCurrentThread cause a memory leak. Libhdfs should detach the current thread when this thread exits.
[jira] [Created] (HDFS-3654) TestJspHelper#testGetUgi may fail
Eli Collins created HDFS-3654: - Summary: TestJspHelper#testGetUgi may fail Key: HDFS-3654 URL: https://issues.apache.org/jira/browse/HDFS-3654 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.1-alpha Reporter: Eli Collins Assignee: Eli Collins Looks like my recent change in HDFS-3639 can occasionally cause this test to fail.
[jira] [Created] (HDFS-3655) datanode recoverRbw could hang sometimes
Ming Ma created HDFS-3655: - Summary: datanode recoverRbw could hang sometimes Key: HDFS-3655 URL: https://issues.apache.org/jira/browse/HDFS-3655 Project: Hadoop HDFS Issue Type: Bug Components: data-node Reporter: Ming Ma Fix For: 0.22.1 This bug seems to apply to 0.22 and hadoop 2.0. I will upload the initial fix done by my colleague Xiaobo Peng shortly (there is some logistics issue being worked on so that he can upload the patch himself later). recoverRbw tries to kill the old writer thread, but it took the lock (FSDataset monitor object) which the old writer thread is waiting on (for example the call to data.getTmpInputStreams).
DataXceiver for client /10.110.3.43:40193 [Receiving block blk_-3037542385914640638_57111747 client=DFSClient_attempt_201206021424_0001_m_000401_0] daemon prio=10 tid=0x7facf8111800 nid=0x6b64 in Object.wait() [0x7facd1ddb000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1186)
        - locked 0x0007856c1200 (a org.apache.hadoop.util.Daemon)
        at java.lang.Thread.join(Thread.java:1239)
        at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:158)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.recoverRbw(FSDataset.java:1347)
        - locked 0x0007838398c0 (a org.apache.hadoop.hdfs.server.datanode.FSDataset)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:119)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlockInternal(DataXceiver.java:391)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:327)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:405)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:344)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
        at java.lang.Thread.run(Thread.java:662)
[jira] [Commented] (HDFS-3610) fuse_dfs: Provide a way to use the default (configured) NN URI
[ https://issues.apache.org/jira/browse/HDFS-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413372#comment-13413372 ] Eli Collins commented on HDFS-3610: --- Looks good, since this patch contains HDFS-3609 let's get that one checked in then upload a patch here that's just the delta. TestBackupNode failure is unrelated. TestJspHelper is as well, filed HDFS-3654. fuse_dfs: Provide a way to use the default (configured) NN URI -- Key: HDFS-3610 URL: https://issues.apache.org/jira/browse/HDFS-3610 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3610.001.patch It shouldn't be necessary to explicitly spell out the NameNode you want to connect to when launching fuse_dfs. libhdfs can read the configuration files and use the default URI. However, we don't have a command-line option for this in fuse_dfs.
[jira] [Updated] (HDFS-3371) EditLogFileInputStream: be more careful about closing streams when we're done with them.
[ https://issues.apache.org/jira/browse/HDFS-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3371: -- Status: Open (was: Patch Available) EditLogFileInputStream: be more careful about closing streams when we're done with them. Key: HDFS-3371 URL: https://issues.apache.org/jira/browse/HDFS-3371 Project: Hadoop HDFS Issue Type: Bug Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3371.001.patch, HDFS-3371.002.patch EditLogFileInputStream#EditLogFileInputStream should be more careful about closing streams when there is an exception thrown. Also, EditLogFileInputStream#close should close all of the streams we opened in the constructor, not just one of them (although the file-backed one is probably the most important).
[jira] [Commented] (HDFS-3270) run valgrind on fuse-dfs, fix any memory leaks
[ https://issues.apache.org/jira/browse/HDFS-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413377#comment-13413377 ] Eli Collins commented on HDFS-3270: --- Colin, This patch is covered by HDFS-3609, anything left to do here? run valgrind on fuse-dfs, fix any memory leaks -- Key: HDFS-3270 URL: https://issues.apache.org/jira/browse/HDFS-3270 Project: Hadoop HDFS Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3270.001.patch, HDFS-3270.002.patch run valgrind on fuse-dfs, fix any memory leaks
[jira] [Updated] (HDFS-3612) Single namenode image directory config warning can be improved
[ https://issues.apache.org/jira/browse/HDFS-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-3612: Attachment: hdfs3612-2.txt bq. double space after period I always type double space after period, but I agree it's irrelevant. Fixed. bq. role string Agreed, nice improvement. Attaching new patch. Single namenode image directory config warning can be improved -- Key: HDFS-3612 URL: https://issues.apache.org/jira/browse/HDFS-3612 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Andy Isaacson Priority: Trivial Labels: newbie Attachments: hdfs3612-2.txt, hdfs3612.txt Currently, if you configure the NameNode to run with just one dfs.namenode.name.dir directory, it prints: {code} 12/07/08 20:00:22 WARN namenode.FSNamesystem: Only one dfs.namenode.name.dir directory configured , beware data loss!{code} We can improve this in a few ways as it is slightly ambiguous: # Fix punctuation spacing: there's always a space after a punctuation mark but never before one. # Perhaps the message is better printed with a reason why it may cause a scare of data loss. For instance, we could print "Detected a single storage directory in dfs.namenode.name.dir configuration. Beware of data loss due to lack of redundant storage directories." or so. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3609) libhdfs: don't force the URI to look like hdfs://hostname:port
[ https://issues.apache.org/jira/browse/HDFS-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413381#comment-13413381 ] Colin Patrick McCabe commented on HDFS-3609: I've verified I can connect without an explicit port, using hdfs_test. There will be an opportunity to add more unit tests once HDFS-3606 is in. libhdfs: don't force the URI to look like hdfs://hostname:port -- Key: HDFS-3609 URL: https://issues.apache.org/jira/browse/HDFS-3609 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.0.0-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3609.001.patch Currently, libhdfs forces the URI to look like hdfs://hostname:port. For configurations like HA or federation this is not ideal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3609) libhdfs: don't force the URI to look like hdfs://hostname:port
[ https://issues.apache.org/jira/browse/HDFS-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3609: --- Attachment: HDFS-3609.002.patch * refer to 'hdfs://' as a 'scheme' rather than a 'prefix' libhdfs: don't force the URI to look like hdfs://hostname:port -- Key: HDFS-3609 URL: https://issues.apache.org/jira/browse/HDFS-3609 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.0.0-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3609.001.patch, HDFS-3609.002.patch Currently, libhdfs forces the URI to look like hdfs://hostname:port. For configurations like HA or federation this is not ideal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
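The point of HDFS-3609 is that a URI need not carry a hostname:port pair: an HA or federated deployment uses a logical authority such as hdfs://mycluster with no port. A self-contained C sketch of scheme-tolerant handling (the helper names `has_hdfs_scheme` and `hdfs_authority` are hypothetical, not the actual libhdfs internals):

```c
#include <string.h>

/* Return 1 if uri starts with the "hdfs://" scheme, else 0. */
int has_hdfs_scheme(const char *uri)
{
    return strncmp(uri, "hdfs://", strlen("hdfs://")) == 0;
}

/* Yield the authority part (everything after the scheme) without
 * insisting on a ":port" suffix, so logical HA URIs such as
 * "hdfs://mycluster" pass through unchanged, as do bare hosts. */
const char *hdfs_authority(const char *uri)
{
    return has_hdfs_scheme(uri) ? uri + strlen("hdfs://") : uri;
}
```

The resolution of authority to actual namenode addresses is then left to the client configuration rather than forced into the URI syntax.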
[jira] [Updated] (HDFS-3306) fuse_dfs: don't lock release operations
[ https://issues.apache.org/jira/browse/HDFS-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3306: -- Resolution: Fixed Fix Version/s: 2.0.1-alpha Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Per offline conversation Colin tested this multithreaded. +1 I've committed this and merged to branch-2. fuse_dfs: don't lock release operations --- Key: HDFS-3306 URL: https://issues.apache.org/jira/browse/HDFS-3306 Project: Hadoop HDFS Issue Type: Bug Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.0.1-alpha Attachments: HDFS-3306.001.patch There's no need to lock release operations in FUSE, because release can only be called once on a fuse_file_info structure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3633) libhdfs: hdfsDelete should pass JNI_FALSE or JNI_TRUE
[ https://issues.apache.org/jira/browse/HDFS-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413393#comment-13413393 ] Hudson commented on HDFS-3633: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2478 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2478/]) HDFS-3633. libhdfs: hdfsDelete should pass JNI_FALSE or JNI_TRUE. Contributed by Colin Patrick McCabe (Revision 1361005) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1361005 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c libhdfs: hdfsDelete should pass JNI_FALSE or JNI_TRUE - Key: HDFS-3633 URL: https://issues.apache.org/jira/browse/HDFS-3633 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.0.1-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.0.1-alpha Attachments: HDFS-3633.001.patch In libhdfs in hdfsDelete, the header file says any non-zero argument to hdfsDelete will be interpreted as true. However, the hdfsDelete function does not translate these non-zero values to JNI_FALSE and JNI_TRUE, potentially leading to undefined or JVM-specific behavior. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
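The fix amounts to normalizing the caller's int before it reaches the JVM: JNI defines JNI_FALSE as 0 and JNI_TRUE as 1, and passing an arbitrary non-zero value as a jboolean is what risks the JVM-specific behavior described above. A self-contained sketch of the translation (the constants and jboolean typedef are reproduced from jni.h so this compiles standalone; `to_jboolean` is a hypothetical helper name):

```c
/* Values and typedef as defined in jni.h; reproduced here so this
 * sketch is self-contained. */
#define JNI_FALSE 0
#define JNI_TRUE  1
typedef unsigned char jboolean;

/* Collapse any non-zero C int to the well-defined jboolean values,
 * the way hdfsDelete should before invoking FileSystem#delete. */
static jboolean to_jboolean(int flag)
{
    return flag ? JNI_TRUE : JNI_FALSE;
}
```

A call such as hdfsDelete(fs, path, 42) would then pass JNI_TRUE to the JVM instead of the raw 42.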
[jira] [Commented] (HDFS-3306) fuse_dfs: don't lock release operations
[ https://issues.apache.org/jira/browse/HDFS-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413397#comment-13413397 ] Hudson commented on HDFS-3306: -- Integrated in Hadoop-Common-trunk-Commit #2460 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2460/]) HDFS-3306. fuse_dfs: don't lock release operations. Contributed by Colin Patrick McCabe (Revision 1361021) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1361021 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/fuse-dfs/src/fuse_impls_release.c fuse_dfs: don't lock release operations --- Key: HDFS-3306 URL: https://issues.apache.org/jira/browse/HDFS-3306 Project: Hadoop HDFS Issue Type: Bug Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.0.1-alpha Attachments: HDFS-3306.001.patch There's no need to lock release operations in FUSE, because release can only be called once on a fuse_file_info structure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3306) fuse_dfs: don't lock release operations
[ https://issues.apache.org/jira/browse/HDFS-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413400#comment-13413400 ] Hudson commented on HDFS-3306: -- Integrated in Hadoop-Hdfs-trunk-Commit #2526 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2526/]) HDFS-3306. fuse_dfs: don't lock release operations. Contributed by Colin Patrick McCabe (Revision 1361021) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1361021 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/fuse-dfs/src/fuse_impls_release.c fuse_dfs: don't lock release operations --- Key: HDFS-3306 URL: https://issues.apache.org/jira/browse/HDFS-3306 Project: Hadoop HDFS Issue Type: Bug Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.0.1-alpha Attachments: HDFS-3306.001.patch There's no need to lock release operations in FUSE, because release can only be called once on a fuse_file_info structure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira