[jira] [Commented] (HDFS-6824) Additional user documentation for HDFS encryption.
[ https://issues.apache.org/jira/browse/HDFS-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179619#comment-14179619 ] Yi Liu commented on HDFS-6824: -- +1 Thanks Andrew for the contribution. Additional user documentation for HDFS encryption. -- Key: HDFS-6824 URL: https://issues.apache.org/jira/browse/HDFS-6824 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 2.6.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Attachments: TransparentEncryption.html, hdfs-6824.001.patch, hdfs-6824.002.patch We'd like to better document additional aspects of HDFS encryption: setup and configuration, using alternate access methods (namely WebHDFS and HttpFS), and other misc improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-6877: Attachment: HDFS-6877.007.patch [~cmccabe] Thanks for your great advice. I changed the patch to enforce the logic that a ReplicaNotFoundException must be thrown when the volume for the block has been removed. Would you please take another look? Avoid calling checkDisk when an HDFS volume is removed during a write. -- Key: HDFS-6877 URL: https://issues.apache.org/jira/browse/HDFS-6877 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6877.000.consolidate.txt, HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, HDFS-6877.007.patch Avoid calling checkDisk and stop the active BlockReceiver thread when an HDFS volume is removed during a write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6291) FSImage may be left unclosed in BootstrapStandby#doRun()
[ https://issues.apache.org/jira/browse/HDFS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179670#comment-14179670 ] Sanghyun Yun commented on HDFS-6291: Please review my patch, [~vinayrpet] and [~tedyu]. :) And may I assign this issue to myself? FSImage may be left unclosed in BootstrapStandby#doRun() Key: HDFS-6291 URL: https://issues.apache.org/jira/browse/HDFS-6291 Project: Hadoop HDFS Issue Type: Bug Components: ha Reporter: Ted Yu Priority: Minor Attachments: HDFS-6291.patch At around line 203: {code} if (!checkLogsAvailableForRead(image, imageTxId, curTxId)) { return ERR_CODE_LOGS_UNAVAILABLE; } {code} If we return following the above check, the image is never closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
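One conventional way to avoid the leak described in HDFS-6291 is to guard the early return with try/finally. The sketch below is illustrative only: Image is a minimal stand-in for FSImage, and the error-code value is invented, so treat this as a pattern rather than the committed Hadoop fix.

```java
// Hedged sketch: close an FSImage-like resource on every exit path of doRun(),
// including the early return flagged in HDFS-6291. "Image" and the error code
// below are simplified stand-ins, not the actual Hadoop classes.
import java.io.Closeable;

public class BootstrapSketch {
    static final int ERR_CODE_LOGS_UNAVAILABLE = 6; // illustrative value

    // Minimal stand-in for FSImage so the sketch is self-contained.
    static class Image implements Closeable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    static int doRun(Image image, boolean logsAvailableForRead) {
        try {
            if (!logsAvailableForRead) {
                // Early return no longer leaks: the finally block still runs.
                return ERR_CODE_LOGS_UNAVAILABLE;
            }
            return 0;
        } finally {
            image.close(); // executes on every return path
        }
    }
}
```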
[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179671#comment-14179671 ] Hadoop QA commented on HDFS-7226: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676258/HDFS-7226.003.patch against trunk revision c0e0343. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8478//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8478//console This message is automatically generated. 
TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch, HDFS-7226.003.patch Using the tool from HADOOP-11045, I got the following report: {code} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1 Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below: .. Among 9 runs examined, all failed tests #failedRuns: testName: 7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching .. {code} TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom: {code} Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failing for the past 1 build (Since Failed#8390 ) Took 2.9 sec. Error Message expected:<18> but was:<12> Stacktrace java.lang.AssertionError: expected:<18> but was:<12> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7246) Use ids for DatanodeStorageInfo in the BlockInfo triplets - HDFS 6660
[ https://issues.apache.org/jira/browse/HDFS-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Langer updated HDFS-7246: -- Summary: Use ids for DatanodeStorageInfo in the BlockInfo triplets - HDFS 6660 (was: Use ids for DatanodeStorageInfo in the BlockInfo triplets) Use ids for DatanodeStorageInfo in the BlockInfo triplets - HDFS 6660 - Key: HDFS-7246 URL: https://issues.apache.org/jira/browse/HDFS-7246 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Amir Langer Identical to HDFS-6660 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7014) Implement input and output streams to DataNode for native client
[ https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7014: --- Attachment: HDFS-7014-pnative.004.patch Implement input and output streams to DataNode for native client Key: HDFS-7014 URL: https://issues.apache.org/jira/browse/HDFS-7014 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: 0001-HDFS-7014-001.patch, HDFS-7014-pnative.002.patch, HDFS-7014-pnative.003.patch, HDFS-7014-pnative.004.patch, HDFS-7014.patch Implement Client - Namenode RPC protocol and support Namenode HA. Implement Client - Datanode RPC protocol. Implement some basic server-side classes such as ExtendedBlock and LocatedBlock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7014) Implement input streams and file system functionality
[ https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7014: --- Summary: Implement input streams and file system functionality (was: Implement input and output streams to DataNode for native client) Implement input streams and file system functionality - Key: HDFS-7014 URL: https://issues.apache.org/jira/browse/HDFS-7014 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: 0001-HDFS-7014-001.patch, HDFS-7014-pnative.002.patch, HDFS-7014-pnative.003.patch, HDFS-7014-pnative.004.patch, HDFS-7014.patch Implement Client - Namenode RPC protocol and support Namenode HA. Implement Client - Datanode RPC protocol. Implement some basic server-side classes such as ExtendedBlock and LocatedBlock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7014) Implement input streams and file system functionality
[ https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179755#comment-14179755 ] Zhanwei Wang commented on HDFS-7014: HDFS-7014-pnative.003.patch was created incorrectly. I created a new patch, HDFS-7014-pnative.004.patch, that implements the features I mentioned above and separates out the code related to OutputStream. If this patch is OK, I think it is time to commit it and move on to HDFS-7017. Implement input streams and file system functionality - Key: HDFS-7014 URL: https://issues.apache.org/jira/browse/HDFS-7014 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: 0001-HDFS-7014-001.patch, HDFS-7014-pnative.002.patch, HDFS-7014-pnative.003.patch, HDFS-7014-pnative.004.patch, HDFS-7014.patch Implement Client - Namenode RPC protocol and support Namenode HA. Implement Client - Datanode RPC protocol. Implement some basic server-side classes such as ExtendedBlock and LocatedBlock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7017 started by Zhanwei Wang. -- Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179782#comment-14179782 ] Hadoop QA commented on HDFS-6877: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676278/HDFS-6877.007.patch against trunk revision 7e3b5e6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8479//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8479//console This message is automatically generated. Avoid calling checkDisk when an HDFS volume is removed during a write. 
-- Key: HDFS-6877 URL: https://issues.apache.org/jira/browse/HDFS-6877 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6877.000.consolidate.txt, HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, HDFS-6877.007.patch Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS volume is removed during a write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
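The invariant discussed in the HDFS-6877 comments can be stated compactly: looking up a replica whose volume has been removed should surface a ReplicaNotFoundException instead of triggering a disk check. The sketch below is a hypothetical, self-contained illustration of that rule; none of the class or method names are the real DataNode APIs.

```java
// Hypothetical sketch of the HDFS-6877 invariant: once a volume is removed,
// a replica lookup on it throws rather than kicking off a checkDisk pass.
// All names here are illustrative stand-ins for the DataNode code.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class VolumeMapSketch {
    static class ReplicaNotFoundException extends Exception {
        ReplicaNotFoundException(String msg) { super(msg); }
    }

    private final Map<Long, String> blockToVolume = new HashMap<>(); // blockId -> volume path
    private final Set<String> removedVolumes = new HashSet<>();

    void addReplica(long blockId, String volume) { blockToVolume.put(blockId, volume); }

    void removeVolume(String volume) { removedVolumes.add(volume); }

    // Enforce: a replica on a removed volume is simply "not found" -- the
    // caller gets an exception instead of a disk-check being scheduled.
    String getVolume(long blockId) throws ReplicaNotFoundException {
        String vol = blockToVolume.get(blockId);
        if (vol == null || removedVolumes.contains(vol)) {
            throw new ReplicaNotFoundException("block " + blockId);
        }
        return vol;
    }
}
```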
[jira] [Commented] (HDFS-7259) Unresponseive NFS mount point due to deferred COMMIT response
[ https://issues.apache.org/jira/browse/HDFS-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179833#comment-14179833 ] Hudson commented on HDFS-7259: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #720 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/720/]) HDFS-7259. Unresponseive NFS mount point due to deferred COMMIT response. Contributed by Brandon Li (brandonli: rev b6f9d5538cf2b425652687e99503f3d566b2056a) * hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/conf/NfsConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteManager.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/IdUserGroup.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java Unresponseive NFS mount point due to deferred COMMIT response - Key: HDFS-7259 URL: https://issues.apache.org/jira/browse/HDFS-7259 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.6.0 Attachments: HDFS-7259.001.patch, HDFS-7259.002.patch Since the gateway can't commit random writes, it caches COMMIT requests in a queue and sends back a response only when the data can be committed or the stream times out (a failure in the latter case). This can cause problems in two patterns: (1) file upload failure; (2) the mount dir is stuck on the same client, though other NFS clients can still access the NFS gateway. 
Error pattern (2) occurs because too many COMMIT requests are pending, so the NFS client, having hit its pending-request limit, can't send any other requests (e.g., for ls) to the NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
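The deferral mechanism the HDFS-7259 description refers to can be sketched in simplified form: a COMMIT whose offset is beyond what has been flushed is queued and only answered once enough data arrives, which is exactly why pending COMMITs can pile up and exhaust the client's request slots. This is a hypothetical illustration; the class and method names are not the hdfs-nfs implementation.

```java
// Simplified, hypothetical sketch of deferred COMMIT handling: commits past
// the flushed length are queued and answered only when data catches up.
// Illustrative names only, not the org.apache.hadoop.hdfs.nfs code.
import java.util.ArrayList;
import java.util.List;

public class CommitQueueSketch {
    static class PendingCommit {
        final long offset;
        boolean answered = false;
        PendingCommit(long offset) { this.offset = offset; }
    }

    private long flushedBytes = 0;
    private final List<PendingCommit> pending = new ArrayList<>();

    // COMMIT handler: answer immediately if the data is already durable,
    // otherwise defer. Each deferred commit keeps a client request slot
    // occupied -- the pile-up behind error pattern (2) above.
    boolean handleCommit(PendingCommit c) {
        if (c.offset <= flushedBytes) {
            c.answered = true;
            return true;
        }
        pending.add(c);
        return false; // response deferred
    }

    // Called as writes are flushed; releases any commits now satisfied.
    void onFlush(long newFlushedBytes) {
        flushedBytes = newFlushedBytes;
        pending.removeIf(c -> {
            if (c.offset <= flushedBytes) { c.answered = true; return true; }
            return false;
        });
    }
}
```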
[jira] [Commented] (HDFS-6581) Write to single replica in memory
[ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179827#comment-14179827 ] Hudson commented on HDFS-6581: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #720 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/720/]) Updated CHANGES.txt for HDFS-6581 merge into branch-2.6. (jitendra: rev b85919feef64ed8b05b84ab8c372844a815cc139) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Write to single replica in memory - Key: HDFS-6581 URL: https://issues.apache.org/jira/browse/HDFS-6581 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, hdfs-client, namenode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.6.0 Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, HDFS-6581.merge.11.patch, HDFS-6581.merge.12.patch, HDFS-6581.merge.14.patch, HDFS-6581.merge.15.patch, HDFSWriteableReplicasInMemory.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf Per discussion with the community on HDFS-5851, we will implement writing to a single replica in DN memory via DataTransferProtocol. This avoids some of the issues with short-circuit writes, which we can revisit at a later time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7221) TestDNFencingWithReplication fails consistently
[ https://issues.apache.org/jira/browse/HDFS-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179834#comment-14179834 ] Hudson commented on HDFS-7221: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #720 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/720/]) HDFS-7221. TestDNFencingWithReplication fails consistently. Contributed by Charles Lamb. (wang: rev ac56b0637e55465d3b7f7719c8689bff2a572dc0) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/HAStressTestHarness.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestDNFencingWithReplication fails consistently --- Key: HDFS-7221 URL: https://issues.apache.org/jira/browse/HDFS-7221 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7221.001.patch, HDFS-7221.002.patch, HDFS-7221.003.patch, HDFS-7221.004.patch, HDFS-7221.005.patch TestDNFencingWithReplication consistently fails with a timeout, both in jenkins runs and on my local machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7204) balancer doesn't run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179828#comment-14179828 ] Hudson commented on HDFS-7204: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #720 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/720/]) HDFS-7204. balancer doesn't run as a daemon (aw) (aw: rev 4baca311ffb5489fbbe08288502db68875834920) * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/stop-balancer.sh * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/start-balancer.sh * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs balancer doesn't run as a daemon Key: HDFS-7204 URL: https://issues.apache.org/jira/browse/HDFS-7204 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Labels: newbie Fix For: 3.0.0 Attachments: HDFS-7204-01.patch, HDFS-7204.patch From HDFS-7184, minor issues with balancer: * daemon isn't set to true in hdfs to enable daemonization * start-balancer script has usage instead of hadoop_usage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7215) Add JvmPauseMonitor to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179826#comment-14179826 ] Hudson commented on HDFS-7215: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #720 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/720/]) HDFS-7215.Add JvmPauseMonitor to NFS gateway. Contributed by Brandon Li (brandonli: rev 4e134a02a4b6f30704b99dfb166dc361daf426ea) * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Add JvmPauseMonitor to NFS gateway -- Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7215.001.patch Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7221) TestDNFencingWithReplication fails consistently
[ https://issues.apache.org/jira/browse/HDFS-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179923#comment-14179923 ] Hudson commented on HDFS-7221: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1909 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1909/]) HDFS-7221. TestDNFencingWithReplication fails consistently. Contributed by Charles Lamb. (wang: rev ac56b0637e55465d3b7f7719c8689bff2a572dc0) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/HAStressTestHarness.java TestDNFencingWithReplication fails consistently --- Key: HDFS-7221 URL: https://issues.apache.org/jira/browse/HDFS-7221 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7221.001.patch, HDFS-7221.002.patch, HDFS-7221.003.patch, HDFS-7221.004.patch, HDFS-7221.005.patch TestDNFencingWithReplication consistently fails with a timeout, both in jenkins runs and on my local machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6581) Write to single replica in memory
[ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179915#comment-14179915 ] Hudson commented on HDFS-6581: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1909 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1909/]) Updated CHANGES.txt for HDFS-6581 merge into branch-2.6. (jitendra: rev b85919feef64ed8b05b84ab8c372844a815cc139) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Write to single replica in memory - Key: HDFS-6581 URL: https://issues.apache.org/jira/browse/HDFS-6581 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, hdfs-client, namenode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.6.0 Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, HDFS-6581.merge.11.patch, HDFS-6581.merge.12.patch, HDFS-6581.merge.14.patch, HDFS-6581.merge.15.patch, HDFSWriteableReplicasInMemory.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf Per discussion with the community on HDFS-5851, we will implement writing to a single replica in DN memory via DataTransferProtocol. This avoids some of the issues with short-circuit writes, which we can revisit at a later time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7259) Unresponseive NFS mount point due to deferred COMMIT response
[ https://issues.apache.org/jira/browse/HDFS-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179922#comment-14179922 ] Hudson commented on HDFS-7259: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1909 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1909/]) HDFS-7259. Unresponseive NFS mount point due to deferred COMMIT response. Contributed by Brandon Li (brandonli: rev b6f9d5538cf2b425652687e99503f3d566b2056a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/IdUserGroup.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/conf/NfsConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteManager.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java Unresponseive NFS mount point due to deferred COMMIT response - Key: HDFS-7259 URL: https://issues.apache.org/jira/browse/HDFS-7259 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.6.0 Attachments: HDFS-7259.001.patch, HDFS-7259.002.patch Since the gateway can't commit random writes, it caches COMMIT requests in a queue and sends back a response only when the data can be committed or the stream times out (a failure in the latter case). This can cause problems in two patterns: (1) file upload failure; (2) the mount dir is stuck on the same client, though other NFS clients can still access the NFS gateway. 
Error pattern (2) occurs because too many COMMIT requests are pending, so the NFS client, having hit its pending-request limit, can't send any other requests (e.g., for ls) to the NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7215) Add JvmPauseMonitor to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179914#comment-14179914 ] Hudson commented on HDFS-7215: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1909 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1909/]) HDFS-7215.Add JvmPauseMonitor to NFS gateway. Contributed by Brandon Li (brandonli: rev 4e134a02a4b6f30704b99dfb166dc361daf426ea) * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm Add JvmPauseMonitor to NFS gateway -- Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7215.001.patch Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7204) balancer doesn't run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179916#comment-14179916 ] Hudson commented on HDFS-7204: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1909 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1909/]) HDFS-7204. balancer doesn't run as a daemon (aw) (aw: rev 4baca311ffb5489fbbe08288502db68875834920) * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/stop-balancer.sh * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/start-balancer.sh balancer doesn't run as a daemon Key: HDFS-7204 URL: https://issues.apache.org/jira/browse/HDFS-7204 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Labels: newbie Fix For: 3.0.0 Attachments: HDFS-7204-01.patch, HDFS-7204.patch From HDFS-7184, minor issues with balancer: * daemon isn't set to true in hdfs to enable daemonization * start-balancer script has usage instead of hadoop_usage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7204) balancer doesn't run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180008#comment-14180008 ] Hudson commented on HDFS-7204: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/]) HDFS-7204. balancer doesn't run as a daemon (aw) (aw: rev 4baca311ffb5489fbbe08288502db68875834920) * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/stop-balancer.sh * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/start-balancer.sh balancer doesn't run as a daemon Key: HDFS-7204 URL: https://issues.apache.org/jira/browse/HDFS-7204 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Labels: newbie Fix For: 3.0.0 Attachments: HDFS-7204-01.patch, HDFS-7204.patch From HDFS-7184, minor issues with balancer: * daemon isn't set to true in hdfs to enable daemonization * start-balancer script has usage instead of hadoop_usage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7221) TestDNFencingWithReplication fails consistently
[ https://issues.apache.org/jira/browse/HDFS-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180014#comment-14180014 ] Hudson commented on HDFS-7221: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/]) HDFS-7221. TestDNFencingWithReplication fails consistently. Contributed by Charles Lamb. (wang: rev ac56b0637e55465d3b7f7719c8689bff2a572dc0) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/HAStressTestHarness.java TestDNFencingWithReplication fails consistently --- Key: HDFS-7221 URL: https://issues.apache.org/jira/browse/HDFS-7221 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7221.001.patch, HDFS-7221.002.patch, HDFS-7221.003.patch, HDFS-7221.004.patch, HDFS-7221.005.patch TestDNFencingWithReplication consistently fails with a timeout, both in jenkins runs and on my local machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6581) Write to single replica in memory
[ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180007#comment-14180007 ] Hudson commented on HDFS-6581: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/]) Updated CHANGES.txt for HDFS-6581 merge into branch-2.6. (jitendra: rev b85919feef64ed8b05b84ab8c372844a815cc139) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Write to single replica in memory - Key: HDFS-6581 URL: https://issues.apache.org/jira/browse/HDFS-6581 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, hdfs-client, namenode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.6.0 Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, HDFS-6581.merge.11.patch, HDFS-6581.merge.12.patch, HDFS-6581.merge.14.patch, HDFS-6581.merge.15.patch, HDFSWriteableReplicasInMemory.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf Per discussion with the community on HDFS-5851, we will implement writing to a single replica in DN memory via DataTransferProtocol. This avoids some of the issues with short-circuit writes, which we can revisit at a later time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7215) Add JvmPauseMonitor to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180006#comment-14180006 ] Hudson commented on HDFS-7215: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/]) HDFS-7215.Add JvmPauseMonitor to NFS gateway. Contributed by Brandon Li (brandonli: rev 4e134a02a4b6f30704b99dfb166dc361daf426ea) * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java Add JvmPauseMonitor to NFS gateway -- Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7215.001.patch Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7259) Unresponseive NFS mount point due to deferred COMMIT response
[ https://issues.apache.org/jira/browse/HDFS-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180013#comment-14180013 ] Hudson commented on HDFS-7259: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/]) HDFS-7259. Unresponseive NFS mount point due to deferred COMMIT response. Contributed by Brandon Li (brandonli: rev b6f9d5538cf2b425652687e99503f3d566b2056a) * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/IdUserGroup.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/conf/NfsConfigKeys.java Unresponseive NFS mount point due to deferred COMMIT response - Key: HDFS-7259 URL: https://issues.apache.org/jira/browse/HDFS-7259 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.6.0 Attachments: HDFS-7259.001.patch, HDFS-7259.002.patch Since the gateway can't commit random writes, it caches the COMMIT requests in a queue and sends back a response only when the data can be committed or the stream times out (failure in the latter case). This can cause problems in two patterns: (1) file-upload failure; (2) the mount dir is stuck on the same client, but other NFS clients can still access the NFS gateway. 
Error pattern (2) arises because too many COMMIT requests are pending: the NFS client reaches its pending-request limit and cannot send any other requests (e.g., for ls) to the NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
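The deferred-COMMIT behavior described above can be sketched roughly as follows. All names here (`DeferredCommitSketch`, `receiveCommit`, `onFlush`) are invented for illustration and do not correspond to the actual OpenFileCtx API; the point is only that queued COMMITs are answered once the flushed offset catches up, and until then they count against the client's pending-request limit.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class DeferredCommitSketch {
    // A COMMIT request asking that all data up to `offset` be durable.
    static class Commit {
        final long offset;
        Commit(long offset) { this.offset = offset; }
    }

    private final Queue<Commit> pending = new ArrayDeque<>();
    private long flushedOffset = 0;

    // Queue the COMMIT instead of answering immediately: the gateway
    // cannot commit random writes, so it defers the response.
    void receiveCommit(long offset) {
        pending.add(new Commit(offset));
    }

    // Called as data reaches HDFS; answers every COMMIT now satisfiable
    // and returns how many responses were sent.
    int onFlush(long newFlushedOffset) {
        flushedOffset = newFlushedOffset;
        int answered = 0;
        while (!pending.isEmpty() && pending.peek().offset <= flushedOffset) {
            pending.remove();   // in the real gateway: send the COMMIT reply
            answered++;
        }
        return answered;
    }

    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        DeferredCommitSketch s = new DeferredCommitSketch();
        s.receiveCommit(100);
        s.receiveCommit(200);
        System.out.println(s.onFlush(150));     // 1: only the first COMMIT is satisfiable
        System.out.println(s.pendingCount());   // 1: the second still holds a client slot
    }
}
```

In this model, a client whose queued COMMITs all sit past the flushed offset cannot issue further requests, which matches the `ls` starvation described in pattern (2).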
[jira] [Commented] (HDFS-6291) FSImage may be left unclosed in BootstrapStandby#doRun()
[ https://issues.apache.org/jira/browse/HDFS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180209#comment-14180209 ] Ted Yu commented on HDFS-6291: -- With image.close() in finally block, the catch block doesn't need to call it, right ? FSImage may be left unclosed in BootstrapStandby#doRun() Key: HDFS-6291 URL: https://issues.apache.org/jira/browse/HDFS-6291 Project: Hadoop HDFS Issue Type: Bug Components: ha Reporter: Ted Yu Priority: Minor Attachments: HDFS-6291.patch At around line 203: {code} if (!checkLogsAvailableForRead(image, imageTxId, curTxId)) { return ERR_CODE_LOGS_UNAVAILABLE; } {code} If we return following the above check, image is not closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
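Ted Yu's suggestion — close the image in a `finally` block so that neither the early `return` nor a `catch` path needs its own `close()` call — can be sketched with a stand-in class. The real `FSImage` lives in `org.apache.hadoop.hdfs.server.namenode`; `MockImage` and the error-code constant below are illustrative only.

```java
public class FinallyCloseSketch {
    // Minimal stand-in for FSImage: records whether close() was called.
    static class MockImage {
        boolean closed = false;
        void close() { closed = true; }
    }

    // Illustrative stand-in for BootstrapStandby's error code.
    static final int ERR_CODE_LOGS_UNAVAILABLE = 6;

    // Sketch of BootstrapStandby#doRun(): every exit path, including the
    // early return, goes through the finally block, so the image is
    // always closed exactly once.
    static int doRun(MockImage image, boolean logsAvailable) {
        try {
            if (!logsAvailable) {
                return ERR_CODE_LOGS_UNAVAILABLE; // early return: still closed
            }
            return 0;
        } finally {
            image.close(); // runs on return and on exception alike
        }
    }

    public static void main(String[] args) {
        MockImage img = new MockImage();
        int rc = doRun(img, false);
        System.out.println(rc + " " + img.closed); // 6 true
    }
}
```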
[jira] [Updated] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7226: Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the fix, Yongjun! I've committed this to trunk and branch-2. TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.7.0 Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch, HDFS-7226.003.patch Using tool from HADOOP-11045, got the following report: {code} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1 Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below: .. Among 9 runs examined, all failed tests #failedRuns: testName: 7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching .. {code} TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom: {code} Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failing for the past 1 build (Since Failed#8390 ) Took 2.9 sec. 
Error Message expected:&lt;18&gt; but was:&lt;12&gt; Stacktrace java.lang.AssertionError: expected:&lt;18&gt; but was:&lt;12&gt; at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7228) Add an SSD policy into the default BlockStoragePolicySuite
[ https://issues.apache.org/jira/browse/HDFS-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180239#comment-14180239 ] Hudson commented on HDFS-7228: -- FAILURE: Integrated in Hadoop-trunk-Commit #6311 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6311/]) HDFS-7228. Fix TestDNFencing.testQueueingWithAppend. Contributed by Yongjun Zhang. (jing9: rev 1c8d191117de3d2e035bd728bccfde0f4b81296f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Add an SSD policy into the default BlockStoragePolicySuite -- Key: HDFS-7228 URL: https://issues.apache.org/jira/browse/HDFS-7228 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 2.6.0 Attachments: HDFS-7228.000.patch, HDFS-7228.001.patch, HDFS-7228.002.patch, HDFS-7228.003.patch, HDFS-7228.003.patch Currently in the default BlockStoragePolicySuite, we've defined 4 storage policies: LAZY_PERSIST, HOT, WARM, and COLD. Since we have already defined the SSD storage type, it will be useful to also include a SSD related storage policy in the default suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180290#comment-14180290 ] Yongjun Zhang commented on HDFS-7226: - Thanks a lot [~jingzhao]! Hopefully the next hdfs build will be clean. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7180: - Attachment: HDFS-7180.002.patch NFSv3 gateway frequently gets stuck --- Key: HDFS-7180 URL: https://issues.apache.org/jira/browse/HDFS-7180 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.5.0 Environment: Linux, Fedora 19 x86-64 Reporter: Eric Zhiqiang Ma Assignee: Brandon Li Priority: Critical Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway on one node in the cluster to let users upload data with rsync. However, we find that the NFSv3 daemon frequently gets stuck while HDFS itself keeps working well (hdfs dfs -ls etc. work fine). The latest hang happened after around 1 day of running and several hundred GBs of data uploaded. The NFSv3 daemon is started on one node, and the NFS share is mounted on that same node. On the node where the NFS share is mounted, dmesg shows: [1859245.368108] nfs: server localhost not responding, still trying [1859245.368111] nfs: server localhost not responding, still trying [1859245.368115] nfs: server localhost not responding, still trying [1859245.368119] nfs: server localhost not responding, still trying [1859245.368123] nfs: server localhost not responding, still trying [1859245.368127] nfs: server localhost not responding, still trying [1859245.368131] nfs: server localhost not responding, still trying [1859245.368135] nfs: server localhost not responding, still trying [1859245.368138] nfs: server localhost not responding, still trying [1859245.368142] nfs: server localhost not responding, still trying [1859245.368146] nfs: server localhost not responding, still trying [1859245.368150] nfs: server localhost not responding, still trying [1859245.368153] nfs: server localhost not responding, still trying The mounted directory cannot be `ls`ed, and `df -hT` gets stuck too. 
The latest lines from the nfs3 log in the hadoop logs directory: 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist. 
2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: [10.0.3.172:50010, 10.0.3.176:50010] 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 java.io.IOException: Bad response ERROR for block BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 from datanode 10.0.3.176:50010 at
[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180305#comment-14180305 ] Brandon Li commented on HDFS-7180: -- Nice catch, Jing. I've uploaded a new patch. It lets the dumper notify waiting threads even when an error happens. I also did some code cleanup.
[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180307#comment-14180307 ] Brandon Li commented on HDFS-7180: -- The unit test seems tricky to add. I did some file-uploading tests to verify that the pending non-sequential writes were under control.
[jira] [Commented] (HDFS-7231) rollingupgrade needs some guard rails
[ https://issues.apache.org/jira/browse/HDFS-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180312#comment-14180312 ] Suresh Srinivas commented on HDFS-7231: --- Allen, I just rewrote the steps with additional details to clarify: # Upgrade 2.0.5 cluster to 2.2 # Do not -finalizeUpgrade # Install 2.4.1 binaries on the cluster machines. Start the datanodes on 2.4.1. # Start namenode -upgrade option. # Namenode start fails because 2.0.5 to 2.2 upgrade is still in progress # Leave 2.4.1 DNs running # Install binaries on NN to 2.2 # Start NN on 2.2 with no upgrade related options So far things are clear. Then you go on to say, the following: bq. DNs now do a partial roll-forward, rendering them unable to continue What do you mean by this? bq. admins manually repair version files on those broken directories This is as you know is a recipe for disaster. Let me ask you a question. Before you go on to 2.4.1, if you do finalize of upgrade what happens? rollingupgrade needs some guard rails - Key: HDFS-7231 URL: https://issues.apache.org/jira/browse/HDFS-7231 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Allen Wittenauer Priority: Blocker See first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177518#comment-14177518 ] Colin Patrick McCabe edited comment on HDFS-7235 at 10/22/14 7:15 PM: -- {code} ReplicaInfo replicaInfo = null; synchronized(data) { replicaInfo = (ReplicaInfo) data.getReplica(block.getBlockPoolId(), block.getBlockId()); } if (replicaInfo != null && replicaInfo.getState() == ReplicaState.FINALIZED && !replicaInfo.getBlockFile().exists()) { {code} You can't release the lock this way. Once you release the lock, replicaInfo could be mutated at any time. So you need to do all the checks under the lock. {code} // // Report back to NN bad block caused by non-existent block file. // WATCH-OUT: be sure the conditions checked above matches the following // method in FsDatasetImpl.java: // boolean isValidBlock(ExtendedBlock b) // all other conditions need to be true except that // replicaInfo.getBlockFile().exists() returns false. // {code} I don't think we need the WATCH-OUT part. We shouldn't be calling {{isValidBlock}}, so why do we care if the check is the same as that check? I generally agree with this approach and I think we can get this in if that's fixed. Can not decommission DN which has invalid block due to bad disk --- Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data from the to-be-decommissioned DN to other DNs, it favors choosing the to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN detects the source block to be transferred as an invalid block, via the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} The reason this method returns false (detecting an invalid block) is that the block file doesn't exist, due to the bad disk in this case. 
The key issue we found is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
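Colin's review point above (perform the lookup and every condition check while still holding the lock) can be illustrated with minimal stand-ins. `ReplicaInfo`, `ReplicaState`, and the `data` monitor below are simplified sketches, not the real FsDataset classes.

```java
import java.io.File;

public class CheckUnderLockSketch {
    enum ReplicaState { FINALIZED, RBW }

    // Simplified stand-in for the DN's replica metadata.
    static class ReplicaInfo {
        private final ReplicaState state;
        private final File blockFile;
        ReplicaInfo(ReplicaState state, File blockFile) {
            this.state = state;
            this.blockFile = blockFile;
        }
        ReplicaState getState() { return state; }
        File getBlockFile() { return blockFile; }
    }

    private final Object data = new Object(); // stands in for the dataset lock
    ReplicaInfo replica;                      // guarded by `data`

    // All three conditions are evaluated inside one synchronized region,
    // so the replica cannot be mutated between the lookup and the checks —
    // the hazard Colin pointed out with the lock released early.
    boolean isFinalizedButFileMissing() {
        synchronized (data) {
            ReplicaInfo r = replica;
            return r != null
                && r.getState() == ReplicaState.FINALIZED
                && !r.getBlockFile().exists();
        }
    }

    public static void main(String[] args) {
        CheckUnderLockSketch s = new CheckUnderLockSketch();
        s.replica = new ReplicaInfo(ReplicaState.FINALIZED,
                new File("/nonexistent/blk_123"));
        System.out.println(s.isFinalizedButFileMissing()); // true
    }
}
```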
[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180382#comment-14180382 ] Hadoop QA commented on HDFS-7180: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676390/HDFS-7180.002.patch against trunk revision d67214f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8480//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8480//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs-nfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8480//console This message is automatically generated. 
NFSv3 gateway frequently gets stuck --- Key: HDFS-7180 URL: https://issues.apache.org/jira/browse/HDFS-7180 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.5.0 Environment: Linux, Fedora 19 x86-64 Reporter: Eric Zhiqiang Ma Assignee: Brandon Li Priority: Critical Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway on one node in the cluster to let users upload data with rsync. However, we find the NFSv3 daemon seems to frequently get stuck while HDFS itself works well (hdfs dfs -ls etc. work just fine). The last hang we found was after around 1 day of running and several hundred GBs of data uploaded. The NFSv3 daemon is started on one node and the NFS share is mounted on the same node. From the node where the NFS is mounted, dmesg shows lines like this: [1859245.368108] nfs: server localhost not responding, still trying [1859245.368111] nfs: server localhost not responding, still trying [1859245.368115] nfs: server localhost not responding, still trying [1859245.368119] nfs: server localhost not responding, still trying [1859245.368123] nfs: server localhost not responding, still trying [1859245.368127] nfs: server localhost not responding, still trying [1859245.368131] nfs: server localhost not responding, still trying [1859245.368135] nfs: server localhost not responding, still trying [1859245.368138] nfs: server localhost not responding, still trying [1859245.368142] nfs: server localhost not responding, still trying [1859245.368146] nfs: server localhost not responding, still trying [1859245.368150] nfs: server localhost not responding, still trying [1859245.368153] nfs: server localhost not responding, still trying The mounted directory cannot be `ls`-ed and `df -hT` gets stuck too. 
The latest lines from the nfs3 log in the hadoop logs directory: 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:51:46,750 INFO
[jira] [Commented] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180466#comment-14180466 ] Colin Patrick McCabe commented on HDFS-7235: Hi Yongjun, Thanks for your patience here. I don't think the current patch is quite ready. I could point to a few things, like this: {{ReplicaInfo replicaInfo = (ReplicaInfo) data.getReplica(}} We shouldn't be downcasting here. I think the bigger issue is that the interface in FsDatasetSpi is just not very suitable for what we're trying to do. Rather than trying to hack it, I think we should come up with a better interface. I think we should replace {{FsDatasetSpi#isValid}} with this function:
{code}
/**
 * Check if a block is valid.
 *
 * @param b          The block to check.
 * @param minLength  The minimum length that the block must have.  May be 0.
 * @param state      If this is null, it is ignored.  If it is non-null, we
 *                   will check that the replica has this state.
 *
 * @throws FileNotFoundException           If the replica is not found or there
 *                                         was an error locating it.
 * @throws EOFException                    If the replica length is too short.
 * @throws UnexpectedReplicaStateException If the replica is not in the
 *                                         expected state.
 */
public void checkBlock(ExtendedBlock b, long minLength, ReplicaState state);
{code}
Since this function will throw a clearly marked exception detailing which case we're in, we won't have to call multiple functions. This will be better for performance, since we're only taking the lock once. This will also be better for clarity, since the current APIs lead to some rather complex code. We could also get rid of {{FsDatasetSpi#isValidRbw}}, since this function can do everything that {{isValidRbw}} can. 
Also, UnexpectedReplicaStateException could be a new exception under hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/UnexpectedReplicaStateException.java. I think it's fine to change FsDatasetSpi for this (we did it when adding caching stuff, and again when adding trash). Let me know what you think. I think it would make things a lot clearer. Can not decommission DN which has invalid block due to bad disk --- Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this to-be-decommissioned DN as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as an invalid block with the following logic in FsDatasetImpl.java:
{code}
/** Does the block exist and have the given state? */
private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
  final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(),
      b.getLocalBlock());
  return replicaInfo != null
      && replicaInfo.getState() == state
      && replicaInfo.getBlockFile().exists();
}
{code}
The reason that this method returns false (detecting an invalid block) is that the block file doesn't exist, due to the bad disk in this case. The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know that the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. 
Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
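To make the proposed single-call contract concrete, here is a minimal, self-contained sketch of how a caller might consume such an exception-based check. The class and helper names here are hypothetical, and the replica's condition is passed in directly rather than looked up under the dataset lock; only the one-call-per-check idea and the exception types come from the proposal above.

```java
import java.io.EOFException;
import java.io.FileNotFoundException;

// Minimal sketch of the proposed exception-based replica check. A real
// FsDatasetSpi implementation would look the replica up under the dataset
// lock; here the replica's condition is passed in for illustration only.
public class ReplicaCheckDemo {
    enum ReplicaState { FINALIZED, RBW }

    static class UnexpectedReplicaStateException extends Exception {
        UnexpectedReplicaStateException(String msg) { super(msg); }
    }

    // Stand-in for the proposed FsDatasetSpi#checkBlock: one call, one
    // failure mode per exception type, so callers no longer need several
    // isValid* probes (and only take the lock once).
    static void checkBlock(boolean exists, long length, ReplicaState actual,
                           long minLength, ReplicaState expected)
            throws FileNotFoundException, EOFException,
                   UnexpectedReplicaStateException {
        if (!exists) {
            throw new FileNotFoundException("replica not found");
        }
        if (length < minLength) {
            throw new EOFException("replica length " + length
                    + " < required " + minLength);
        }
        if (expected != null && actual != expected) {
            throw new UnexpectedReplicaStateException("replica state " + actual);
        }
    }

    // Caller-side pattern: map each exception to a distinct recovery action.
    static String classify(boolean exists, long length, ReplicaState actual,
                           long minLength, ReplicaState expected) {
        try {
            checkBlock(exists, length, actual, minLength, expected);
            return "valid";
        } catch (FileNotFoundException e) {
            return "not found";   // e.g. the case where a corrupt-block
                                  // report to the NN would be warranted
        } catch (EOFException e) {
            return "too short";
        } catch (UnexpectedReplicaStateException e) {
            return "wrong state";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(false, 0, null, 0, ReplicaState.FINALIZED));
        System.out.println(classify(true, 10, ReplicaState.RBW, 0,
                ReplicaState.FINALIZED));
    }
}
```

The point of the design is that the caller distinguishes the failure cases by exception type instead of calling isValid, isValidRbw, and a length check separately.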
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180474#comment-14180474 ] Siqi Li commented on HDFS-5928: --- [~wheat9] I have added the check for both namespace and namenodeID show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7254) Add documents for hot swap drive
[ https://issues.apache.org/jira/browse/HDFS-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180476#comment-14180476 ] Colin Patrick McCabe commented on HDFS-7254: +1. Thanks, Eddy. Test failure is not related because this is only a docs change. Add documents for hot swap drive Key: HDFS-7254 URL: https://issues.apache.org/jira/browse/HDFS-7254 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7254.000.patch, HDFS-7254.001.patch, HDFS-7254.002.patch, HDFS-7254.003.patch Add documents for the hot swap drive functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7254) Add documentation for hot swaping DataNode drives
[ https://issues.apache.org/jira/browse/HDFS-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7254: --- Summary: Add documentation for hot swaping DataNode drives (was: Add documents for hot swap drive) Add documentation for hot swaping DataNode drives - Key: HDFS-7254 URL: https://issues.apache.org/jira/browse/HDFS-7254 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7254.000.patch, HDFS-7254.001.patch, HDFS-7254.002.patch, HDFS-7254.003.patch Add documents for the hot swap drive functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7254) Add documentation for hot swaping DataNode drives
[ https://issues.apache.org/jira/browse/HDFS-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7254: --- Resolution: Fixed Fix Version/s: 2.7.0 Target Version/s: 2.7.0 (was: 2.6.0) Status: Resolved (was: Patch Available) Add documentation for hot swaping DataNode drives - Key: HDFS-7254 URL: https://issues.apache.org/jira/browse/HDFS-7254 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7254.000.patch, HDFS-7254.001.patch, HDFS-7254.002.patch, HDFS-7254.003.patch Add documents for the hot swap drive functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7254) Add documentation for hot swaping DataNode drives
[ https://issues.apache.org/jira/browse/HDFS-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180486#comment-14180486 ] Hudson commented on HDFS-7254: -- FAILURE: Integrated in Hadoop-trunk-Commit #6314 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6314/]) HDFS-7254. Add documentation for hot swaping DataNode drives (Lei Xu via Colin P. McCabe) (cmccabe: rev 66e8187ea1dbc6230ab2c633e4f609a7068b75db) * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSCommands.apt.vm * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Add documentation for hot swaping DataNode drives - Key: HDFS-7254 URL: https://issues.apache.org/jira/browse/HDFS-7254 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7254.000.patch, HDFS-7254.001.patch, HDFS-7254.002.patch, HDFS-7254.003.patch Add documents for the hot swap drive functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180490#comment-14180490 ] Haohui Mai commented on HDFS-5928: -- The code can be simplified by putting the relevant information in an object. For example:
{code}
{#HAInfo}
{namespace}-{nnid}
{/HAInfo}
{code}
On the JavaScript side:
{code}
var namespace = null, nnid = null;
// parse the XML and set namespace and nnid
if (namespace && nnid) {
  HAInfo = {namespace: namespace, nnid: nnid};
}
{code}
show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180497#comment-14180497 ] Colin Patrick McCabe commented on HDFS-6877: +1. Thanks, Eddy. TestDNFencing failure is HDFS-7226, not related. Avoid calling checkDisk when an HDFS volume is removed during a write. -- Key: HDFS-6877 URL: https://issues.apache.org/jira/browse/HDFS-6877 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6877.000.consolidate.txt, HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, HDFS-6877.007.patch Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS volume is removed during a write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6877: --- Resolution: Fixed Fix Version/s: 2.7.0 Target Version/s: 2.7.0 (was: 3.0.0) Status: Resolved (was: Patch Available) Avoid calling checkDisk when an HDFS volume is removed during a write. -- Key: HDFS-6877 URL: https://issues.apache.org/jira/browse/HDFS-6877 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-6877.000.consolidate.txt, HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, HDFS-6877.007.patch Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS volume is removed during a write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7257) Add the time of last HA state transition to NN's /jmx page
[ https://issues.apache.org/jira/browse/HDFS-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180499#comment-14180499 ] Andrew Wang commented on HDFS-7257: --- I don't think there are any timezone concerns, considering that the timezone is shown as part of the string. However, if you'd prefer that it's not included, I'm okay with that. I agree that it can just be converted for usage on the webUI. A final note, it'd also be better to use a standardized date format like ISO 8601 rather than creating a new one: http://en.wikipedia.org/wiki/ISO_8601 Add the time of last HA state transition to NN's /jmx page -- Key: HDFS-7257 URL: https://issues.apache.org/jira/browse/HDFS-7257 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7257.001.patch, HDFS-7257.002.patch, HDFS-7257.003.patch It would be useful to some monitoring apps to expose the last HA transition time in the NN's /jmx page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
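The ISO 8601 formatting suggested above is available directly from java.time, so no new format needs to be invented. A minimal sketch (the class and variable names are illustrative, not from the patch):

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

// Minimal sketch: exposing a transition time as an ISO 8601 string rather
// than an ad-hoc date format. ISO_INSTANT always renders in UTC ("Z"), so
// there is no timezone ambiguity in the emitted string.
public class IsoTimestampDemo {
    static String toIso8601(long epochMillis) {
        return DateTimeFormatter.ISO_INSTANT
                .format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long lastTransitionMillis = 1413930000000L; // illustrative epoch millis
        // prints 2014-10-21T22:20:00Z
        System.out.println(toIso8601(lastTransitionMillis));
    }
}
```

A web UI can still parse this string and re-render it in the browser's local timezone, which matches the suggestion that conversion happen on the UI side.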
[jira] [Commented] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180505#comment-14180505 ] Hudson commented on HDFS-6877: -- FAILURE: Integrated in Hadoop-trunk-Commit #6315 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6315/]) HDFS-6877. Avoid calling checkDisk when an HDFS volume is removed during a write. (Lei Xu via Colin P. McCabe) (cmccabe: rev 7b0f9bb2583cd9b7274f1e31c173c1c6a7ce467b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java Avoid calling checkDisk when an HDFS volume is removed during a write. -- Key: HDFS-6877 URL: https://issues.apache.org/jira/browse/HDFS-6877 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-6877.000.consolidate.txt, HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, HDFS-6877.007.patch Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS volume is removed during a write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180524#comment-14180524 ] Siqi Li commented on HDFS-5928: --- I don't think this is going to work if the cluster doesn't have HA or federation. Also, it's good to let people know what the namespace is and what the namenode ID is. show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-6663: -- Attachment: HDFS-6663-5.patch The decommission status of a block now contains more details: it will show whether a block is decommissioning or decommissioned. Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180536#comment-14180536 ] Lei (Eddy) Xu commented on HDFS-6877: - Thank you for checking in this! [~cmccabe] Avoid calling checkDisk when an HDFS volume is removed during a write. -- Key: HDFS-6877 URL: https://issues.apache.org/jira/browse/HDFS-6877 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-6877.000.consolidate.txt, HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, HDFS-6877.007.patch Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS volume is removed during a write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180541#comment-14180541 ] Haohui Mai commented on HDFS-5928: -- The key idea is to ensure {{HAInfo}} is null in non-HA clusters. You might need some slight tweaks to make it work in all cases, but I think you get the idea. show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6694) TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms
[ https://issues.apache.org/jira/browse/HDFS-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180563#comment-14180563 ] Chen He commented on HDFS-6694: --- the one I got is: java.lang.RuntimeException: Deferred at org.apache.hadoop.test.MultithreadedTestUtil$TestContext.checkException(MultithreadedTestUtil.java:130) at org.apache.hadoop.test.MultithreadedTestUtil$TestContext.waitFor(MultithreadedTestUtil.java:121) at org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testPipelineRecoveryStress(TestPipelinesFailover.java:485) Caused by: java.lang.AssertionError: expected:100 but was:0 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.AppendTestUtil.check(AppendTestUtil.java:123) at org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover$PipelineTestThread.doAnAction(TestPipelinesFailover.java:522) at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222) at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189) Results : Tests in error: TestPipelinesFailover.testPipelineRecoveryStress:485 » Runtime Deferred TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms Key: HDFS-6694 URL: https://issues.apache.org/jira/browse/HDFS-6694 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Critical Fix For: 2.6.0 Attachments: HDFS-6694.001.dbg.patch, HDFS-6694.001.dbg.patch, HDFS-6694.001.dbg.patch, HDFS-6694.002.dbg.patch, org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover-output.txt, org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.txt 
TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms. Typical failures are described in first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-5928: -- Attachment: HDFS-5928.v5.patch show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v5.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-2486) Review issues with UnderReplicatedBlocks
[ https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-2486: -- Fix Version/s: (was: 3.0.0) 2.7.0 I merged this down to branch-2 to make a cherry-pick cleaner. Review issues with UnderReplicatedBlocks Key: HDFS-2486 URL: https://issues.apache.org/jira/browse/HDFS-2486 Project: Hadoop HDFS Issue Type: Task Components: namenode Affects Versions: 0.23.0 Reporter: Steve Loughran Assignee: Uma Maheswara Rao G Priority: Minor Fix For: 2.7.0 Attachments: HDFS-2486.patch Here are some things I've noted in the UnderReplicatedBlocks class that someone else should review and consider if the code is correct. If not, they are easy to fix. remove(Block block, int priLevel) is not synchronized, and as the inner classes are not, there is a risk of race conditions there. some of the code assumes that getPriority can return the value LEVEL, and if so does not attempt to queue the blocks. As this return value is not currently possible, those checks can be removed. The queue gives priority to blocks whose replication count is less than a third of its expected count over those that are normally under replicated. While this is good for ensuring that files scheduled for large replication are replicated fast, it may not be the best strategy for maintaining data integrity. For that it may be better to give whichever blocks have only two replicas priority over blocks that may, for example, already have 3 out of 10 copies in the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
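The prioritization rule questioned in the last point above can be sketched as follows. This is a simplified illustration of the described behavior only, not the actual UnderReplicatedBlocks code; the method name and level numbering are hypothetical.

```java
// Simplified sketch of the priority rule discussed above: a block whose live
// replica count is under a third of its expected count jumps ahead of
// "normally" under-replicated blocks, regardless of how many physical
// copies actually survive.
public class ReplicationPriorityDemo {
    static int priority(int curReplicas, int expectedReplicas) {
        if (curReplicas == 0) {
            return 0; // no live replicas at all: most urgent
        } else if (curReplicas * 3 < expectedReplicas) {
            return 1; // far below target, e.g. 3 of 10
        } else {
            return 2; // normally under-replicated, e.g. 2 of 3
        }
    }

    public static void main(String[] args) {
        // Illustrates the data-integrity concern raised above: a block with
        // 3 of 10 copies (3 surviving replicas) outranks a block with
        // 2 of 3 copies (only 2 surviving replicas) under this rule.
        System.out.println(priority(3, 10)); // higher priority (level 1)
        System.out.println(priority(2, 3));  // lower priority (level 2)
    }
}
```

Ordering instead by the absolute number of surviving replicas would prioritize the 2-of-3 block, which is the alternative strategy the issue text suggests.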
[jira] [Commented] (HDFS-6694) TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms
[ https://issues.apache.org/jira/browse/HDFS-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180574#comment-14180574 ] Yongjun Zhang commented on HDFS-6694: - Hi [~airbots], Thanks for reporting the issue you ran into. Would you please look into your log to see if there are "Too many open files" kinds of messages? TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms Key: HDFS-6694 URL: https://issues.apache.org/jira/browse/HDFS-6694 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Critical Fix For: 2.6.0 Attachments: HDFS-6694.001.dbg.patch, HDFS-6694.001.dbg.patch, HDFS-6694.001.dbg.patch, HDFS-6694.002.dbg.patch, org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover-output.txt, org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.txt TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms. Typical failures are described in the first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6888) Remove audit logging of getFIleInfo()
[ https://issues.apache.org/jira/browse/HDFS-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-6888: -- Attachment: HDFS-6888-6.patch Updated the patch against trunk. TestBalancer and TestFailureToReadEdits work fine on my machine. The TestPipelinesFailover failure is because of HDFS-6694. Remove audit logging of getFIleInfo() - Key: HDFS-6888 URL: https://issues.apache.org/jira/browse/HDFS-6888 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Labels: log Attachments: HDFS-6888-2.patch, HDFS-6888-3.patch, HDFS-6888-4.patch, HDFS-6888-5.patch, HDFS-6888-6.patch, HDFS-6888.patch The audit logging of getFileInfo() was added in HDFS-3733. Since this is one of the most frequently called methods, users have noticed that the audit log is now filled with it. Since we now have HTTP request logging, this seems unnecessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6824) Additional user documentation for HDFS encryption.
[ https://issues.apache.org/jira/browse/HDFS-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6824: -- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) Thanks Yi, I committed this to branch-2 and trunk. Additional user documentation for HDFS encryption. -- Key: HDFS-6824 URL: https://issues.apache.org/jira/browse/HDFS-6824 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 2.6.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Fix For: 2.7.0 Attachments: TransparentEncryption.html, hdfs-6824.001.patch, hdfs-6824.002.patch We'd like to better document additional things about HDFS encryption: setup and configuration, using alternate access methods (namely WebHDFS and HttpFS), other misc improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-2486) Review issues with UnderReplicatedBlocks
[ https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180587#comment-14180587 ] Hudson commented on HDFS-2486: -- FAILURE: Integrated in Hadoop-trunk-Commit #6317 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6317/]) Move HDFS-2486 down to 2.7.0 in CHANGES.txt (wang: rev 08457e9e57e4fa3c83217fd0a092e926ba7eb135) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Review issues with UnderReplicatedBlocks Key: HDFS-2486 URL: https://issues.apache.org/jira/browse/HDFS-2486 Project: Hadoop HDFS Issue Type: Task Components: namenode Affects Versions: 0.23.0 Reporter: Steve Loughran Assignee: Uma Maheswara Rao G Priority: Minor Fix For: 2.7.0 Attachments: HDFS-2486.patch Here are some things I've noted in the UnderReplicatedBlocks class that someone else should review and consider if the code is correct. If not, they are easy to fix. remove(Block block, int priLevel) is not synchronized, and as the inner classes are not, there is a risk of race conditions there. some of the code assumes that getPriority can return the value LEVEL, and if so does not attempt to queue the blocks. As this return value is not currently possible, those checks can be removed. The queue gives priority to blocks whose replication count is less than a third of its expected count over those that are normally under replicated. While this is good for ensuring that files scheduled for large replication are replicated fast, it may not be the best strategy for maintaining data integrity. For that it may be better to give whichever blocks have only two replicas priority over blocks that may, for example, already have 3 out of 10 copies in the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6824) Additional user documentation for HDFS encryption.
[ https://issues.apache.org/jira/browse/HDFS-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180588#comment-14180588 ] Hudson commented on HDFS-6824: -- FAILURE: Integrated in Hadoop-trunk-Commit #6317 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6317/]) HDFS-6824. Additional user documentation for HDFS encryption. (wang: rev a36399e09c8c92911df08f78a4b88528b6dd513f) * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/TransparentEncryption.apt.vm * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Additional user documentation for HDFS encryption. -- Key: HDFS-6824 URL: https://issues.apache.org/jira/browse/HDFS-6824 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 2.6.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Fix For: 2.7.0 Attachments: TransparentEncryption.html, hdfs-6824.001.patch, hdfs-6824.002.patch We'd like to better document additional things about HDFS encryption: setup and configuration, using alternate access methods (namely WebHDFS and HttpFS), other misc improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
Haohui Mai created HDFS-7277: Summary: Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
[ https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7277: - Attachment: HDFS-7277.000.patch Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-7277.000.patch The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
[ https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7277: - Status: Patch Available (was: Open) Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-7277.000.patch The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180626#comment-14180626 ] Yongjun Zhang commented on HDFS-7235: - Hi [~cmccabe], Thanks a lot for the side discussion and comment. I will look into it. Can not decommission DN which has invalid block due to bad disk --- Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing the to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN detects the source block to be transferred as an invalid block, via the following logic in FsDatasetImpl.java:
{code}
/** Does the block exist and have the given state? */
private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
  final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
  return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists();
}
{code}
This method returns false (detecting an invalid block) because, in this case, the block file doesn't exist due to the bad disk. The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know that the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
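The isValid check quoted above short-circuits across three conditions, and whichever one fails the caller just sees false — it cannot distinguish "unknown block" from "known block whose file is gone", which is exactly why no corrupt-block report goes back to the NN. A minimal stand-alone model (hypothetical stand-in types; the real code lives in FsDatasetImpl):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone model of the FsDatasetImpl#isValid logic; ReplicaInfo here
// is a hypothetical stand-in for the real HDFS class.
public class ReplicaCheck {
    enum State { FINALIZED, RBW }

    static class ReplicaInfo {
        final State state;
        final boolean fileExists; // models getBlockFile().exists()
        ReplicaInfo(State state, boolean fileExists) {
            this.state = state;
            this.fileExists = fileExists;
        }
    }

    final Map<Long, ReplicaInfo> volumeMap = new HashMap<>();

    // Returns false for a missing map entry, a wrong state, OR a missing
    // block file -- the three cases are indistinguishable to the caller.
    boolean isValid(long blockId, State expected) {
        ReplicaInfo info = volumeMap.get(blockId);
        return info != null && info.state == expected && info.fileExists;
    }
}
```

With a bad disk, the replica can still be present in volumeMap with the right state, yet fileExists is false — isValid returns false and, as the report notes, nothing tells the NN the replica is actually corrupt.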
[jira] [Commented] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
[ https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180667#comment-14180667 ] Hadoop QA commented on HDFS-7277: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676439/HDFS-7277.000.patch against trunk revision a36399e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8485//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8485//console This message is automatically generated. Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-7277.000.patch The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7232) Populate hostname in httpfs audit log
[ https://issues.apache.org/jira/browse/HDFS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180670#comment-14180670 ] Hadoop QA commented on HDFS-7232: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12675587/HDFS-7232.patch against trunk revision a36399e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8484//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8484//console This message is automatically generated. Populate hostname in httpfs audit log - Key: HDFS-7232 URL: https://issues.apache.org/jira/browse/HDFS-7232 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Zoran Dimitrijevic Assignee: Zoran Dimitrijevic Priority: Trivial Attachments: HDFS-7232.patch Currently httpfs audit logs do not log the request's IP address. 
Since they use hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/conf/httpfs-log4j.properties which already contains hostname, it would be nice to add code to populate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180682#comment-14180682 ] Konstantin Shvachko commented on HDFS-6658: --- I agree, usually people remove data in order to have space to put more. And the freed space usually fills up again in a couple of weeks or months. I don't know if this answer is good enough. It is for me, but in the end you get a bigger cluster. It would be nice to find a way to detect fully empty arrays of the BlockList and release them once the last reference is removed. That should be good enough to avoid a stand-alone thread for garbage collecting, or compacting in your terms. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Amir Langer Attachments: BlockListOptimizationComparison.xlsx, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. See the attached design doc for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
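The proposal's core idea — replace per-replica object references (the "triplets") with primitive int indexes into a per-DatanodeStorageInfo array — can be sketched as follows. Names and layout here are illustrative, not the HDFS-6658 patch's actual design.

```java
// Illustrative contrast between reference-based and index-based replica
// lists. Names are hypothetical, not the HDFS-6658 patch itself.
public class BlockListSketch {

    // Reference-based: each replica costs a full node object with an
    // object header and a pointer (like the triplets off BlockInfo).
    static class RefNode {
        int blockId;
        RefNode next;
    }

    // Index-based: one int[] per storage; each slot holds the index of
    // the following block, so per-replica overhead is a single primitive.
    static class IndexList {
        final int[] next;   // next[i] = index of the following block, -1 = end
        int head = -1;
        IndexList(int capacity) {
            next = new int[capacity];
        }
        void push(int blockIndex) {
            next[blockIndex] = head;
            head = blockIndex;
        }
        int count() {
            int n = 0;
            for (int i = head; i != -1; i = next[i]) n++;
            return n;
        }
    }

    public static void main(String[] args) {
        IndexList list = new IndexList(8);
        list.push(3);
        list.push(5);
        System.out.println(list.count()); // 2
    }
}
```

The point of the design is visible even in this toy: the index-based list stores one int per replica in a shared array owned by the storage, whereas the reference-based list pays an object header plus pointers for every single replica.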
[jira] [Commented] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
[ https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180703#comment-14180703 ] Jing Zhao commented on HDFS-7277: - +1 Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-7277.000.patch The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6988) Add configurable limit for percentage-based eviction threshold
[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-6988: - Attachment: HDFS-6988.03.patch Add configurable limit for percentage-based eviction threshold -- Key: HDFS-6988 URL: https://issues.apache.org/jira/browse/HDFS-6988 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao Fix For: HDFS-6581 Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch, HDFS-6988.03.patch Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction thresholds configurable. The hard-coded thresholds may not be appropriate for very large RAM disks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6988) Add configurable limit for percentage-based eviction threshold
[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-6988: - Fix Version/s: (was: HDFS-6581) 3.0.0 Affects Version/s: (was: HDFS-6581) 2.6.0 Status: Patch Available (was: In Progress) Add configurable limit for percentage-based eviction threshold -- Key: HDFS-6988 URL: https://issues.apache.org/jira/browse/HDFS-6988 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao Fix For: 3.0.0 Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch, HDFS-6988.03.patch Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction thresholds configurable. The hard-coded thresholds may not be appropriate for very large RAM disks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6988) Add configurable limit for percentage-based eviction threshold
[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180709#comment-14180709 ] Xiaoyu Yao commented on HDFS-6988: -- Thanks [~cmccabe] for the confirmation. I just submitted a patch for it. Add configurable limit for percentage-based eviction threshold -- Key: HDFS-6988 URL: https://issues.apache.org/jira/browse/HDFS-6988 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao Fix For: 3.0.0 Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch, HDFS-6988.03.patch Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction thresholds configurable. The hard-coded thresholds may not be appropriate for very large RAM disks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7258) CacheReplicationMonitor rescan schedule log should use DEBUG level instead of INFO level
[ https://issues.apache.org/jira/browse/HDFS-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao reassigned HDFS-7258: Assignee: Xiaoyu Yao CacheReplicationMonitor rescan schedule log should use DEBUG level instead of INFO level Key: HDFS-7258 URL: https://issues.apache.org/jira/browse/HDFS-7258 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Priority: Minor CacheReplicationMonitor rescan scheduler adds two INFO log entries every 30 seconds to the HDFS NN log as shown below. This should be a DEBUG level log to avoid flooding the namenode log.
2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:53:00,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-10-17 07:53:00,266 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-10-17 07:53:30,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-10-17 07:53:30,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:54:00,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-10-17 07:54:00,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:54:30,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-10-17 07:54:30,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:55:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
2014-10-17 07:55:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-10-17 07:55:30,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
2014-10-17 07:55:30,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:56:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-10-17 07:56:00,270 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:56:30,270 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-10-17 07:56:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:57:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
2014-10-17 07:57:00,272 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-10-17 07:57:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
2014-10-17 07:57:30,272 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-10-17 07:58:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
2014-10-17 07:58:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-10-17 07:58:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
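The fix this issue asks for is essentially a one-line demotion of log level. Sketched below with java.util.logging from the JDK — the real CacheReplicationMonitor uses a different logging API, so this only illustrates the mechanism: messages below the logger's threshold are dropped, so a per-rescan message at FINE (debug) no longer floods an INFO-level log.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of demoting a periodic rescan message from INFO to debug level,
// using java.util.logging. The real CacheReplicationMonitor uses a
// different logging API; this only illustrates the idea.
public class RescanLogDemo {
    static final Logger LOG = Logger.getLogger("CacheReplicationMonitorDemo");

    static void logRescan(long waitedMs, Level level) {
        // Before the fix: level == Level.INFO, emitted on every rescan.
        // After the fix: level == Level.FINE, dropped unless debugging.
        LOG.log(level, "Rescanning after " + waitedMs + " milliseconds");
    }

    public static void main(String[] args) {
        LOG.setUseParentHandlers(false);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.INFO); // production-style threshold
        LOG.addHandler(handler);
        LOG.setLevel(Level.INFO);

        logRescan(30001, Level.INFO); // emitted
        logRescan(30001, Level.FINE); // silently dropped at INFO threshold
    }
}
```

An operator who does want the rescan chatter can still get it by setting the logger (and handler) to a debug-level threshold, which is exactly the trade-off a DEBUG-level message is meant to offer.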
[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180725#comment-14180725 ] Hadoop QA commented on HDFS-6663: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676423/HDFS-6663-5.patch against trunk revision 7b0f9bb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-hdfs-project/hadoop-hdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8481//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8481//console This message is automatically generated. Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. 
It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6742) Support sorting datanode list on the new NN webUI
[ https://issues.apache.org/jira/browse/HDFS-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180743#comment-14180743 ] Siqi Li commented on HDFS-6742: --- [~airbots] Hi Chen, any updates on this jira? It would be extremely helpful when dealing with clusters with thousands of nodes. Support sorting datanode list on the new NN webUI - Key: HDFS-6742 URL: https://issues.apache.org/jira/browse/HDFS-6742 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ming Ma Assignee: Chen He The legacy webUI allows sorting the datanode list based on a specific column such as hostname. It is handy so admins can find patterns more quickly, especially for big clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7180: - Attachment: HDFS-7180.003.patch Uploaded a new patch to fix the findbugs warning. NFSv3 gateway frequently gets stuck --- Key: HDFS-7180 URL: https://issues.apache.org/jira/browse/HDFS-7180 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.5.0 Environment: Linux, Fedora 19 x86-64 Reporter: Eric Zhiqiang Ma Assignee: Brandon Li Priority: Critical Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, HDFS-7180.003.patch We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway on one node in the cluster to let users upload data with rsync. However, we find the NFSv3 daemon seems to get stuck frequently while HDFS itself works well (hdfs dfs -ls etc. work just fine). The latest hang we found happened after around 1 day of running and several hundred GBs of data uploaded. The NFSv3 daemon is started on one node, and the NFS is mounted on the same node.
From the node where the NFS is mounted, dmesg shows entries like this:
[1859245.368108] nfs: server localhost not responding, still trying
[1859245.368111] nfs: server localhost not responding, still trying
[1859245.368115] nfs: server localhost not responding, still trying
[1859245.368119] nfs: server localhost not responding, still trying
[1859245.368123] nfs: server localhost not responding, still trying
[1859245.368127] nfs: server localhost not responding, still trying
[1859245.368131] nfs: server localhost not responding, still trying
[1859245.368135] nfs: server localhost not responding, still trying
[1859245.368138] nfs: server localhost not responding, still trying
[1859245.368142] nfs: server localhost not responding, still trying
[1859245.368146] nfs: server localhost not responding, still trying
[1859245.368150] nfs: server localhost not responding, still trying
[1859245.368153] nfs: server localhost not responding, still trying
The mounted directory cannot be `ls`-ed, and `df -hT` gets stuck too.
The latest lines from the nfs3 log in the hadoop logs directory: 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist. 
2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: [10.0.3.172:50010, 10.0.3.176:50010] 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 java.io.IOException: Bad response ERROR for block
[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180759#comment-14180759 ] Jing Zhao commented on HDFS-7180: - +1 pending Jenkins NFSv3 gateway frequently gets stuck --- Key: HDFS-7180 URL: https://issues.apache.org/jira/browse/HDFS-7180 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.5.0 Environment: Linux, Fedora 19 x86-64 Reporter: Eric Zhiqiang Ma Assignee: Brandon Li Priority: Critical Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, HDFS-7180.003.patch We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway on one node in the cluster to let users upload data with rsync. However, we find the NFSv3 daemon seems to get stuck frequently while HDFS itself works well (hdfs dfs -ls etc. work just fine). The latest hang we found happened after around 1 day of running and several hundred GBs of data uploaded. The NFSv3 daemon is started on one node, and the NFS is mounted on the same node.
From the node where the NFS is mounted, dmesg shows entries like this:
[1859245.368108] nfs: server localhost not responding, still trying
[1859245.368111] nfs: server localhost not responding, still trying
[1859245.368115] nfs: server localhost not responding, still trying
[1859245.368119] nfs: server localhost not responding, still trying
[1859245.368123] nfs: server localhost not responding, still trying
[1859245.368127] nfs: server localhost not responding, still trying
[1859245.368131] nfs: server localhost not responding, still trying
[1859245.368135] nfs: server localhost not responding, still trying
[1859245.368138] nfs: server localhost not responding, still trying
[1859245.368142] nfs: server localhost not responding, still trying
[1859245.368146] nfs: server localhost not responding, still trying
[1859245.368150] nfs: server localhost not responding, still trying
[1859245.368153] nfs: server localhost not responding, still trying
The mounted directory cannot be `ls`-ed, and `df -hT` gets stuck too.
The latest lines from the nfs3 log in the hadoop logs directory: 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist. 
2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: [10.0.3.172:50010, 10.0.3.176:50010] 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 java.io.IOException: Bad response ERROR for block BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643
[jira] [Created] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
Colin Patrick McCabe created HDFS-7278: -- Summary: Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7278: --- Attachment: HDFS-7278.002.patch Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7278.002.patch We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7278: --- Status: Patch Available (was: Open) Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7278.002.patch We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180770#comment-14180770 ] Suresh Srinivas commented on HDFS-7278: --- [~cmccabe], can you describe why this is needed so that others have context? Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7278.002.patch We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
Haohui Mai created HDFS-7279: Summary: Use netty to implement DatanodeWebHdfsMethods Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Currently the DN implements all related webhdfs functionality using jetty. Since the jetty version the DN uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOMs when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180775#comment-14180775 ] Aaron T. Myers commented on HDFS-7278: -- I think it's a good tool to have in our toolbox to work around possible bugs in NN replica accounting. If an operator suspects such an issue, they might be tempted to restart a DN, or all of the DNs in a cluster, in order to trigger full block reports. It'd be much lighter weight if the operator could instead just manually trigger a full BR, rather than having to restart the DN and therefore rescan all the DN data dirs, etc. Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7278.002.patch We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180784#comment-14180784 ] Haohui Mai commented on HDFS-7279: -- An alternative option is to upgrade jetty and the servlet API. The new APIs from both, such as asynchronous servlets, can address some of the issues. Webhdfs on the DN side, however, is data-intensive, which does not fit the servlet API very well. The servlet / jetty APIs do not give the fine-grained control over resources that netty is able to provide. These controls are critical if webhdfs needs to survive heavy workloads. The strategy is proven by the mapreduce client, which already uses netty to implement the shuffle functionality. For other URLs on the DNs, I plan to keep jetty listening on a local address, but to have a reverse proxy in netty continue serving these URLs. Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Currently the DN implements all related webhdfs functionality using jetty. Since the jetty version the DN uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOMs when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
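To make the "fine-grained control" in the comment above concrete, here is a minimal, hypothetical sketch of a netty 4 server bootstrap with explicit buffer-pooling and backpressure settings. It is illustrative only, not the HDFS-7279 patch: the port, watermark values, and class name are made up, and the actual data-transfer handlers are elided.

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.http.HttpServerCodec;

// Sketch only: shows the knobs netty exposes that the servlet API does not.
public class WebHdfsNettySketch {
    public static void main(String[] args) throws Exception {
        EventLoopGroup boss = new NioEventLoopGroup(1);
        EventLoopGroup workers = new NioEventLoopGroup();
        try {
            ServerBootstrap b = new ServerBootstrap()
                .group(boss, workers)
                .channel(NioServerSocketChannel.class)
                // Pooled, reference-counted buffers instead of per-request allocation.
                .childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
                // Backpressure: the channel reports itself unwritable past the high
                // watermark, so a slow client cannot force unbounded buffering.
                .childOption(ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK, 64 * 1024)
                .childOption(ChannelOption.WRITE_BUFFER_LOW_WATER_MARK, 32 * 1024)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        ch.pipeline().addLast(new HttpServerCodec());
                        // webhdfs data-transfer handlers would be added here
                    }
                });
            b.bind(50075).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            workers.shutdownGracefully();
        }
    }
}
```

A handler would consult `channel.isWritable()` before streaming more block data, which is the kind of per-connection flow control the comment says jetty 6 cannot offer.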
[jira] [Updated] (HDFS-7223) Tracing span description of IPC client is too long
[ https://issues.apache.org/jira/browse/HDFS-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-7223: --- Attachment: HDFS-7223-1.patch Thanks for the comment [~cmccabe]! I updated the patch based on your suggestion. Tracing span description of IPC client is too long -- Key: HDFS-7223 URL: https://issues.apache.org/jira/browse/HDFS-7223 Project: Hadoop HDFS Issue Type: Improvement Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Attachments: HDFS-7223-0.patch, HDFS-7223-1.patch The current span description for IPC calls is too long. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180814#comment-14180814 ] Hadoop QA commented on HDFS-5928: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676430/HDFS-5928.v5.patch against trunk revision 70719e5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHDFSAcl {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8482//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8482//console This message is automatically generated. show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v5.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
[ https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7277: - Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~jingzhao] for the reviews. Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7277.000.patch The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6888) Remove audit logging of getFIleInfo()
[ https://issues.apache.org/jira/browse/HDFS-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180823#comment-14180823 ] Hadoop QA commented on HDFS-6888: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676436/HDFS-6888-6.patch against trunk revision a36399e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8483//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8483//console This message is automatically generated. Remove audit logging of getFIleInfo() - Key: HDFS-6888 URL: https://issues.apache.org/jira/browse/HDFS-6888 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Labels: log Attachments: HDFS-6888-2.patch, HDFS-6888-3.patch, HDFS-6888-4.patch, HDFS-6888-5.patch, HDFS-6888-6.patch, HDFS-6888.patch The audit logging of getFileInfo() was added in HDFS-3733. Since this is one of the most frequently called methods, users have noticed that the audit log is now filled with these entries. 
Since we now have HTTP request logging, this seems unnecessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
[ https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180824#comment-14180824 ] Hudson commented on HDFS-7277: -- FAILURE: Integrated in Hadoop-trunk-Commit #6319 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6319/]) HDFS-7277. Remove explicit dependency on netty 3.2 in BKJournal. Contributed by Haohui Mai. (wheat9: rev f729ecf9d2b858e9ee97419e788f1a2ac38b15bb) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/pom.xml Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7277.000.patch The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7279: -- Component/s: webhdfs datanode Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Currently the DN implements all related webhdfs functionality using jetty. Since the jetty version the DN uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOMs when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180836#comment-14180836 ] Haohui Mai commented on HDFS-5928: -- The patch looks good. Tested on a non-HA cluster and it works well.
{code}
+{#HAInfo}
+<h3>{Namespace} {NamenodeID}</h3>
+{/HAInfo}
{code}
Can you move the information into the table below? For example:
{code}
{#HAInfo}
<tr><th>Namespace:</th><td>{Namespace}</td></tr>
<tr><th>Namenode ID:</th><td>{NamenodeID}</td></tr>
{/HAInfo}
{code}
Can you post a screenshot on an HA cluster setup as well? show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v5.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7280) Use netty 4 in WebImageViewer
Haohui Mai created HDFS-7280: Summary: Use netty 4 in WebImageViewer Key: HDFS-7280 URL: https://issues.apache.org/jira/browse/HDFS-7280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai This jira changes WebImageViewer to use netty 4 instead of netty 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180853#comment-14180853 ] Hadoop QA commented on HDFS-7180: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676474/HDFS-7180.003.patch against trunk revision 3b12fd6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1269 javac compiler warnings (more than the trunk's current 1266 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8487//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8487//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8487//console This message is automatically generated. 
NFSv3 gateway frequently gets stuck --- Key: HDFS-7180 URL: https://issues.apache.org/jira/browse/HDFS-7180 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.5.0 Environment: Linux, Fedora 19 x86-64 Reporter: Eric Zhiqiang Ma Assignee: Brandon Li Priority: Critical Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, HDFS-7180.003.patch We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway on one node in the cluster to let users upload data with rsync. However, we find the NFSv3 daemon frequently seems to get stuck while HDFS itself keeps working well (hdfs dfs -ls etc. work just fine). The last hang we found occurred after around 1 day of running and several hundred GBs of data uploaded. The NFSv3 daemon is started on one node and on the same node the NFS is mounted. From the node where the NFS is mounted, dmesg shows lines like this: [1859245.368108] nfs: server localhost not responding, still trying [1859245.368111] nfs: server localhost not responding, still trying [1859245.368115] nfs: server localhost not responding, still trying [1859245.368119] nfs: server localhost not responding, still trying [1859245.368123] nfs: server localhost not responding, still trying [1859245.368127] nfs: server localhost not responding, still trying [1859245.368131] nfs: server localhost not responding, still trying [1859245.368135] nfs: server localhost not responding, still trying [1859245.368138] nfs: server localhost not responding, still trying [1859245.368142] nfs: server localhost not responding, still trying [1859245.368146] nfs: server localhost not responding, still trying [1859245.368150] nfs: server localhost not responding, still trying [1859245.368153] nfs: server localhost not responding, still trying The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too. 
The latest lines from the nfs3 log in the hadoop logs directory: 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:48:56,477 INFO
[jira] [Updated] (HDFS-7280) Use netty 4 in WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7280: - Status: Patch Available (was: Open) Use netty 4 in WebImageViewer - Key: HDFS-7280 URL: https://issues.apache.org/jira/browse/HDFS-7280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7280.000.patch This jira changes WebImageViewer to use netty 4 instead of netty 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7280) Use netty 4 in WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7280: - Attachment: HDFS-7280.000.patch Use netty 4 in WebImageViewer - Key: HDFS-7280 URL: https://issues.apache.org/jira/browse/HDFS-7280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7280.000.patch This jira changes WebImageViewer to use netty 4 instead of netty 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7280) Use netty 4 in WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180869#comment-14180869 ] Hadoop QA commented on HDFS-7280: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676498/HDFS-7280.000.patch against trunk revision f729ecf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8490//console This message is automatically generated. Use netty 4 in WebImageViewer - Key: HDFS-7280 URL: https://issues.apache.org/jira/browse/HDFS-7280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7280.000.patch This jira changes WebImageViewer to use netty 4 instead of netty 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7281) Missing block is marked as corrupted block
Ming Ma created HDFS-7281: - Summary: Missing block is marked as corrupted block Key: HDFS-7281 URL: https://issues.apache.org/jira/browse/HDFS-7281 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma In the situation where a block has lost all its replicas, fsck shows the block as missing as well as corrupted. Perhaps it is better not to mark the block corrupted in this case. The reason it is marked as corrupted is that numCorruptNodes == numNodes == 0 in the following code in BlockManager: {noformat} final boolean isCorrupt = numCorruptNodes == numNodes; {noformat} Would like to clarify whether the intent is to mark a missing block as corrupted, or whether this is just a bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7281) Missing block is marked as corrupted block
[ https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180898#comment-14180898 ] Yongjun Zhang commented on HDFS-7281: - Thanks for reporting this issue, [~mingma]. I happened to notice the same thing in a fsck report today. It's indeed confusing. Missing block is marked as corrupted block -- Key: HDFS-7281 URL: https://issues.apache.org/jira/browse/HDFS-7281 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma In the situation where a block has lost all its replicas, fsck shows the block as missing as well as corrupted. Perhaps it is better not to mark the block corrupted in this case. The reason it is marked as corrupted is that numCorruptNodes == numNodes == 0 in the following code in BlockManager: {noformat} final boolean isCorrupt = numCorruptNodes == numNodes; {noformat} Would like to clarify whether the intent is to mark a missing block as corrupted, or whether this is just a bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
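The report above turns on one boolean expression, so the behavior is easy to demonstrate in isolation. The sketch below is hypothetical: only the `numCorruptNodes == numNodes` expression comes from BlockManager, and the guarded variant is just one possible fix, not what the project decided to do.

```java
// Demonstrates why a block with zero replicas is flagged corrupt today,
// and a hypothetical guard that would report it as missing-only.
public class CorruptVsMissing {
    // Current BlockManager logic: 0 == 0 is true, so a missing block
    // (numNodes == 0) is reported as corrupt as well as missing.
    static boolean isCorrupt(int numCorruptNodes, int numNodes) {
        return numCorruptNodes == numNodes;
    }

    // Hypothetical fix: require at least one known replica before
    // calling the block corrupt.
    static boolean isCorruptGuarded(int numCorruptNodes, int numNodes) {
        return numNodes > 0 && numCorruptNodes == numNodes;
    }

    public static void main(String[] args) {
        System.out.println(isCorrupt(0, 0));        // true: missing block marked corrupt
        System.out.println(isCorruptGuarded(0, 0)); // false: missing only
        System.out.println(isCorruptGuarded(3, 3)); // true: all replicas genuinely corrupt
    }
}
```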
[jira] [Commented] (HDFS-6988) Add configurable limit for percentage-based eviction threshold
[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180908#comment-14180908 ] Hadoop QA commented on HDFS-6988: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676462/HDFS-6988.03.patch against trunk revision a36399e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8486//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8486//console This message is automatically generated. Add configurable limit for percentage-based eviction threshold -- Key: HDFS-6988 URL: https://issues.apache.org/jira/browse/HDFS-6988 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao Fix For: 3.0.0 Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch, HDFS-6988.03.patch Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction thresholds configurable. The hard-coded thresholds may not be appropriate for very large RAM disks. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180958#comment-14180958 ] Hadoop QA commented on HDFS-7278: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676481/HDFS-7278.002.patch against trunk revision 3b12fd6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8488//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8488//console This message is automatically generated. Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7278.002.patch We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180962#comment-14180962 ] Suresh Srinivas commented on HDFS-7278: --- bq. I think it's a good tool to have in our toolbox to work around possible bugs in NN replica accounting. Very interesting. I have not encountered such an issue; if you have details it would be good to share them. This command should be fine to add. Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7278.002.patch We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7231) rollingupgrade needs some guard rails
[ https://issues.apache.org/jira/browse/HDFS-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180312#comment-14180312 ] Suresh Srinivas edited comment on HDFS-7231 at 10/23/14 3:29 AM: - Allen, I just rewrote the steps with additional details to clarify: # Upgrade 2.0.5 cluster to 2.2 # Do not -finalizeUpgrade # Install 2.4.1 binaries on the cluster machines. Start the datanodes on 2.4.1. # Start namenode -upgrade option. # Namenode start fails because 2.0.5 to 2.2 upgrade is still in progress # Leave 2.4.1 DNs running # Install binaries on NN to 2.2 # Start NN on 2.2 with no upgrade related options So far things are clear. Then you go on to say, the following: bq. DNs now do a partial roll-forward, rendering them unable to continue What do you mean by this? bq. admins manually repair version files on those broken directories This as you know is a recipe for disaster :) Let me ask you a question. Before you go on to 2.4.1, if you do finalize of upgrade what happens? was (Author: sureshms): Allen, I just rewrote the steps with additional details to clarify: # Upgrade 2.0.5 cluster to 2.2 # Do not -finalizeUpgrade # Install 2.4.1 binaries on the cluster machines. Start the datanodes on 2.4.1. # Start namenode -upgrade option. # Namenode start fails because 2.0.5 to 2.2 upgrade is still in progress # Leave 2.4.1 DNs running # Install binaries on NN to 2.2 # Start NN on 2.2 with no upgrade related options So far things are clear. Then you go on to say, the following: bq. DNs now do a partial roll-forward, rendering them unable to continue What do you mean by this? bq. admins manually repair version files on those broken directories This is as you know is a recipe for disaster. Let me ask you a question. Before you go on to 2.4.1, if you do finalize of upgrade what happens? 
rollingupgrade needs some guard rails - Key: HDFS-7231 URL: https://issues.apache.org/jira/browse/HDFS-7231 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Allen Wittenauer Priority: Blocker See first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180971#comment-14180971 ] Chen He commented on HDFS-6663: --- TestDNFencingWithReplication, TestNameEditsConfigs, and TestStandbyCheckpoints passed on my machine. The latest QA run does not show any test failures. Not sure why it gives me a -1. Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)