[jira] [Commented] (HDFS-7207) libhdfs3 should not expose exceptions in public C++ API
[ https://issues.apache.org/jira/browse/HDFS-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169078#comment-14169078 ] Zhanwei Wang commented on HDFS-7207: bq. Since the current libhdfs3 C++ interface you have proposed exposes so many internal things, this will make it almost impossible to change anything after a release. All internal things are included in the {{hdfs::internal}} namespace and are not exposed in the header files as user-facing API. Would you please point out which internal things are exposed? I'd like to fix them. If an API class has to reference an internal object, I introduce a forward declaration and add a pointer to the internal object in the private section of the API class. One exception is the {{hdfs::internal::shared_ptr}} used in {{FileSystem}}, which was NOT in my original patch; I wrapped it to hold a pointer to {{FileSystemImpl}} and do not use {{shared_ptr}} in the interface, to avoid exposing {{hdfs::internal::shared_ptr}}. Exposing {{shared_ptr}} and {{std::string}} in the API may introduce binary compatibility issues, since different C++ compilers and runtimes may have different, incompatible ABIs. To avoid this, the best way is to use only C types in the API. libhdfs3 should not expose exceptions in public C++ API --- Key: HDFS-7207 URL: https://issues.apache.org/jira/browse/HDFS-7207 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-7207.001.patch There are three major disadvantages of exposing exceptions in the public API: * Exposing exceptions in public APIs forces the downstream users to be compiled with {{-fexceptions}}, which might be infeasible in many use cases. * It forces other bindings to properly handle all C++ exceptions, which might be infeasible especially when the binding is generated by tools like SWIG. * It forces the downstream users to properly handle all C++ exceptions, which can be cumbersome as in certain cases it will lead to undefined behavior (e.g., throwing an exception in a destructor is undefined.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
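For illustration, here is a minimal sketch of the forward-declaration layout described in the comment above; the class and method names are assumptions for the sake of the example, not the actual libhdfs3 headers:
{code}
// FileSystem.h -- public header; nothing from hdfs::internal is included here.
namespace hdfs {
namespace internal {
class FileSystemImpl;                  // forward declaration only; the definition stays internal
}

class FileSystem {
public:
    FileSystem();
    ~FileSystem();

    // Only C types cross the API boundary, so the ABI does not depend on the
    // caller's STL implementation (no std::string or shared_ptr in signatures).
    int open(const char *path, int flags);

private:
    internal::FileSystemImpl *impl;    // plain pointer to the implementation object
};
}
{code}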
[jira] [Commented] (HDFS-7207) libhdfs3 should not expose exceptions in public C++ API
[ https://issues.apache.org/jira/browse/HDFS-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169085#comment-14169085 ] Zhanwei Wang commented on HDFS-7207: bq. A slightly simpler (probably subjective) approach might be to wrap things in the opposite way. That is, putting the error message / stack traces in the Status object directly and let hdfsGetLastError to get the string. I think this way is more straightforward. Another issue is how to deal with the constructor: since it has no return value, what if {{std::bad_alloc}} is thrown in the constructor? Currently the ways I can figure out are to 1) use the factory pattern, or 2) add a flag member in the interface class to indicate the error. Any suggestions? libhdfs3 should not expose exceptions in public C++ API --- Key: HDFS-7207 URL: https://issues.apache.org/jira/browse/HDFS-7207 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-7207.001.patch There are three major disadvantages of exposing exceptions in the public API: * Exposing exceptions in public APIs forces the downstream users to be compiled with {{-fexceptions}}, which might be infeasible in many use cases. * It forces other bindings to properly handle all C++ exceptions, which might be infeasible especially when the binding is generated by tools like SWIG. * It forces the downstream users to properly handle all C++ exceptions, which can be cumbersome as in certain cases it will lead to undefined behavior (e.g., throwing an exception in a destructor is undefined.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
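A rough sketch of the factory-pattern option mentioned above, written so that no exception (not even {{std::bad_alloc}}) escapes construction; the names {{New()}} and {{init()}} are hypothetical, not part of the proposed API:
{code}
#include <cstddef>   // NULL
#include <new>       // std::nothrow

class FileSystem {
public:
    // Option 1 above: a static factory absorbs construction failures and
    // reports them through the return value instead of an exception.
    static FileSystem *New() {
        FileSystem *fs = new (std::nothrow) FileSystem();   // returns NULL instead of throwing
        if (fs != NULL && !fs->init()) {                    // fallible initialization step
            delete fs;
            fs = NULL;
        }
        return fs;       // NULL tells the caller that construction failed
    }

private:
    FileSystem() {}                      // trivial, non-throwing constructor
    bool init() { return true; }         // placeholder; a real init would do the risky work
};
{code}
Option 2 would instead construct the object unconditionally and set an error flag member that the caller checks before using the instance.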
[jira] [Commented] (HDFS-7207) libhdfs3 should not expose exceptions in public C++ API
[ https://issues.apache.org/jira/browse/HDFS-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169097#comment-14169097 ] Zhanwei Wang commented on HDFS-7207: In my previous patch, InputStream/OutputStream keep a shared_ptr of FileSystemImpl for an important reason, which is to avoid using an invalid pointer to FileSystemImpl if the filesystem is destroyed before the InputStream/OutputStream. {code}
DB *db = DB::Open();
Iterator *it = db->NewIterator(...);
delete db; // bails out because the iterator it has leaked.
{code}
This way is good in my opinion: it makes the user more aware of the leaks, but a shared_ptr of DBImpl still needs to be kept in the Iterator to avoid a core dump if the user continues to use the iterator after {{delete db}}. libhdfs3 should not expose exceptions in public C++ API --- Key: HDFS-7207 URL: https://issues.apache.org/jira/browse/HDFS-7207 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-7207.001.patch There are three major disadvantages of exposing exceptions in the public API: * Exposing exceptions in public APIs forces the downstream users to be compiled with {{-fexceptions}}, which might be infeasible in many use cases. * It forces other bindings to properly handle all C++ exceptions, which might be infeasible especially when the binding is generated by tools like SWIG. * It forces the downstream users to properly handle all C++ exceptions, which can be cumbersome as in certain cases it will lead to undefined behavior (e.g., throwing an exception in a destructor is undefined.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
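A sketch of the ownership arrangement described above, with the stream privately sharing ownership of the implementation object so it never dereferences a dangling pointer; the names and signatures here are illustrative, not the actual libhdfs3 code:
{code}
#include <memory>

namespace hdfs {
namespace internal { class FileSystemImpl; }

class InputStream {
public:
    int read(char *buf, int size);        // illustrative signature

private:
    friend class FileSystem;              // only FileSystem creates streams
    explicit InputStream(std::shared_ptr<internal::FileSystemImpl> fs)
        : fs_(fs) {}

    // Shared ownership keeps FileSystemImpl alive for the stream's lifetime,
    // so reading after the FileSystem wrapper has been deleted does not touch
    // freed memory. The shared_ptr stays in the private section and never
    // appears in a public method signature.
    std::shared_ptr<internal::FileSystemImpl> fs_;
};
}
{code}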
[jira] [Assigned] (HDFS-6544) Broken Link for GFS in package.html
[ https://issues.apache.org/jira/browse/HDFS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suraj Nayak M reassigned HDFS-6544: --- Assignee: Suraj Nayak M Broken Link for GFS in package.html --- Key: HDFS-6544 URL: https://issues.apache.org/jira/browse/HDFS-6544 Project: Hadoop HDFS Issue Type: Bug Reporter: Suraj Nayak M Assignee: Suraj Nayak M Priority: Minor Attachments: HDFS-6544.patch The link to GFS is currently pointing to http://labs.google.com/papers/gfs.html, which is broken. Change it to http://research.google.com/archive/gfs.html which has Abstract of the GFS paper along with link to the PDF version of the GFS Paper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6544) Broken Link for GFS in package.html
[ https://issues.apache.org/jira/browse/HDFS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suraj Nayak M updated HDFS-6544: Status: Patch Available (was: Open) Broken Link for GFS in package.html --- Key: HDFS-6544 URL: https://issues.apache.org/jira/browse/HDFS-6544 Project: Hadoop HDFS Issue Type: Bug Reporter: Suraj Nayak M Assignee: Suraj Nayak M Priority: Minor Attachments: HDFS-6544.patch The link to GFS is currently pointing to http://labs.google.com/papers/gfs.html, which is broken. Change it to http://research.google.com/archive/gfs.html which has Abstract of the GFS paper along with link to the PDF version of the GFS Paper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server
[ https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169445#comment-14169445 ] Yongjun Zhang commented on HDFS-7146: - Hi [~aw] and [~brandonli], Thanks for the earlier review and discussion here. I created HADOOP-11195 per Allen's suggestion to merge the two existing mechanisms that cache user/group info into the hadoop-common area. Certainly I agree with the general software engineering principle of code sharing and its benefits. My original thought was that we could do things in a different order, but let's try this route of fixing HADOOP-11195 first and then HDFS-7146. I might incorporate HDFS-7146 into the same fix as HADOOP-11195, though. NFS ID/Group lookup requires SSSD enumeration on the server --- Key: HDFS-7146 URL: https://issues.apache.org/jira/browse/HDFS-7146 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, HDFS-7146.003.patch The current implementation of the NFS UID and GID lookup works by running 'getent passwd' with an assumption that it will return the entire list of users available on the OS, local and remote (AD/etc.). This behaviour of the command is advised to be and is prevented by administrators in most secure setups to avoid excessive load to the ADs involved, as the # of users to be listed may be too large, and the repeated requests of ALL users not present in the cache would be too much for the AD infrastructure to bear. The NFS server should likely do lookups based on a specific UID request, via 'getent passwd UID', if the UID does not match a cached value. This reduces load on the LDAP backed infrastructure. Thanks [~qwertymaniac] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7204) balancer doesn't run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169519#comment-14169519 ] Benoy Antony commented on HDFS-7204: [~aw], can we avoid setting {{daemon=true}} for each component like balancer? I may be missing the intention behind the internal boolean variable, {{daemon}}. Shouldn't it be set based on whether {{--daemon}} is part of the invocation? balancer doesn't run as a daemon Key: HDFS-7204 URL: https://issues.apache.org/jira/browse/HDFS-7204 Project: Hadoop HDFS Issue Type: Bug Components: scripts Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: newbie Attachments: HDFS-7204-01.patch, HDFS-7204.patch From HDFS-7184, minor issues with balancer: * daemon isn't set to true in hdfs to enable daemonization * start-balancer script has usage instead of hadoop_usage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7232) Populate hostname in httpfs audit log
[ https://issues.apache.org/jira/browse/HDFS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoran Dimitrijevic updated HDFS-7232: - Attachment: HDFS-7232.patch Populate hostname in httpfs audit log - Key: HDFS-7232 URL: https://issues.apache.org/jira/browse/HDFS-7232 Project: Hadoop HDFS Issue Type: Bug Reporter: Zoran Dimitrijevic Assignee: Zoran Dimitrijevic Priority: Trivial Attachments: HDFS-7232.patch Currently httpfs audit logs do not log the request's IP address. Since they use hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/conf/httpfs-log4j.properties which already contains hostname, it would be nice to add code to populate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7232) Populate hostname in httpfs audit log
[ https://issues.apache.org/jira/browse/HDFS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoran Dimitrijevic updated HDFS-7232: - Affects Version/s: 3.0.0 Status: Patch Available (was: Open) This is a simple patch for audit logs. I'm not sure if it'll require unit-tests, but for now I don't have them. Populate hostname in httpfs audit log - Key: HDFS-7232 URL: https://issues.apache.org/jira/browse/HDFS-7232 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Zoran Dimitrijevic Assignee: Zoran Dimitrijevic Priority: Trivial Attachments: HDFS-7232.patch Currently httpfs audit logs do not log the request's IP address. Since they use hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/conf/httpfs-log4j.properties which already contains hostname, it would be nice to add code to populate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server
[ https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169561#comment-14169561 ] Brandon Li commented on HDFS-7146: -- [~yzhangal], thanks for filing HADOOP-11195 to track the effort. It's good to not mix bug fixes and code improvement in the same JIRA. NFS ID/Group lookup requires SSSD enumeration on the server --- Key: HDFS-7146 URL: https://issues.apache.org/jira/browse/HDFS-7146 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, HDFS-7146.003.patch The current implementation of the NFS UID and GID lookup works by running 'getent passwd' with an assumption that it will return the entire list of users available on the OS, local and remote (AD/etc.). This behaviour of the command is advised to be and is prevented by administrators in most secure setups to avoid excessive load to the ADs involved, as the # of users to be listed may be too large, and the repeated requests of ALL users not present in the cache would be too much for the AD infrastructure to bear. The NFS server should likely do lookups based on a specific UID request, via 'getent passwd UID', if the UID does not match a cached value. This reduces load on the LDAP backed infrastructure. Thanks [~qwertymaniac] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6544) Broken Link for GFS in package.html
[ https://issues.apache.org/jira/browse/HDFS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169570#comment-14169570 ] Hadoop QA commented on HDFS-6544: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650618/HDFS-6544.patch against trunk revision e8a31f2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8403//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8403//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8403//console This message is automatically generated. Broken Link for GFS in package.html --- Key: HDFS-6544 URL: https://issues.apache.org/jira/browse/HDFS-6544 Project: Hadoop HDFS Issue Type: Bug Reporter: Suraj Nayak M Assignee: Suraj Nayak M Priority: Minor Attachments: HDFS-6544.patch The link to GFS is currently pointing to http://labs.google.com/papers/gfs.html, which is broken. Change it to http://research.google.com/archive/gfs.html which has Abstract of the GFS paper along with link to the PDF version of the GFS Paper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7232) Populate hostname in httpfs audit log
[ https://issues.apache.org/jira/browse/HDFS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169571#comment-14169571 ] Hadoop QA commented on HDFS-7232: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674532/HDFS-7232.patch against trunk revision 793dbf2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8404//console This message is automatically generated. Populate hostname in httpfs audit log - Key: HDFS-7232 URL: https://issues.apache.org/jira/browse/HDFS-7232 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Zoran Dimitrijevic Assignee: Zoran Dimitrijevic Priority: Trivial Attachments: HDFS-7232.patch Currently httpfs audit logs do not log the request's IP address. Since they use hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/conf/httpfs-log4j.properties which already contains hostname, it would be nice to add code to populate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7236) TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots failed in trunk
[ https://issues.apache.org/jira/browse/HDFS-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169574#comment-14169574 ] Jing Zhao commented on HDFS-7236: - +1. I will commit the patch shortly. TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots failed in trunk Key: HDFS-7236 URL: https://issues.apache.org/jira/browse/HDFS-7236 Project: Hadoop HDFS Issue Type: Bug Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7236.001.patch Per the following report {code} Recently FAILED builds in url: https://builds.apache.org/job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 5) that have failed tests in the past 7 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1898/testReport (2014-10-11 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/testReport (2014-10-10 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.tracing.TestTracing.testReadTraceHooks Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.tracing.TestTracing.testWriteTraceHooks ... Among 5 runs examined, all failed tests #failedRuns: testName: 4: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 2: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 2: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode ... {code} TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots failed in most recent two runs in trunk. 
Creating this jira for it (The other two tests that failed more often were reported in separate jira HDFS-7221 and HDFS-7226) Symptom: {code} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {code} AND {code} 2014-10-11 12:38:24,385 ERROR datanode.DataNode (DataXceiver.java:run(243)) - 127.0.0.1:55303:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:32949 dst: /127.0.0.1:55303 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:196) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:720) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) at java.lang.Thread.run(Thread.java:662) {code} AND {code} 2014-10-11 12:38:28,552 WARN datanode.DataNode (BPServiceActor.java:offerService(751)) - RemoteException in offerService org.apache.hadoop.ipc.RemoteException(java.io.IOException): Got incremental
[jira] [Updated] (HDFS-7236) Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots
[ https://issues.apache.org/jira/browse/HDFS-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7236: Summary: Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots (was: TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots failed in trunk) Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots Key: HDFS-7236 URL: https://issues.apache.org/jira/browse/HDFS-7236 Project: Hadoop HDFS Issue Type: Bug Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7236.001.patch Per the following report {code} Recently FAILED builds in url: https://builds.apache.org/job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 5) that have failed tests in the past 7 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1898/testReport (2014-10-11 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/testReport (2014-10-10 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.tracing.TestTracing.testReadTraceHooks Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.tracing.TestTracing.testWriteTraceHooks ... Among 5 runs examined, all failed tests #failedRuns: testName: 4: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 2: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 2: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode ... {code} TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots failed in most recent two runs in trunk. 
Creating this jira for it (The other two tests that failed more often were reported in separate jira HDFS-7221 and HDFS-7226) Symptom: {code} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {code} AND {code} 2014-10-11 12:38:24,385 ERROR datanode.DataNode (DataXceiver.java:run(243)) - 127.0.0.1:55303:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:32949 dst: /127.0.0.1:55303 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:196) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:720) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) at java.lang.Thread.run(Thread.java:662) {code} AND {code} 2014-10-11 12:38:28,552 WARN datanode.DataNode (BPServiceActor.java:offerService(751)) - RemoteException in offerService
[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server
[ https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169585#comment-14169585 ] Yongjun Zhang commented on HDFS-7146: - Hi [~brandonli], thanks for the feedback, your point is well taken. NFS ID/Group lookup requires SSSD enumeration on the server --- Key: HDFS-7146 URL: https://issues.apache.org/jira/browse/HDFS-7146 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, HDFS-7146.003.patch The current implementation of the NFS UID and GID lookup works by running 'getent passwd' with an assumption that it will return the entire list of users available on the OS, local and remote (AD/etc.). This behaviour of the command is advised to be and is prevented by administrators in most secure setups to avoid excessive load to the ADs involved, as the # of users to be listed may be too large, and the repeated requests of ALL users not present in the cache would be too much for the AD infrastructure to bear. The NFS server should likely do lookups based on a specific UID request, via 'getent passwd UID', if the UID does not match a cached value. This reduces load on the LDAP backed infrastructure. Thanks [~qwertymaniac] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7207) libhdfs3 should not expose exceptions in public C++ API
[ https://issues.apache.org/jira/browse/HDFS-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169588#comment-14169588 ] Haohui Mai commented on HDFS-7207: -- bq. it has no return value, what if {{std::bad_alloc}} is thrown in the constructor? The interface can avoid throwing any exceptions at all, even {{std::bad_alloc}}. A static factory call is sufficient to take care of it. For example: {code} static Status Create(FileSystem **fsptr); {code} Note that the {{Status}} object also allows users to get the error information in the form of strings. bq. but a shared_ptr of DBImpl still needs to be kept in the Iterator to avoid a core dump if the user continues to use the iterator after delete db The expected behavior is to crash right at the line of {{delete db}}. It avoids any use of dangling iterators. Obviously the code needs to keep a refcount somewhere, but that way the code does not need to expose {{std::shared_ptr}} in the interface. libhdfs3 should not expose exceptions in public C++ API --- Key: HDFS-7207 URL: https://issues.apache.org/jira/browse/HDFS-7207 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-7207.001.patch There are three major disadvantages of exposing exceptions in the public API: * Exposing exceptions in public APIs forces the downstream users to be compiled with {{-fexceptions}}, which might be infeasible in many use cases. * It forces other bindings to properly handle all C++ exceptions, which might be infeasible especially when the binding is generated by tools like SWIG. * It forces the downstream users to properly handle all C++ exceptions, which can be cumbersome as in certain cases it will lead to undefined behavior (e.g., throwing an exception in a destructor is undefined.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
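To make the no-throw, Status-based creation path concrete, here is one way it could look; the Status layout shown (an integer code plus a message accessor) is an assumption for illustration, not an agreed-upon interface:
{code}
class Status {
public:
    static Status OK() { return Status(0, ""); }
    static Status Error(const char *msg) { return Status(-1, msg); }
    bool ok() const { return code == 0; }
    const char *message() const { return msg; }   // error text available as a string
private:
    Status(int c, const char *m) : code(c), msg(m) {}
    int code;
    const char *msg;   // simplified: assumes static-lifetime message text
};

class FileSystem {
public:
    // No-throw creation: errors come back in the Status instead of an
    // exception escaping a constructor.
    static Status Create(FileSystem **fsptr);
};
{code}
A thin C wrapper could then copy the Status message into thread-local storage so that an {{hdfsGetLastError()}}-style call, as quoted earlier in the thread, can return it.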
[jira] [Commented] (HDFS-6544) Broken Link for GFS in package.html
[ https://issues.apache.org/jira/browse/HDFS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169593#comment-14169593 ] Haohui Mai commented on HDFS-6544: -- +1. The tests and release audit warnings are unrelated. I'll commit it shortly. Broken Link for GFS in package.html --- Key: HDFS-6544 URL: https://issues.apache.org/jira/browse/HDFS-6544 Project: Hadoop HDFS Issue Type: Bug Reporter: Suraj Nayak M Assignee: Suraj Nayak M Priority: Minor Attachments: HDFS-6544.patch The link to GFS is currently pointing to http://labs.google.com/papers/gfs.html, which is broken. Change it to http://research.google.com/archive/gfs.html which has Abstract of the GFS paper along with link to the PDF version of the GFS Paper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7236) Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots
[ https://issues.apache.org/jira/browse/HDFS-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7236: Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk, branch-2 and branch-2.6.0. Thanks for the contribution, [~yzhangal]! Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots Key: HDFS-7236 URL: https://issues.apache.org/jira/browse/HDFS-7236 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.6.0 Attachments: HDFS-7236.001.patch Per the following report {code} Recently FAILED builds in url: https://builds.apache.org/job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 5) that have failed tests in the past 7 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1898/testReport (2014-10-11 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/testReport (2014-10-10 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.tracing.TestTracing.testReadTraceHooks Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.tracing.TestTracing.testWriteTraceHooks ... Among 5 runs examined, all failed tests #failedRuns: testName: 4: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 2: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 2: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode ... {code} TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots failed in most recent two runs in trunk. 
Creating this jira for it (The other two tests that failed more often were reported in separate jira HDFS-7221 and HDFS-7226) Symptom: {code} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {code} AND {code} 2014-10-11 12:38:24,385 ERROR datanode.DataNode (DataXceiver.java:run(243)) - 127.0.0.1:55303:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:32949 dst: /127.0.0.1:55303 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:196) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:720) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) at java.lang.Thread.run(Thread.java:662) {code} AND {code} 2014-10-11 12:38:28,552 WARN
[jira] [Commented] (HDFS-7236) Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots
[ https://issues.apache.org/jira/browse/HDFS-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169600#comment-14169600 ] Hudson commented on HDFS-7236: -- FAILURE: Integrated in Hadoop-trunk-Commit #6249 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6249/]) HDFS-7236. Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots. Contributed by Yongjun Zhang. (jing9: rev 98ac9f26c5b3bceb073ce444e42dc89d19132a1f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots Key: HDFS-7236 URL: https://issues.apache.org/jira/browse/HDFS-7236 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.6.0 Attachments: HDFS-7236.001.patch Per the following report {code} Recently FAILED builds in url: https://builds.apache.org/job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 5) that have failed tests in the past 7 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1898/testReport (2014-10-11 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/testReport (2014-10-10 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.tracing.TestTracing.testReadTraceHooks Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.tracing.TestTracing.testWriteTraceHooks ... Among 5 runs examined, all failed tests #failedRuns: testName: 4: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 2: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 2: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode ... {code} TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots failed in most recent two runs in trunk. 
Creating this jira for it (The other two tests that failed more often were reported in separate jira HDFS-7221 and HDFS-7226) Symptom: {code} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {code} AND {code} 2014-10-11 12:38:24,385 ERROR datanode.DataNode (DataXceiver.java:run(243)) - 127.0.0.1:55303:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:32949 dst: /127.0.0.1:55303 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:196) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:720) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at
[jira] [Updated] (HDFS-7236) Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots
[ https://issues.apache.org/jira/browse/HDFS-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7236: Affects Version/s: 2.6.0 Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots Key: HDFS-7236 URL: https://issues.apache.org/jira/browse/HDFS-7236 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.6.0 Attachments: HDFS-7236.001.patch Per the following report {code} Recently FAILED builds in url: https://builds.apache.org/job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 5) that have failed tests in the past 7 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1898/testReport (2014-10-11 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/testReport (2014-10-10 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.tracing.TestTracing.testReadTraceHooks Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.tracing.TestTracing.testWriteTraceHooks ... Among 5 runs examined, all failed tests #failedRuns: testName: 4: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 2: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 2: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode ... {code} TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots failed in most recent two runs in trunk. 
Creating this jira for it (The other two tests that failed more often were reported in separate jira HDFS-7221 and HDFS-7226) Symptom: {code} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {code} AND {code} 2014-10-11 12:38:24,385 ERROR datanode.DataNode (DataXceiver.java:run(243)) - 127.0.0.1:55303:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:32949 dst: /127.0.0.1:55303 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:196) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:720) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) at java.lang.Thread.run(Thread.java:662) {code} AND {code} 2014-10-11 12:38:28,552 WARN datanode.DataNode (BPServiceActor.java:offerService(751)) - RemoteException in offerService org.apache.hadoop.ipc.RemoteException(java.io.IOException): Got incremental block report from
[jira] [Updated] (HDFS-7236) Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots
[ https://issues.apache.org/jira/browse/HDFS-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7236: Target Version/s: 2.6.0 Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots Key: HDFS-7236 URL: https://issues.apache.org/jira/browse/HDFS-7236 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.6.0 Attachments: HDFS-7236.001.patch Per the following report {code} Recently FAILED builds in url: https://builds.apache.org/job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 5) that have failed tests in the past 7 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1898/testReport (2014-10-11 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/testReport (2014-10-10 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.tracing.TestTracing.testReadTraceHooks Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.tracing.TestTracing.testWriteTraceHooks ... Among 5 runs examined, all failed tests #failedRuns: testName: 4: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 2: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 2: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode ... {code} TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots failed in most recent two runs in trunk. 
Creating this jira for it (The other two tests that failed more often were reported in separate jira HDFS-7221 and HDFS-7226) Symptom: {code} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {code} AND {code} 2014-10-11 12:38:24,385 ERROR datanode.DataNode (DataXceiver.java:run(243)) - 127.0.0.1:55303:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:32949 dst: /127.0.0.1:55303 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:196) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:720) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) at java.lang.Thread.run(Thread.java:662) {code} AND {code} 2014-10-11 12:38:28,552 WARN datanode.DataNode (BPServiceActor.java:offerService(751)) - RemoteException in offerService org.apache.hadoop.ipc.RemoteException(java.io.IOException): Got incremental block report from unregistered
[jira] [Updated] (HDFS-6544) Broken Link for GFS in package.html
[ https://issues.apache.org/jira/browse/HDFS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6544: - Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~snayakm] for the contribution. Broken Link for GFS in package.html --- Key: HDFS-6544 URL: https://issues.apache.org/jira/browse/HDFS-6544 Project: Hadoop HDFS Issue Type: Bug Reporter: Suraj Nayak M Assignee: Suraj Nayak M Priority: Minor Fix For: 2.6.0 Attachments: HDFS-6544.patch The link to GFS is currently pointing to http://labs.google.com/papers/gfs.html, which is broken. Change it to http://research.google.com/archive/gfs.html which has Abstract of the GFS paper along with link to the PDF version of the GFS Paper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7222) Expose DataNode network errors as a metric
[ https://issues.apache.org/jira/browse/HDFS-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7222: --- Attachment: HDFS-7222.001.patch Attaching some diffs for a test run. Expose DataNode network errors as a metric -- Key: HDFS-7222 URL: https://issues.apache.org/jira/browse/HDFS-7222 Project: Hadoop HDFS Issue Type: New Feature Components: datanode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7222.001.patch It would be useful to track datanode network errors and expose them as a metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6544) Broken Link for GFS in package.html
[ https://issues.apache.org/jira/browse/HDFS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169614#comment-14169614 ] Hudson commented on HDFS-6544: -- FAILURE: Integrated in Hadoop-trunk-Commit #6250 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6250/]) HDFS-6544. Broken Link for GFS in package.html. Contributed by Suraj Nayak M. (wheat9: rev 53100318ea20c53c4d810dedfd50b88f9f32c1dc) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/package.html Broken Link for GFS in package.html --- Key: HDFS-6544 URL: https://issues.apache.org/jira/browse/HDFS-6544 Project: Hadoop HDFS Issue Type: Bug Reporter: Suraj Nayak M Assignee: Suraj Nayak M Priority: Minor Fix For: 2.6.0 Attachments: HDFS-6544.patch The link to GFS is currently pointing to http://labs.google.com/papers/gfs.html, which is broken. Change it to http://research.google.com/archive/gfs.html which has Abstract of the GFS paper along with link to the PDF version of the GFS Paper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7204) balancer doesn't run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169617#comment-14169617 ] Allen Wittenauer commented on HDFS-7204: [~benoyantony], I promise there is a method to the madness. TL;DR: No, yes, no. Longer: In branch-2 and previous, daemons were handled via wrapping standard command lines. If we concentrate on the functionality (vs. the code rot...) this has some interesting (and inconsistent) results, especially around logging and pid files. If you run the *-daemon.* version, you got a pid file and hadoop.root.logger is set to be INFO,(something). When a daemon is run in non-daemon mode (e.g., straight up: 'hdfs namenode'), no pid file is generated and hadoop.root.logger is kept as INFO,console. With no pid file generated, it is possible to run, e.g. hdfs namenode, both in *-daemon.sh mode and in straight up mode again. It also means that one needs to pull apart the process list to safely determine the status of the daemon since pid files aren't always created. This made building custom init scripts fraught with danger. This inconsistency has been a point of frustration for many operations teams. In branch-3/post-HADOOP-9902, there is a slight change in the above functionality and one of the key reasons why this is an incompatible change. Sub-commands that were intended to run as daemons (either fully, e.g., namenode or partially, e.g. balancer) have all of this handling consolidated, helping to eliminate code rot as well as providing a consistent user experience across projects. daemon=true, which is a per-script local, but is consistent across the hadoop sub-projects, tells the latter parts of the shell code that this sub-command needs to have some extra-handling enabled beyond the normal commands. In particular, daemon=true's will always get pid and out files. They will prevent two being run on the same machine by the same user simultaneously (see footnote 1, however). They get some extra options on the java command line. Etc, etc. So where does \-\-daemon come in? The value of that is stored in a global called HADOOP_DAEMON_MODE. If the user doesn't set it specifically, it defaults to 'default'. This was done to allow the code to mostly replicate the behavior of branch-2 and previous when the *-daemon.sh code was NOT used. In other words, \-\-daemon default (or no value provided) lets commands like hdfs namenode still run in the foreground, just now with pid and out files. \-\-daemon start does the disown (previously a nohup), changes the logging output from HADOOP_ROOT_LOGGER to HADOOP_DAEMON_ROOT_LOGGER, adds some extra command line options, etc, etc similar to the *-daemon.sh commands. What happens if daemon mode is set for all commands? The big thing is the pid and out file creation and the checks around it. A user would only ever be able to execute one 'hadoop fs' command at a time because of the pid file! Less than ideal. :) To summarize, daemon=true tells the code that --daemon actually means something to the sub-command. Otherwise, --daemon is ignored. 1-... unless HADOOP_IDENT_STRING is modified appropriately. This means that in branch-3, it is now possible to run two secure datanodes on the same machine as the same user, since all of the logs, pids, and outs take that into consideration! QA folks should be very happy. 
:) balancer doesn't run as a daemon Key: HDFS-7204 URL: https://issues.apache.org/jira/browse/HDFS-7204 Project: Hadoop HDFS Issue Type: Bug Components: scripts Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: newbie Attachments: HDFS-7204-01.patch, HDFS-7204.patch From HDFS-7184, minor issues with balancer: * daemon isn't set to true in hdfs to enable daemonization * start-balancer script has usage instead of hadoop_usage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7204) balancer doesn't run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169617#comment-14169617 ] Allen Wittenauer edited comment on HDFS-7204 at 10/13/14 5:44 PM: -- [~benoyantony], I promise there is a method to the madness. TL;DR: No, yes, no. Longer: In branch-2 and previous, daemons were handled via wrapping standard command lines. If we concentrate on the functionality (vs. the code rot...) this has some interesting (and inconsistent) results, especially around logging and pid files. If you run the *-daemon version, you got a pid file and hadoop.root.logger is set to be INFO,(something). When a daemon is run in non-daemon mode (e.g., straight up: 'hdfs namenode'), no pid file is generated and hadoop.root.logger is kept as INFO,console. With no pid file generated, it is possible to run, e.g. hdfs namenode, both in *-daemon.sh mode and in straight up mode again. It also means that one needs to pull apart the process list to determine safely determine the status of the daemon since pid files aren't always created. This made building custom init scripts fraught with danger. This inconsistency has been a point of frustration for many operations teams. In branch-3/post-HADOOP-9902, there is a slight change in the above functionality and one of the key reasons why this is an incompatible change. Sub-commands that were intended to run as daemons (either fully, e.g., namenode or partially, e.g. balancer) have all of this handling consolidated, helping to eliminate code rot as well as providing a consistent user experience across projects. daemon=true, which is a per-script local, but is consistent across the hadoop sub-projects, tells the latter parts of the shell code that this sub-command needs to have some extra-handling enabled beyond the normal commands. In particular, daemon=true's will always get pid and out files. They will prevent two being run on the same machine by the same user simultaneously (see footnote 1, however). They get some extra options on the java command line. Etc, etc. So where does \-\-daemon come in? The value of that is stored in a global called HADOOP_DAEMON_MODE. If the user doesn't set it specifically, it defaults to 'default'. This was done to allow the code to mostly replicate the behavior of branch-2 and previous when the *-daemon.sh code was NOT used. In other words, \-\-daemon default (or no value provided), let's commands like hdfs namenode still run in the foreground, just now with pid and out files. \-\-daemon start does the disown (previously a nohup), change the logging output from HADOOP_ROOT_LOGGER to HADOOP_DAEMON_ROOT_LOGGER, add some extra command line options, etc, etc similar to the *-daemon.sh commands. What happens if daemon mode is set for all commands? The big thing is the pid and out file creation and the checks around it. A user would only ever be able to execute one 'hadoop fs' command at a time because of the pid file! Less than ideal. :) To summarize, daemon=true tells the code that --daemon actually means something to the sub-command. Otherwise, --daemon is ignored. 1-... unless HADOOP_IDENT_STRING is modified appropriately. This means that in branch-3, it is now possible to run two secure datanodes on the same machine as the same user, since all of the logs, pids, and outs, take that into consideration! QA folks should be very happy. :) was (Author: aw): [~benoyantony], I promise there is a method to the madness. TL;DR: No, yes, no. 
Longer: In branch-2 and previous, daemons were handled via wrapping standard command lines. If we concentrate on the functionality (vs. the code rot...) this has some interesting (and inconsistent) results, especially around logging and pid files. If you run the *-daemon.* version, you got a pid file and hadoop.root.logger is set to be INFO,(something). When a daemon is run in non-daemon mode (e.g., straight up: 'hdfs namenode'), no pid file is generated and hadoop.root.logger is kept as INFO,console. With no pid file generated, it is possible to run, e.g. hdfs namenode, both in *-daemon.sh mode and in straight up mode again. It also means that one needs to pull apart the process list to determine safely determine the status of the daemon since pid files aren't always created. This made building custom init scripts fraught with danger. This inconsistency has been a point of frustration for many operations teams. In branch-3/post-HADOOP-9902, there is a slight change in the above functionality and one of the key reasons why this is an incompatible change. Sub-commands that were intended to run as daemons (either fully, e.g., namenode or partially, e.g. balancer) have all of this handling consolidated, helping to eliminate code rot as well as providing a consistent user experience across
[jira] [Updated] (HDFS-7204) balancer doesn't run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7204: --- Priority: Blocker (was: Major) balancer doesn't run as a daemon Key: HDFS-7204 URL: https://issues.apache.org/jira/browse/HDFS-7204 Project: Hadoop HDFS Issue Type: Bug Components: scripts Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Labels: newbie Attachments: HDFS-7204-01.patch, HDFS-7204.patch From HDFS-7184, minor issues with balancer: * daemon isn't set to true in hdfs to enable daemonization * start-balancer script has usage instead of hadoop_usage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7204) balancer doesn't run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7204: --- Affects Version/s: 3.0.0 balancer doesn't run as a daemon Key: HDFS-7204 URL: https://issues.apache.org/jira/browse/HDFS-7204 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Labels: newbie Attachments: HDFS-7204-01.patch, HDFS-7204.patch From HDFS-7184, minor issues with balancer: * daemon isn't set to true in hdfs to enable daemonization * start-balancer script has usage instead of hadoop_usage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7204) balancer doesn't run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169617#comment-14169617 ] Allen Wittenauer edited comment on HDFS-7204 at 10/13/14 5:47 PM: -- [~benoyantony], I promise there is a method to the madness. TL;DR: No, yes, no. Longer: In branch-2 and previous, daemons were handled via wrapping standard command lines. If we concentrate on the functionality (vs. the code rot...) this has some interesting (and inconsistent) results, especially around logging and pid files. If you run the *-daemon.sh version, you got a pid file and hadoop.root.logger is set to be INFO,(something). When a daemon is run in non-daemon mode (e.g., straight up: 'hdfs namenode'), no pid file is generated and hadoop.root.logger is kept as INFO,console. With no pid file generated, it is possible to run, e.g. hdfs namenode, both in *-daemon.sh mode and in straight-up mode again. It also means that one needs to pull apart the process list to safely determine the status of the daemon since pid files aren't always created. This made building custom init scripts fraught with danger. This inconsistency has been a point of frustration for many operations teams. In branch-3/post-HADOOP-9902, there is a slight change in the above functionality, and it is one of the key reasons why this is an incompatible change. Sub-commands that were intended to run as daemons (either fully, e.g., namenode, or partially, e.g., balancer) have all of this handling consolidated, helping to eliminate code rot as well as providing a consistent user experience across projects. daemon=true, which is a per-script local but is consistent across the hadoop sub-projects, tells the latter parts of the shell code that this sub-command needs to have some extra handling enabled beyond the normal commands. In particular, daemon=true sub-commands will always get pid and out files. They will prevent two being run on the same machine by the same user simultaneously (see footnote 1, however). They get some extra options on the java command line. Etc, etc. So where does \-\-daemon come in? The value of that is stored in a global called HADOOP_DAEMON_MODE. If the user doesn't set it specifically, it defaults to 'default'. This was done to allow the code to mostly replicate the behavior of branch-2 and previous when the *-daemon.sh code was NOT used. In other words, \-\-daemon default (or no value provided) lets commands like hdfs namenode still run in the foreground, just now with pid and out files. \-\-daemon start does the disown (previously a nohup), changes the logging output from HADOOP_ROOT_LOGGER to HADOOP_DAEMON_ROOT_LOGGER, adds some extra command line options, etc., similar to the *-daemon.sh commands. What happens if daemon mode is set for all commands? The big thing is the pid and out file creation and the checks around it. A user would only ever be able to execute one 'hadoop fs' command at a time because of the pid file! Less than ideal. :) To summarize, daemon=true tells the code that --daemon actually means something to the sub-command. Otherwise, --daemon is ignored. 1-... unless HADOOP_IDENT_STRING is modified appropriately. This means that in branch-3, it is now possible to run two secure datanodes on the same machine as the same user, since all of the logs, pids, and outs take that into consideration! QA folks should be very happy. :) was (Author: aw): [~benoyantony], I promise there is a method to the madness. TL;DR: No, yes, no.
Longer: In branch-2 and previous, daemons were handled via wrapping standard command lines. If we concentrate on the functionality (vs. the code rot...) this has some interesting (and inconsistent) results, especially around logging and pid files. If you run the *-daemon version, you got a pid file and hadoop.root.logger is set to be INFO,(something). When a daemon is run in non-daemon mode (e.g., straight up: 'hdfs namenode'), no pid file is generated and hadoop.root.logger is kept as INFO,console. With no pid file generated, it is possible to run, e.g. hdfs namenode, both in *-daemon.sh mode and in straight up mode again. It also means that one needs to pull apart the process list to determine safely determine the status of the daemon since pid files aren't always created. This made building custom init scripts fraught with danger. This inconsistency has been a point of frustration for many operations teams. In branch-3/post-HADOOP-9902, there is a slight change in the above functionality and one of the key reasons why this is an incompatible change. Sub-commands that were intended to run as daemons (either fully, e.g., namenode or partially, e.g. balancer) have all of this handling consolidated, helping to eliminate code rot as well as providing a consistent user experience across projects.
[jira] [Comment Edited] (HDFS-7231) rollingupgrade needs some guard rails
[ https://issues.apache.org/jira/browse/HDFS-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167620#comment-14167620 ] Allen Wittenauer edited comment on HDFS-7231 at 10/13/14 5:51 PM: -- Argh: another point: with namenode -finalize being taken away, this scenario is pretty much unsolvable without manual intervention and knowledge of how the NN stores stuff on disk. was (Author: aw): Argh: another point: with namenode -finalize being taken away, this scenario is pretty much unsolvable. rollingupgrade needs some guard rails - Key: HDFS-7231 URL: https://issues.apache.org/jira/browse/HDFS-7231 Project: Hadoop HDFS Issue Type: Bug Reporter: Allen Wittenauer See first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7231) rollingupgrade needs some guard rails
[ https://issues.apache.org/jira/browse/HDFS-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7231: --- Priority: Blocker (was: Major) rollingupgrade needs some guard rails - Key: HDFS-7231 URL: https://issues.apache.org/jira/browse/HDFS-7231 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Allen Wittenauer Priority: Blocker See first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7231) rollingupgrade needs some guard rails
[ https://issues.apache.org/jira/browse/HDFS-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7231: --- Affects Version/s: 2.6.0 rollingupgrade needs some guard rails - Key: HDFS-7231 URL: https://issues.apache.org/jira/browse/HDFS-7231 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Allen Wittenauer See first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7231) rollingupgrade needs some guard rails
[ https://issues.apache.org/jira/browse/HDFS-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169634#comment-14169634 ] Allen Wittenauer commented on HDFS-7231: Verified the same crappy experience exists in the 2.6 branch. Marking this as a blocker since this will be the last release for everyone's precious JDK 1.6 support. I'd love to hear some options from the peanut gallery on how to improve this so users aren't left with a potential time bomb on their hands. Alias -upgrade to -rollingupgrade? Bring nn -finalize back? Auto-finalize? rollingupgrade needs some guard rails - Key: HDFS-7231 URL: https://issues.apache.org/jira/browse/HDFS-7231 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Allen Wittenauer Priority: Blocker See first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6884) Include the hostname in HTTPFS log filenames
[ https://issues.apache.org/jira/browse/HDFS-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6884: --- Attachment: (was: HDFS-6884.patch) Include the hostname in HTTPFS log filenames Key: HDFS-6884 URL: https://issues.apache.org/jira/browse/HDFS-6884 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.0 Reporter: Andrew Wang Assignee: Alejandro Abdelnur It'd be good to include the hostname in the httpfs log filenames. Right now we have httpfs.log and httpfs-audit.log, it'd be nice to have e.g. httpfs-${hostname}.log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HDFS-6884) Include the hostname in HTTPFS log filenames
[ https://issues.apache.org/jira/browse/HDFS-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6884: --- Comment: was deleted (was: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674259/HDFS-6884.patch against trunk revision d3d3d47. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 12 warning messages. See https://builds.apache.org/job/PreCommit-HDFS-Build/8395//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-hdfs-project/hadoop-hdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8395//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8395//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8395//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8395//console This message is automatically generated.) Include the hostname in HTTPFS log filenames Key: HDFS-6884 URL: https://issues.apache.org/jira/browse/HDFS-6884 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.0 Reporter: Andrew Wang Assignee: Alejandro Abdelnur It'd be good to include the hostname in the httpfs log filenames. Right now we have httpfs.log and httpfs-audit.log, it'd be nice to have e.g. httpfs-${hostname}.log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6884) Include the hostname in HTTPFS log filenames
[ https://issues.apache.org/jira/browse/HDFS-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169645#comment-14169645 ] Allen Wittenauer commented on HDFS-6884: (Previous patch and QA result was deleted by request of the contributor to prevent confusion, as it was intended for a different issue.) Include the hostname in HTTPFS log filenames Key: HDFS-6884 URL: https://issues.apache.org/jira/browse/HDFS-6884 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.0 Reporter: Andrew Wang Assignee: Alejandro Abdelnur It'd be good to include the hostname in the httpfs log filenames. Right now we have httpfs.log and httpfs-audit.log, it'd be nice to have e.g. httpfs-${hostname}.log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7236) Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots
[ https://issues.apache.org/jira/browse/HDFS-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169647#comment-14169647 ] Yongjun Zhang commented on HDFS-7236: - Many thanks [~jingzhao]! FYI, I just took a look at HDFS-7226 (TestDNFencing.testQueueingWithAppend failed often in latest test) a bit and found that it seems to be related to HDFS-7217 change too. However, it's more subtle there, and it appears to have something to do with hflush. I will look more at that jira a bit later. Fix TestOpenFilesWithSnapshot#testOpenFilesWithMultipleSnapshots Key: HDFS-7236 URL: https://issues.apache.org/jira/browse/HDFS-7236 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.6.0 Attachments: HDFS-7236.001.patch Per the following report {code} Recently FAILED builds in url: https://builds.apache.org/job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 5) that have failed tests in the past 7 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1898/testReport (2014-10-11 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/testReport (2014-10-10 04:30:40) Failed test: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.tracing.TestTracing.testReadTraceHooks Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.tracing.TestTracing.testWriteTraceHooks ... Among 5 runs examined, all failed tests #failedRuns: testName: 4: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 2: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 2: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode ... {code} TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots failed in most recent two runs in trunk. 
Creating this jira for it (The other two tests that failed more often were reported in separate jira HDFS-7221 and HDFS-7226) Symptom: {code} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {code} AND {code} 2014-10-11 12:38:24,385 ERROR datanode.DataNode (DataXceiver.java:run(243)) - 127.0.0.1:55303:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:32949 dst: /127.0.0.1:55303 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:196) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:720) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at
[jira] [Assigned] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang reassigned HDFS-7226: --- Assignee: Yongjun Zhang TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Using tool from HADOOP-11045, got the following report: {code} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1 Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below: .. Among 9 runs examined, all failed tests #failedRuns: testName: 7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching .. {code} TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom: {code} Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failing for the past 1 build (Since Failed#8390 ) Took 2.9 sec. Error Message expected:18 but was:12 Stacktrace java.lang.AssertionError: expected:18 but was:12 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7090) Use unbuffered writes when persisting in-memory replicas
[ https://issues.apache.org/jira/browse/HDFS-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7090: Resolution: Fixed Status: Resolved (was: Patch Available) Xiaoyu pointed out that the test failures are unrelated. {{TestOpenFilesWithSnapshot}} just got fixed in HDFS-7236. {{TestDNFencing}} is an existing failure tracked in HDFS-7226. I committed this to trunk. Xiaoyu, thank you for contributing the patch. Use unbuffered writes when persisting in-memory replicas Key: HDFS-7090 URL: https://issues.apache.org/jira/browse/HDFS-7090 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao Fix For: 3.0.0 Attachments: HDFS-7090.0.patch, HDFS-7090.1.patch, HDFS-7090.2.patch, HDFS-7090.3.patch, HDFS-7090.4.patch The LazyWriter thread just uses {{FileUtils.copyFile}} to copy block files to persistent storage. It would be better to use unbuffered writes to avoid churning page cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7090) Use unbuffered writes when persisting in-memory replicas
[ https://issues.apache.org/jira/browse/HDFS-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169657#comment-14169657 ] Hudson commented on HDFS-7090: -- FAILURE: Integrated in Hadoop-trunk-Commit #6251 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6251/]) HDFS-7090. Use unbuffered writes when persisting in-memory replicas. Contributed by Xiaoyu Yao. (cnauroth: rev 1770bb942f9ebea38b6811ba0bc3cc249ef3ccbb) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/Errno.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/nativeio/TestNativeIO.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/errno_enum.c Use unbuffered writes when persisting in-memory replicas Key: HDFS-7090 URL: https://issues.apache.org/jira/browse/HDFS-7090 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao Fix For: 3.0.0 Attachments: HDFS-7090.0.patch, HDFS-7090.1.patch, HDFS-7090.2.patch, HDFS-7090.3.patch, HDFS-7090.4.patch The LazyWriter thread just uses {{FileUtils.copyFile}} to copy block files to persistent storage. It would be better to use unbuffered writes to avoid churning page cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
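For illustration, the pattern described in HDFS-7090 is to replace the buffered {{FileUtils.copyFile}} call in the LazyWriter path with an unbuffered copy when native support is available. The sketch below is only an approximation: the helper name {{NativeIO.copyFileUnbuffered}} and the fallback behavior are assumptions made for the sketch, not a quotation of the committed change.
{code}
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.apache.hadoop.io.nativeio.NativeIO;

// Sketch only: persist a block file to disk without churning the page cache.
// The native helper name below is an assumption for illustration.
static void persistBlockFile(File src, File dst) throws IOException {
  if (NativeIO.isAvailable()) {
    NativeIO.copyFileUnbuffered(src, dst);  // unbuffered copy, avoids page cache churn
  } else {
    FileUtils.copyFile(src, dst);           // previous buffered behavior as a fallback
  }
}
{code}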
[jira] [Commented] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169673#comment-14169673 ] Colin Patrick McCabe commented on HDFS-7235: Thanks for looking at this, Yongjun. I don't understand why we need a new function named {{FsDatasetSpi#isInvalidBlockDueToNonexistentBlockFile}}. The JavaDoc for {{FsDatasetSpi#isValid}} says that it checks if the block exist\[s\] and has the given state, and it's clear from the code that this is what it actually implements. We start by calling isValid... {code} private void transferBlock(ExtendedBlock block, DatanodeInfo[] xferTargets, StorageType[] xferTargetStorageTypes) throws IOException { BPOfferService bpos = getBPOSForBlock(block); DatanodeRegistration bpReg = getDNRegistrationForBP(block.getBlockPoolId()); if (!data.isValidBlock(block)) { // block does not exist or is under-construction String errStr = "Can't send invalid block " + block; LOG.info(errStr); bpos.trySendErrorReport(DatanodeProtocol.INVALID_BLOCK, errStr); return; } ... {code} {{isValid}} checks whether the block file exists... {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} So there's no need for a new function. isValid already does what you want. bq. The key issue we found here is, after DN detects an invalid block for the above reason, it doesn't report the invalid block back to NN, thus NN doesn't know that the block is corrupted, and keeps sending the data transfer request to the same DN to be decommissioned, again and again. This caused an infinite loop, so the decommission process hangs. Is this a problem with {{BPOfferService#trySendErrorReport}}? If so, it seems like we should fix it there. I can see that BPServiceActor#trySendErrorReport calls {{NameNodeRpc#errorReport}}, whereas your patch calls {{NameNodeRpc#reportBadBlocks}}. What's the reason for this change, and does it fix the bug described above? Can not decommission DN which has invalid block due to bad disk --- Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch When decommissioning a DN, the process hangs. What happens is, when NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this DN to-be-decommissioned as the source of transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as invalidBlock with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} The reason that this method returns false (detecting invalid block) is that the block file doesn't exist due to bad disk in this case. 
The key issue we found here is, after DN detects an invalid block for the above reason, it doesn't report the invalid block back to NN, thus NN doesn't know that the block is corrupted, and keeps sending the data transfer request to the same DN to be decommissioned, again and again. This caused an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7237) namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException
Tsz Wo Nicholas Sze created HDFS-7237: - Summary: namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException Key: HDFS-7237 URL: https://issues.apache.org/jira/browse/HDFS-7237 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor run hdfs namenode -rollingUpgrade {noformat} 14/10/13 11:30:50 INFO namenode.NameNode: createNameNode [-rollingUpgrade] 14/10/13 11:30:50 FATAL namenode.NameNode: Exception in namenode join java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hdfs.server.namenode.NameNode.parseArguments(NameNode.java:1252) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1367) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1501) 14/10/13 11:30:50 INFO util.ExitUtil: Exiting with status 1 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169714#comment-14169714 ] Yongjun Zhang commented on HDFS-7235: - Hi Colin, Thanks a lot for the review. The key issue identified for the original symptom was, when a block is detected as invalid by the existing isValid() method, we call SendErrorReport(), which just logs a message there, and Namenode doesn't do more than logging the message for this call, so NameNode doesn't know the block is bad. What I did was, I separated the reasons for isValid() being false into two parts: - if it's false because getBlockFile().exists() returns false, call reportBadBlocks, so NameNode will record the bad block for future reference. - if it's false because either replicaInfo == null OR replicaInfo.getState() != state, it still calls SendErrorReport() like before. Actually for this case, the state has to be FINALIZED. We don't want to report a bad block for a state that's RBW, for example. If we make the change in SendErrorReport, that means we need to change the behavior of this method to also call reportBadBlocks from there conditionally, which is not clean to me, because SendErrorReport is supposed to just send an error report. Wonder if this explanation makes sense to you? Thanks. Can not decommission DN which has invalid block due to bad disk --- Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch When decommissioning a DN, the process hangs. What happens is, when NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this DN to-be-decommissioned as the source of transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as invalidBlock with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} The reason that this method returns false (detecting invalid block) is that the block file doesn't exist due to bad disk in this case. The key issue we found here is, after DN detects an invalid block for the above reason, it doesn't report the invalid block back to NN, thus NN doesn't know that the block is corrupted, and keeps sending the data transfer request to the same DN to be decommissioned, again and again. This caused an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
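To make the two branches concrete, the decision described above might look roughly like the following inside the DataNode transfer path. This is a sketch only, not the attached patch; the helper {{isFinalizedButBlockFileMissing}} is a hypothetical name introduced here for illustration.
{code}
// Sketch only: split the "invalid block" handling into two kinds of report.
if (!data.isValidBlock(block)) {
  if (isFinalizedButBlockFileMissing(block)) {  // hypothetical helper for the missing-file case
    // Finalized replica whose block file is gone (e.g. bad disk): tell the NN the
    // replica is corrupt so it stops picking this DN as the replication source.
    reportBadBlocks(block);
  } else {
    // No replica, or replica not FINALIZED (e.g. RBW): keep the old error report.
    bpos.trySendErrorReport(DatanodeProtocol.INVALID_BLOCK,
        "Can't send invalid block " + block);
  }
  return;
}
{code}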
[jira] [Commented] (HDFS-7055) Add tracing to DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169768#comment-14169768 ] Colin Patrick McCabe commented on HDFS-7055: Nicholas, I apologize if these findbugs issues inconvenienced you. I have filed HADOOP-11197 to make test-patch.sh more robust to issues like HADOOP-11178. I would appreciate a review on HDFS-7227. Thanks also to Yongjun for fixing HDFS-7194 (introduced by me) and HDFS-7169 (introduced by Nicholas). Add tracing to DFSInputStream - Key: HDFS-7055 URL: https://issues.apache.org/jira/browse/HDFS-7055 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.0 Attachments: HDFS-7055.002.patch, HDFS-7055.003.patch, HDFS-7055.004.patch, HDFS-7055.005.patch, screenshot-get-1mb.005.png, screenshot-get-1mb.png Add tracing to DFSInputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7237) namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HDFS-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7237: -- Attachment: h7237_20141013.patch h7237_20141013.patch: checks the index. namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException -- Key: HDFS-7237 URL: https://issues.apache.org/jira/browse/HDFS-7237 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7237_20141013.patch run hdfs namenode -rollingUpgrade {noformat} 14/10/13 11:30:50 INFO namenode.NameNode: createNameNode [-rollingUpgrade] 14/10/13 11:30:50 FATAL namenode.NameNode: Exception in namenode join java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hdfs.server.namenode.NameNode.parseArguments(NameNode.java:1252) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1367) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1501) 14/10/13 11:30:50 INFO util.ExitUtil: Exiting with status 1 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
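For reference, the kind of index check the patch describes might look roughly like the following in NameNode argument parsing. This is an illustrative sketch, not the attached h7237_20141013.patch; the exact message text and surrounding structure are assumptions.
{code}
// Sketch only: avoid indexing past the end of args when -rollingUpgrade is the last
// argument, and print a usable message instead of an ArrayIndexOutOfBoundsException.
if (StartupOption.ROLLINGUPGRADE.getName().equalsIgnoreCase(cmd)) {
  if (i + 1 >= argsLen) {
    LOG.fatal("Must specify a rolling upgrade startup option after -rollingUpgrade");
    return null;  // caller prints usage and exits
  }
  startOpt.setRollingUpgradeStartupOption(args[++i]);
}
{code}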
[jira] [Updated] (HDFS-7237) namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HDFS-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7237: -- Status: Patch Available (was: Open) namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException -- Key: HDFS-7237 URL: https://issues.apache.org/jira/browse/HDFS-7237 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7237_20141013.patch run hdfs namenode -rollingUpgrade {noformat} 14/10/13 11:30:50 INFO namenode.NameNode: createNameNode [-rollingUpgrade] 14/10/13 11:30:50 FATAL namenode.NameNode: Exception in namenode join java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hdfs.server.namenode.NameNode.parseArguments(NameNode.java:1252) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1367) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1501) 14/10/13 11:30:50 INFO util.ExitUtil: Exiting with status 1 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169779#comment-14169779 ] Maysam Yabandeh commented on HDFS-6982: --- Thanks [~andrew.wang] for the well-detailed review. I will submit a new patch soon. In the meanwhile, let me double check a couple of points with you. bq. Since I don't see any modifications to any existing files, I'm also wondering how this is exposed to JMX or on the webUI. You are right. I was not sure where the best place is to integrate nntop with the NN. I will pick a place and we can update it later. bq. There's only a {{getDefaultRollingWindow}} class, no other ways of constructing a RollingWindow. The design doc envisions two interfaces to access the top users. One is JMX, which requires a rolling window over only one reporting period, say 1 minute. JMX data, however, are most useful when they are integrated with an external graphing tool. To also allow users with small clusters to benefit from the data computed by nntop, we also provide an html interface, which has no graphing capability. This basic interface unfortunately does not give a sense of *trend* to the viewer. To compensate for that, the html page will show the top users over multiple time periods, say 1, 5, 25 minutes; ergo why we have multiple rolling window periods in nntop. One of them however is used for the jmx interface, which is specified by {{getDefaultRollingWindow}}. About the html interface, I excluded it from this patch for two reasons. First, I figured it is better to keep this patch as small as possible and work on the html interface patch on a separate jira. The second reason was that I had previously used the yarn html utils and I am going to have to rewrite that part using html utils which are standard to the hdfs project. bq. How do we configure multiple reporting periods? Via some conf params. I will make sure that the docs reflect that properly. bq. WEB_PORT and DEFAULT_WEB_PORT seem to be unused You are right. They are supposed to be used by the html interface, but I should remove them from this patch. bq. getCmdTotal and getTopMetricsRecordPrefix static getters are only used in TopMetrics, that might be a better home. They will later be used by the html interface as well. The html interface will show the total operations on top and then details of each command afterwards. bq. Rather than MIN_2_MS, could we have a long array with the default periods, i.e. DEFAULT_REPORTING_PERIODS? In addition to the previous explanation about multiple reporting periods for the html view, I should add that the reporting periods are expected to be specified in the conf file. I dropped the method that reads them from the conf file from the patch since it was invoked only via the html interface. But I guess I should put it back to avoid confusion. bq. report, we construct the permStr, but don't actually use it. You are right. I actually can drop src, dst, and also status. At the beginning, the vision for nntop was to also report hot directories, etc., and that is why we kept the full details in the report method. But I guess we can always put such details back if at some point those visions were to be pursued. bq. report, I don't think we need the catch for Throwable t, no checked exceptions are being thrown? The idea was that any unexpected problem from a programming bug in nntop should not crash the name node. bq. 
TopUtil: This stuff isn't shared much, seems like we could just move things to where they're used TopUtil was much fatter when it also included html view util functions. The html view will also be a user of TopUtil. bq. TopMetricsCollector: Is this used? Yeah, by the html view. I should drop it from this patch. nntop: top-like tool for name node users - Key: HDFS-6982 URL: https://issues.apache.org/jira/browse/HDFS-6982 Project: Hadoop HDFS Issue Type: New Feature Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf In this jira we motivate the need for nntop, a tool that, similarly to what top does in Linux, gives the list of top users of the HDFS name node and gives insight into which users are sending the majority of each traffic type to the name node. This information turns out to be the most critical when the name node is under pressure and the HDFS admin needs to know which user is hammering the name node and with what kind of requests. Here we present the design of nntop which has been in production at Twitter for the past 10 months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K nodes), low memory footprint (less than a few MB), and is quite efficient for the write path (only two hash lookups for updating a metric).
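As background for the multiple-period discussion above, a bucketed rolling window can be kept quite small. The toy sketch below is purely illustrative (all names are hypothetical and unrelated to the attached patch): it counts events into fixed-length buckets and sums only the buckets that still fall inside the window. nntop would keep one such window per user and operation type, and one set of windows per configured reporting period, with the shortest period backing the JMX view and the longer ones backing the html view.
{code}
// Toy sketch of a fixed-bucket rolling window counter (hypothetical names).
class ToyRollingWindow {
  private final long bucketLenMs;
  private final long[] counts;
  private final long[] bucketStartMs;   // start time of the data currently held in each bucket

  ToyRollingWindow(long windowLenMs, int numBuckets) {
    this.bucketLenMs = windowLenMs / numBuckets;
    this.counts = new long[numBuckets];
    this.bucketStartMs = new long[numBuckets];
  }

  synchronized void incr(long nowMs) {
    long bucketNum = nowMs / bucketLenMs;
    int i = (int) (bucketNum % counts.length);
    long start = bucketNum * bucketLenMs;
    if (bucketStartMs[i] != start) {    // bucket still holds data from an older window: reset it
      bucketStartMs[i] = start;
      counts[i] = 0;
    }
    counts[i]++;
  }

  synchronized long sum(long nowMs) {
    long windowLenMs = (long) counts.length * bucketLenMs;
    long total = 0;
    for (int i = 0; i < counts.length; i++) {
      // Only count buckets whose data falls inside the last window.
      if (nowMs - bucketStartMs[i] < windowLenMs) {
        total += counts[i];
      }
    }
    return total;
  }
}
{code}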
[jira] [Created] (HDFS-7238) TestOpenFilesWithSnapshot fails periodically with test timeout
Charles Lamb created HDFS-7238: -- Summary: TestOpenFilesWithSnapshot fails periodically with test timeout Key: HDFS-7238 URL: https://issues.apache.org/jira/browse/HDFS-7238 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor TestOpenFilesWithSnapshot fails periodically with this: {noformat} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7238) TestOpenFilesWithSnapshot fails periodically with test timeout
[ https://issues.apache.org/jira/browse/HDFS-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7238: --- Attachment: HDFS-7238.001.patch It seems that adding a timeout (120s) argument to the @Tests in the file will fix this. Attaching a patch for a jenkins run. TestOpenFilesWithSnapshot fails periodically with test timeout -- Key: HDFS-7238 URL: https://issues.apache.org/jira/browse/HDFS-7238 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7238.001.patch TestOpenFilesWithSnapshot fails periodically with this: {noformat} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
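For reference, the change being proposed is purely a JUnit annotation-level timeout rather than a functional change; with the 120s value mentioned above it would look like this (illustrative snippet, not the attached patch):
{code}
// Before: a hung MiniDFSCluster restart stalls the whole surefire fork.
@Test
public void testOpenFilesWithMultipleSnapshots() throws Exception { ... }

// After: the test is failed once it exceeds 120 seconds instead of hanging.
@Test(timeout = 120000)
public void testOpenFilesWithMultipleSnapshots() throws Exception { ... }
{code}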
[jira] [Updated] (HDFS-7238) TestOpenFilesWithSnapshot fails periodically with test timeout
[ https://issues.apache.org/jira/browse/HDFS-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7238: --- Status: Patch Available (was: Open) TestOpenFilesWithSnapshot fails periodically with test timeout -- Key: HDFS-7238 URL: https://issues.apache.org/jira/browse/HDFS-7238 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7238.001.patch TestOpenFilesWithSnapshot fails periodically with this: {noformat} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7222) Expose DataNode network errors as a metric
[ https://issues.apache.org/jira/browse/HDFS-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7222: --- Status: Patch Available (was: Open) Expose DataNode network errors as a metric -- Key: HDFS-7222 URL: https://issues.apache.org/jira/browse/HDFS-7222 Project: Hadoop HDFS Issue Type: New Feature Components: datanode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7222.001.patch It would be useful to track datanode network errors and expose them as a metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
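As background for reviewers, DataNode metrics are published through the metrics2 library, so a network-error counter would typically be a mutable counter that the DataNode I/O paths increment. A rough sketch follows; the field and method names here are assumptions for illustration, not the attached HDFS-7222.001.patch.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

// Sketch only: the general metrics2 shape of a DataNode-side error counter.
@Metrics(about = "DataNode metrics", context = "dfs")
class DataNodeMetricsSketch {
  @Metric("Count of network errors seen by the datanode")
  MutableCounterLong datanodeNetworkErrors;

  void incrDatanodeNetworkErrors() {
    // Called from the error-handling paths (e.g. DataXceiver) when a
    // socket-level failure is caught and handled.
    datanodeNetworkErrors.incr();
  }
}
{code}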
[jira] [Commented] (HDFS-7121) For JournalNode operations that must succeed on all nodes, execute a pre-check to verify that the operation can succeed.
[ https://issues.apache.org/jira/browse/HDFS-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169832#comment-14169832 ] Colin Patrick McCabe commented on HDFS-7121: Sounds good. Thanks for working on this. For JournalNode operations that must succeed on all nodes, execute a pre-check to verify that the operation can succeed. Key: HDFS-7121 URL: https://issues.apache.org/jira/browse/HDFS-7121 Project: Hadoop HDFS Issue Type: Sub-task Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Several JournalNode operations are not satisfied by a quorum. They must succeed on every JournalNode in the cluster. If the operation succeeds on some nodes, but fails on others, then this may leave the nodes in an inconsistent state and require operations to do manual recovery steps. For example, if {{doPreUpgrade}} succeeds on 2 nodes and fails on 1 node, then the operator will need to correct the problem on the failed node and also manually restore the previous.tmp directory to current on the 2 successful nodes before reattempting the upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7238) TestOpenFilesWithSnapshot fails periodically with test timeout
[ https://issues.apache.org/jira/browse/HDFS-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169833#comment-14169833 ] Jing Zhao commented on HDFS-7238: - Duplicate with HDFS-7236? TestOpenFilesWithSnapshot fails periodically with test timeout -- Key: HDFS-7238 URL: https://issues.apache.org/jira/browse/HDFS-7238 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7238.001.patch TestOpenFilesWithSnapshot fails periodically with this: {noformat} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7238) TestOpenFilesWithSnapshot fails periodically with test timeout
[ https://issues.apache.org/jira/browse/HDFS-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7238: --- Status: Open (was: Patch Available) TestOpenFilesWithSnapshot fails periodically with test timeout -- Key: HDFS-7238 URL: https://issues.apache.org/jira/browse/HDFS-7238 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7238.001.patch TestOpenFilesWithSnapshot fails periodically with this: {noformat} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7238) TestOpenFilesWithSnapshot fails periodically with test timeout
[ https://issues.apache.org/jira/browse/HDFS-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb resolved HDFS-7238. Resolution: Duplicate Duplicate of HDFS-7236 TestOpenFilesWithSnapshot fails periodically with test timeout -- Key: HDFS-7238 URL: https://issues.apache.org/jira/browse/HDFS-7238 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7238.001.patch TestOpenFilesWithSnapshot fails periodically with this: {noformat} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169850#comment-14169850 ] Colin Patrick McCabe commented on HDFS-7235: Thanks for explaining this. If I understand correctly, you want blocks that are not in a finalized state to cause {{trySendErrorReport}}, but blocks that don't exist or have the wrong length to cause {{reportBadBlocks}}. That seems reasonable. One improvement that I would suggest is that you don't need to add a new method to FsDatasetSpi to do that. Just call {{FsDatasetSpi#getLength}}. If the block doesn't exist, it will throw an IOException which you can catch. Patch looks good aside from that. Can not decommission DN which has invalid block due to bad disk --- Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch When decommissioning a DN, the process hangs. What happens is, when NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this DN to-be-decommissioned as the source of transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as invalidBlock with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} The reason that this method returns false (detecting invalid block) is that the block file doesn't exist due to bad disk in this case. The key issue we found here is, after DN detects an invalid block for the above reason, it doesn't report the invalid block back to NN, thus NN doesn't know that the block is corrupted, and keeps sending the data transfer request to the same DN to be decommissioned, again and again. This caused an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
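Following that suggestion, the same decision can be made without a new {{FsDatasetSpi}} method by leaning on the exception from {{getLength}}. A minimal sketch, assuming the surrounding names from the existing {{transferBlock}} code:
{code}
// Sketch only: use the existing FsDatasetSpi#getLength instead of a new method.
// getLength throws IOException when the replica/block file cannot be found.
try {
  data.getLength(block);
} catch (IOException e) {
  // Missing/unreadable block file: report it as a bad block so the NN
  // stops choosing this DN as the replication source.
  reportBadBlocks(block);
  return;
}
// Otherwise fall through to the existing trySendErrorReport handling.
{code}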
[jira] [Updated] (HDFS-6824) Additional user documentation for HDFS encryption.
[ https://issues.apache.org/jira/browse/HDFS-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6824: -- Attachment: hdfs-6824.002.patch Thanks for reviewing Yi, good catches. New patch fixes all your comments. Additional user documentation for HDFS encryption. -- Key: HDFS-6824 URL: https://issues.apache.org/jira/browse/HDFS-6824 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 2.6.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Attachments: TransparentEncryption.html, hdfs-6824.001.patch, hdfs-6824.002.patch We'd like to better document additional things about HDFS encryption: setup and configuration, using alternate access methods (namely WebHDFS and HttpFS), other misc improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7237) namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HDFS-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7237: -- Description: Run hdfs namenode -rollingUpgrade {noformat} 14/10/13 11:30:50 INFO namenode.NameNode: createNameNode [-rollingUpgrade] 14/10/13 11:30:50 FATAL namenode.NameNode: Exception in namenode join java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hdfs.server.namenode.NameNode.parseArguments(NameNode.java:1252) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1367) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1501) 14/10/13 11:30:50 INFO util.ExitUtil: Exiting with status 1 {noformat} Although the command is illegal (missing rolling upgrade startup option), it should print a better error message. was: run hdfs namenode -rollingUpgrade {noformat} 14/10/13 11:30:50 INFO namenode.NameNode: createNameNode [-rollingUpgrade] 14/10/13 11:30:50 FATAL namenode.NameNode: Exception in namenode join java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hdfs.server.namenode.NameNode.parseArguments(NameNode.java:1252) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1367) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1501) 14/10/13 11:30:50 INFO util.ExitUtil: Exiting with status 1 {noformat} namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException -- Key: HDFS-7237 URL: https://issues.apache.org/jira/browse/HDFS-7237 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7237_20141013.patch Run hdfs namenode -rollingUpgrade {noformat} 14/10/13 11:30:50 INFO namenode.NameNode: createNameNode [-rollingUpgrade] 14/10/13 11:30:50 FATAL namenode.NameNode: Exception in namenode join java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hdfs.server.namenode.NameNode.parseArguments(NameNode.java:1252) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1367) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1501) 14/10/13 11:30:50 INFO util.ExitUtil: Exiting with status 1 {noformat} Although the command is illegal (missing rolling upgrade startup option), it should print a better error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7237) namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HDFS-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7237: -- Attachment: h7237_20141013b.patch new StringBuilder('') does not work well since it is using the StringBuilder(int) constructor. h7237_20141013b.patch: fixes the bug. namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException -- Key: HDFS-7237 URL: https://issues.apache.org/jira/browse/HDFS-7237 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7237_20141013.patch, h7237_20141013b.patch Run hdfs namenode -rollingUpgrade {noformat} 14/10/13 11:30:50 INFO namenode.NameNode: createNameNode [-rollingUpgrade] 14/10/13 11:30:50 FATAL namenode.NameNode: Exception in namenode join java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hdfs.server.namenode.NameNode.parseArguments(NameNode.java:1252) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1367) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1501) 14/10/13 11:30:50 INFO util.ExitUtil: Exiting with status 1 {noformat} Although the command is illegal (missing rolling upgrade startup option), it should print a better error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
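For readers unfamiliar with the overload pitfall mentioned above, a small self-contained Java example (illustrative only; it uses a space character rather than whatever literal the original code passed): a char argument is widened to int, so the call resolves to the capacity constructor instead of producing a builder that contains the character.
{code}
public class StringBuilderCharPitfall {
  public static void main(String[] args) {
    // ' ' widens to the int 32, so this resolves to StringBuilder(int capacity):
    // an empty builder with capacity 32, not a builder containing a space.
    StringBuilder wrong = new StringBuilder(' ');
    System.out.println(wrong.length());    // prints 0
    System.out.println(wrong.capacity());  // prints 32

    // Passing a String (or appending the char) gives the intended content.
    StringBuilder right = new StringBuilder(" ");
    System.out.println(right.length());    // prints 1
  }
}
{code}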
[jira] [Commented] (HDFS-7207) libhdfs3 should not expose exceptions in public C++ API
[ https://issues.apache.org/jira/browse/HDFS-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169882#comment-14169882 ] Colin Patrick McCabe commented on HDFS-7207: bq. A slightly simpler (probably subjective) approach might be to wrap things in the opposite way. That is, putting the error message / stack traces in the Status object directly and let hdfsGetLastError to get the string. It avoids copying the error message twice, once from the implementation to the TLS and another from TLS to the returned Status object. What do you think? I think we should definitely add {{hdfsGetLastError}} to the C API. There are a lot of applications using the C API and a lot of them are going to continue to do so for all the reasons we discussed earlier. This is an easy, 100% backwards-compatible way to add richer error messages to the API. I don't think copying an error message once is worth thinking about. Errors are (or should be) rare. The overhead of throwing an exception is much larger than copying a C string, and libhdfs3 currently throws exceptions in error cases. So this is strictly an improvement from a performance point of view. It also simplifies maintenance because we only have to worry about setting error messages at one point. And it's the only way we can add richer error messages in {{libhdfs}} and {{libwebhdfs}}. No other solution even comes close to matching the advantages of {{hdfsGetLastError}}, in my opinion. bq. If an Input / OutputStream is leaked then the corresponding FileSystem will leak. I found the paradigm in leveldb quite helpful: {code} DB *db = DB::Open(); Iterator *it = db->(...); delete db; // bails out because the iterator it has leaked. {code} bq. That might allow the user to be more aware of the leaks. Maybe we can do something similar? I might be missing something, but giving the user back a bare pointer seems strictly less useful than giving the user back a shared_ptr. As [~wangzw] pointed out, we still have to do refcounting either way, so there's no performance improvement. If the user wants shared_ptr semantics and you give them a bare pointer, they have to wrap it in another shared_ptr, adding overhead. On the other hand, if the user doesn't want shared_ptr semantics, there is no disadvantage to giving back a shared_ptr. The user can simply delete the streams, then delete the filesystem, and get the same result as with a bare pointer. If leaks are a problem in a C++ program, there are tools like valgrind, ASAN, and so forth. We use these tools a lot in Impala-- they work really well! Throwing an exception in a delete() method is not really a very robust way of detecting memory leaks. After all, the delete method itself may never be called if the programmer makes a mistake. Finally, to repeat my earlier argument, bare pointers are basically what the C interface gives back. That is the C way-- manual allocation and de-allocation. So if we're going to do the same here, it makes me wonder why we need a new interface. I guess you could argue that it allows us to detect use-after-free, but this is something that valgrind and ASAN do a great job detecting already, and without runtime overhead in production. 
libhdfs3 should not expose exceptions in public C++ API --- Key: HDFS-7207 URL: https://issues.apache.org/jira/browse/HDFS-7207 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-7207.001.patch There are three major disadvantages of exposing exceptions in the public API: * Exposing exceptions in public APIs forces the downstream users to be compiled with {{-fexceptions}}, which might be infeasible in many use cases. * It forces other bindings to properly handle all C++ exceptions, which might be infeasible especially when the binding is generated by tools like SWIG. * It forces the downstream users to properly handle all C++ exceptions, which can be cumbersome as in certain cases it will lead to undefined behavior (e.g., throwing an exception in a destructor is undefined.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6919) Enforce a single limit for RAM disk usage and replicas cached via locking
[ https://issues.apache.org/jira/browse/HDFS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169891#comment-14169891 ] Jitendra Nath Pandey commented on HDFS-6919: +1 for adding a release note for 2.6, and have it implemented in the follow on release. Enforce a single limit for RAM disk usage and replicas cached via locking - Key: HDFS-6919 URL: https://issues.apache.org/jira/browse/HDFS-6919 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Arpit Agarwal Assignee: Colin Patrick McCabe Priority: Blocker The DataNode can have a single limit for memory usage which applies to both replicas cached via CCM and replicas on RAM disk. See comments [1|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106025page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106025], [2|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106245page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106245] and [3|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106575page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106575] for discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169944#comment-14169944 ] Andrew Wang commented on HDFS-6982: --- Hi Maysam, It seems like without the HTML / JMX stuff, I missed out on a bunch of context in my review. As to a patch split, here's a suggestion. I believe our new HTML UI sources all of its information by using the {{/jmx}} endpoint. This is good since it means external tools can collect the same information without scraping our UIs. I think a reasonable first patch would add {{/jmx}} output, since then we'll be able to turn it on and add tests. Then, subsequent patches can add the HTML and JS for the WebUI. Alternatively, if you think it's manageable to review the entire patch, we could try giving that a go. My guess though is that the top webpage is currently not using {{/jmx}}, so the above patch split would be the fastest way to start getting things committed. nntop: top-like tool for name node users - Key: HDFS-6982 URL: https://issues.apache.org/jira/browse/HDFS-6982 Project: Hadoop HDFS Issue Type: New Feature Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf In this jira we motivate the need for nntop, a tool that, similarly to what top does in Linux, gives the list of top users of the HDFS name node and gives insight about which users are sending the majority of each traffic type to the name node. This information turns out to be the most critical when the name node is under pressure and the HDFS admin needs to know which user is hammering the name node and with what kind of requests. Here we present the design of nntop which has been in production at Twitter in the past 10 months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K nodes), low memory footprint (less than a few MB), and quite efficient for the write path (only two hash lookups for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
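Since the patch-split suggestion above hinges on exposing the nntop data through {{/jmx}}, here is a minimal, hedged example of how an external tool (or the web UI) could pull a single bean from that endpoint; the host, port, and the {{TopUsers}} bean name are placeholders, as the real bean name would be whatever the nntop patch registers.
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class JmxFetch {
  public static void main(String[] args) throws Exception {
    // Placeholder NameNode HTTP address; qry= filters the output to one bean.
    URL url = new URL("http://namenode.example.com:50070/jmx"
        + "?qry=Hadoop:service=NameNode,name=TopUsers");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);   // JSON payload; parse with any JSON library
      }
    }
  }
}
{code}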
[jira] [Commented] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169967#comment-14169967 ] Yongjun Zhang commented on HDFS-7235: - Hi [~cmccabe], Thanks a lot for the input. Yes, I expect {{trySendErrorReport}} to be called when the blocks are not in finalized state, and {{reportBadBlocks}} to be called when the block file doesn't exist. To try what you suggested, when {{isValidBlock}} returns false, I still need to check that the other conditions are true: {code} replicaInfo != null && replicaInfo.getState() == FINALIZED {code} Right now there is no method to get replicaInfo from the DataNode.java side, except a deprecated method {code} @Deprecated public Replica getReplica(String bpid, long blockId); {code} I will just call this method if it's ok to use. Can not decommission DN which has invalid block due to bad disk --- Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch When trying to decommission a DN, the process hangs. What happens is, when NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this DN to-be-decommissioned as the source of transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as invalidBlock with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} The reason that this method returns false (detecting invalid block) is because the block file doesn't exist due to bad disk in this case. The key issue we found here is, after DN detects an invalid block for the above reason, it doesn't report the invalid block back to NN, thus NN doesn't know that the block is corrupted, and keeps sending the data transfer request to the same DN to be decommissioned, again and again. This caused an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7239) Create a servlet for HDFS UI
Haohui Mai created HDFS-7239: Summary: Create a servlet for HDFS UI Key: HDFS-7239 URL: https://issues.apache.org/jira/browse/HDFS-7239 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Currently the HDFS UI gathers most of its information from JMX. There are a couple of disadvantages: * JMX is also used by management tools, thus Hadoop needs to maintain compatibility across minor releases. * JMX organizes information as key, value pairs. The organization does not fit well with emerging use cases like startup progress report and nntop. This jira proposes to introduce a new servlet in the NN for the purpose of serving information to the UI. It should be viewed as a part of the UI. There are *no* compatibility guarantees for the output of the servlet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
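To make the proposal concrete, a bare-bones sketch of what such a UI-only servlet might look like; the class name, URL mapping, and JSON payload are purely illustrative and, per the description above, would carry no compatibility guarantee.
{code}
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical UI-only servlet; nothing here is a committed API.
public class NamenodeUiServlet extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    resp.setContentType("application/json");
    // A real implementation would assemble this from NameNode state
    // (startup progress, nntop counters, ...); a fixed payload keeps the sketch short.
    resp.getWriter().write(
        "{\"startupProgress\":{\"phase\":\"LOADING_EDITS\",\"percentComplete\":0.42}}");
  }
}
{code}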
[jira] [Commented] (HDFS-7228) Add an SSD policy into the default BlockStoragePolicySuite
[ https://issues.apache.org/jira/browse/HDFS-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169994#comment-14169994 ] Suresh Srinivas commented on HDFS-7228: --- [~jingzhao], is this policy placing all the replicas in SSD? Instead of that, should we place only one replica in SSD and the remaining in default storage? This may be better given SSD is more expensive than disk and may not be as abundant as disk? Applications can place their computation tasks closer to SSD replica (which is possible given block location now includes storage type). Add an SSD policy into the default BlockStoragePolicySuite -- Key: HDFS-7228 URL: https://issues.apache.org/jira/browse/HDFS-7228 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-7228.000.patch Currently in the default BlockStoragePolicySuite, we've defined 4 storage policies: LAZY_PERSIST, HOT, WARM, and COLD. Since we have already defined the SSD storage type, it will be useful to also include a SSD related storage policy in the default suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7228) Add an SSD policy into the default BlockStoragePolicySuite
[ https://issues.apache.org/jira/browse/HDFS-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169994#comment-14169994 ] Suresh Srinivas edited comment on HDFS-7228 at 10/13/14 9:18 PM: - [~jingzhao], is this policy placing all the replicas in SSD? Instead of that, should we place only one replica in SSD and the remaining in default storage? This may be better given SSD is more expensive than disk and is not as abundant as disk? Applications can place their computation tasks closer to SSD replica (which is possible given block location now includes storage type). was (Author: sureshms): [~jingzhao], is this policy placing all the replicas in SSD? Instead of that, should we place only one replica in SSD and the remaining in default storage? This may be better given SSD is more expensive than disk and may not be as abundant as disk? Applications can place their computation tasks closer to SSD replica (which is possible given block location now includes storage type). Add an SSD policy into the default BlockStoragePolicySuite -- Key: HDFS-7228 URL: https://issues.apache.org/jira/browse/HDFS-7228 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-7228.000.patch Currently in the default BlockStoragePolicySuite, we've defined 4 storage policies: LAZY_PERSIST, HOT, WARM, and COLD. Since we have already defined the SSD storage type, it will be useful to also include a SSD related storage policy in the default suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7237) namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HDFS-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1417#comment-1417 ] Suresh Srinivas commented on HDFS-7237: --- +1 for the patch. Thanks Nicholas for fixing this. namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException -- Key: HDFS-7237 URL: https://issues.apache.org/jira/browse/HDFS-7237 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7237_20141013.patch, h7237_20141013b.patch Run hdfs namenode -rollingUpgrade {noformat} 14/10/13 11:30:50 INFO namenode.NameNode: createNameNode [-rollingUpgrade] 14/10/13 11:30:50 FATAL namenode.NameNode: Exception in namenode join java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hdfs.server.namenode.NameNode.parseArguments(NameNode.java:1252) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1367) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1501) 14/10/13 11:30:50 INFO util.ExitUtil: Exiting with status 1 {noformat} Although the command is illegal (missing rolling upgrade startup option), it should print a better error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7239) Create a servlet for HDFS UI
[ https://issues.apache.org/jira/browse/HDFS-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170037#comment-14170037 ] Suresh Srinivas commented on HDFS-7239: --- [~wheat9], the JMX interface was introduced to dissuade users from scraping the namenode web UI. Since then, anytime a namenode web UI change is introduced, we have also added equivalent JMX interface methods/functionality. Moving the web UI to use JMX is great for ensuring that all UI-related APIs are available and maintained, and that an independent UI can be built. If we move all such future functionality to a new servlet, where does that leave the JMX interface? Create a servlet for HDFS UI Key: HDFS-7239 URL: https://issues.apache.org/jira/browse/HDFS-7239 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Currently the HDFS UI gathers most of its information from JMX. There are a couple of disadvantages: * JMX is also used by management tools, thus Hadoop needs to maintain compatibility across minor releases. * JMX organizes information as key, value pairs. The organization does not fit well with emerging use cases like startup progress report and nntop. This jira proposes to introduce a new servlet in the NN for the purpose of serving information to the UI. It should be viewed as a part of the UI. There are *no* compatibility guarantees for the output of the servlet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7228) Add an SSD policy into the default BlockStoragePolicySuite
[ https://issues.apache.org/jira/browse/HDFS-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7228: Attachment: HDFS-7228.001.patch Thanks for the comments, Suresh! So in the new patch I just change the new policy to: {code} storageTypes=[SSD, DISK], creationFallbacks=[SSD, DISK], replicationFallbacks=[SSD, DISK] {code} Thus the first replica will be placed in SSD, and the remaining replicas will be on DISK. If the cluster runs out of SSD, then DISK is used for both block allocation and replica recovery. This policy also covers the scenario where DISK is unavailable (the policy falls back to SSD then), although it is usually rare in practice. Add an SSD policy into the default BlockStoragePolicySuite -- Key: HDFS-7228 URL: https://issues.apache.org/jira/browse/HDFS-7228 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-7228.000.patch, HDFS-7228.001.patch Currently in the default BlockStoragePolicySuite, we've defined 4 storage policies: LAZY_PERSIST, HOT, WARM, and COLD. Since we have already defined the SSD storage type, it will be useful to also include a SSD related storage policy in the default suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
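A self-contained toy model (not Hadoop's BlockStoragePolicy class) of how the {{[SSD, DISK]}} preference plus {{[SSD, DISK]}} fallbacks described above resolve replica placement; the types and method names below are invented purely for illustration.
{code}
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

public class OneSsdPolicySketch {
  enum StorageType { SSD, DISK }

  // First replica prefers SSD, the rest prefer DISK; if the preferred type is
  // unavailable, walk the fallback list in order.
  static List<StorageType> choose(int replication, EnumSet<StorageType> available) {
    StorageType[] preferred = {StorageType.SSD, StorageType.DISK};
    StorageType[] fallbacks = {StorageType.SSD, StorageType.DISK};
    List<StorageType> chosen = new ArrayList<>();
    for (int i = 0; i < replication; i++) {
      StorageType want = preferred[Math.min(i, preferred.length - 1)];
      if (available.contains(want)) {
        chosen.add(want);
        continue;
      }
      for (StorageType fb : fallbacks) {
        if (available.contains(fb)) {
          chosen.add(fb);
          break;
        }
      }
    }
    return chosen;
  }

  public static void main(String[] args) {
    System.out.println(choose(3, EnumSet.of(StorageType.SSD, StorageType.DISK))); // [SSD, DISK, DISK]
    System.out.println(choose(3, EnumSet.of(StorageType.DISK)));                  // [DISK, DISK, DISK]
  }
}
{code}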
[jira] [Commented] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170060#comment-14170060 ] Jing Zhao commented on HDFS-7056: - The proposed design looks pretty good to me. I agree we can copy the entire block list to the file's snapshot copy right now. Snapshot support for truncate - Key: HDFS-7056 URL: https://issues.apache.org/jira/browse/HDFS-7056 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Implementation of truncate in HDFS-3107 does not allow truncating files which are in a snapshot. It is desirable to be able to truncate and still keep the old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-6744) Improve decommissioning nodes and dead nodes access on the new NN webUI
[ https://issues.apache.org/jira/browse/HDFS-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li reassigned HDFS-6744: - Assignee: Siqi Li Improve decommissioning nodes and dead nodes access on the new NN webUI --- Key: HDFS-6744 URL: https://issues.apache.org/jira/browse/HDFS-6744 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ming Ma Assignee: Siqi Li The new NN webUI lists live node at the top of the page, followed by dead node and decommissioning node. From admins point of view: 1. Decommissioning nodes and dead nodes are more interesting. It is better to move decommissioning nodes to the top of the page, followed by dead nodes and decommissioning nodes. 2. To find decommissioning nodes or dead nodes, the whole page that includes all nodes needs to be loaded. That could take some time for big clusters. The legacy web UI filters out the type of nodes dynamically. That seems to work well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6744) Improve decommissioning nodes and dead nodes access on the new NN webUI
[ https://issues.apache.org/jira/browse/HDFS-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-6744: -- Attachment: HDFS-6744.v1.patch Improve decommissioning nodes and dead nodes access on the new NN webUI --- Key: HDFS-6744 URL: https://issues.apache.org/jira/browse/HDFS-6744 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ming Ma Assignee: Siqi Li Attachments: HDFS-6744.v1.patch The new NN webUI lists live node at the top of the page, followed by dead node and decommissioning node. From admins point of view: 1. Decommissioning nodes and dead nodes are more interesting. It is better to move decommissioning nodes to the top of the page, followed by dead nodes and decommissioning nodes. 2. To find decommissioning nodes or dead nodes, the whole page that includes all nodes needs to be loaded. That could take some time for big clusters. The legacy web UI filters out the type of nodes dynamically. That seems to work well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6744) Improve decommissioning nodes and dead nodes access on the new NN webUI
[ https://issues.apache.org/jira/browse/HDFS-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-6744: -- Status: Patch Available (was: Open) Improve decommissioning nodes and dead nodes access on the new NN webUI --- Key: HDFS-6744 URL: https://issues.apache.org/jira/browse/HDFS-6744 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ming Ma Assignee: Siqi Li Attachments: HDFS-6744.v1.patch The new NN webUI lists live node at the top of the page, followed by dead node and decommissioning node. From admins point of view: 1. Decommissioning nodes and dead nodes are more interesting. It is better to move decommissioning nodes to the top of the page, followed by dead nodes and decommissioning nodes. 2. To find decommissioning nodes or dead nodes, the whole page that includes all nodes needs to be loaded. That could take some time for big clusters. The legacy web UI filters out the type of nodes dynamically. That seems to work well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7237) namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HDFS-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170092#comment-14170092 ] Hadoop QA commented on HDFS-7237: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674564/h7237_20141013.patch against trunk revision a56ea01. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestRenameWhileOpen {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8405//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8405//console This message is automatically generated. namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException -- Key: HDFS-7237 URL: https://issues.apache.org/jira/browse/HDFS-7237 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7237_20141013.patch, h7237_20141013b.patch Run hdfs namenode -rollingUpgrade {noformat} 14/10/13 11:30:50 INFO namenode.NameNode: createNameNode [-rollingUpgrade] 14/10/13 11:30:50 FATAL namenode.NameNode: Exception in namenode join java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hdfs.server.namenode.NameNode.parseArguments(NameNode.java:1252) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1367) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1501) 14/10/13 11:30:50 INFO util.ExitUtil: Exiting with status 1 {noformat} Although the command is illegal (missing rolling upgrade startup option), it should print a better error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7222) Expose DataNode network errors as a metric
[ https://issues.apache.org/jira/browse/HDFS-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170093#comment-14170093 ] Hadoop QA commented on HDFS-7222: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674545/HDFS-7222.001.patch against trunk revision a56ea01. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestRenameWhileOpen {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8406//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8406//console This message is automatically generated. Expose DataNode network errors as a metric -- Key: HDFS-7222 URL: https://issues.apache.org/jira/browse/HDFS-7222 Project: Hadoop HDFS Issue Type: New Feature Components: datanode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7222.001.patch It would be useful to track datanode network errors and expose them as a metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6824) Additional user documentation for HDFS encryption.
[ https://issues.apache.org/jira/browse/HDFS-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170132#comment-14170132 ] Hadoop QA commented on HDFS-6824: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674577/hdfs-6824.002.patch against trunk revision a56ea01. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8409//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8409//console This message is automatically generated. Additional user documentation for HDFS encryption. -- Key: HDFS-6824 URL: https://issues.apache.org/jira/browse/HDFS-6824 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 2.6.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Attachments: TransparentEncryption.html, hdfs-6824.001.patch, hdfs-6824.002.patch We'd like to better document additional things about HDFS encryption: setup and configuration, using alternate access methods (namely WebHDFS and HttpFS), other misc improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7142) Implement a 2Q eviction strategy for HDFS-6581
[ https://issues.apache.org/jira/browse/HDFS-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170134#comment-14170134 ] Colin Patrick McCabe commented on HDFS-7142: bq. The call to ceiling need not return the exact match. If you get a non-null result you need a check for blockId and bpid match. Perhaps I misunderstood the intention. You're right... I need to check to make sure that the bpid and block id are the same after getting back a result from {{ceiling}}. Fixed. bq. dequeueNextReplicaToPersist appears to have a starvation issue. If replicas in a higher blockPoolId keep getting added constantly a replica in a lower bpid may wait indefinitely to get persisted. It would be good to persist replicas in the same order in which they were originally added, you can do that with an additional set. My mistake here was looking at the lowest value in {{replicasSortedByBlockPoolAndId}}. It should be looking at the lowest value in {{replicasSortedByTierAndLastUsed}}. There's no starvation issue if it looks at the set which is sorted by lastUsed, because oldest replicas will get picked first. Fixed. bq. Same issue with numReplicasNotPersisted, it should not count all replicas in RAM. Let me clarify the documentation on RamDiskReplicaTracker.dequeueNextReplicaToPersist. It seems to me that the number of replicas not persisted *is* all the replicas in RAM. So perhaps the function needs to be renamed. Can you clarify what this should count? bq. Colin Patrick McCabe, any comments and updates to the patch? Let me repost a patch fixing the first two issues pointing out. I will wait for clarification on any other API issues. I'm going to mark this as targetting 2.7. Implement a 2Q eviction strategy for HDFS-6581 -- Key: HDFS-7142 URL: https://issues.apache.org/jira/browse/HDFS-7142 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: 0002-Add-RamDiskReplica2QTracker.patch We should implement a 2Q or approximate 2Q eviction strategy for HDFS-6581. It is well known that LRU is a poor fit for scanning workloads, which HDFS may often encounter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7142) Implement a 2Q eviction strategy for HDFS-6581
[ https://issues.apache.org/jira/browse/HDFS-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170134#comment-14170134 ] Colin Patrick McCabe edited comment on HDFS-7142 at 10/13/14 10:35 PM: --- bq. The call to ceiling need not return the exact match. If you get a non-null result you need a check for blockId and bpid match. Perhaps I misunderstood the intention. You're right... I need to check to make sure that the bpid and block id are the same after getting back a result from {{ceiling}}. Fixed. bq. dequeueNextReplicaToPersist appears to have a starvation issue. If replicas in a higher blockPoolId keep getting added constantly a replica in a lower bpid may wait indefinitely to get persisted. It would be good to persist replicas in the same order in which they were originally added, you can do that with an additional set. My mistake here was looking at the lowest value in {{replicasSortedByBlockPoolAndId}}. It should be looking at the lowest value in {{replicasSortedByTierAndLastUsed}}. There's no starvation issue if it looks at the set which is sorted by lastUsed, because oldest replicas will get picked first. Fixed. bq. Same issue with numReplicasNotPersisted, it should not count all replicas in RAM. Let me clarify the documentation on RamDiskReplicaTracker.dequeueNextReplicaToPersist. It seems to me that the number of replicas not persisted *is* all the replicas in RAM. So perhaps the function needs to be renamed. Can you clarify what this should count? bq. Colin Patrick McCabe, any comments and updates to the patch? Let me repost a patch fixing the first two issues pointed out. I will wait for clarification on any other API issues. I'm going to mark this as targetting 2.7. was (Author: cmccabe): bq. The call to ceiling need not return the exact match. If you get a non-null result you need a check for blockId and bpid match. Perhaps I misunderstood the intention. You're right... I need to check to make sure that the bpid and block id are the same after getting back a result from {{ceiling}}. Fixed. bq. dequeueNextReplicaToPersist appears to have a starvation issue. If replicas in a higher blockPoolId keep getting added constantly a replica in a lower bpid may wait indefinitely to get persisted. It would be good to persist replicas in the same order in which they were originally added, you can do that with an additional set. My mistake here was looking at the lowest value in {{replicasSortedByBlockPoolAndId}}. It should be looking at the lowest value in {{replicasSortedByTierAndLastUsed}}. There's no starvation issue if it looks at the set which is sorted by lastUsed, because oldest replicas will get picked first. Fixed. bq. Same issue with numReplicasNotPersisted, it should not count all replicas in RAM. Let me clarify the documentation on RamDiskReplicaTracker.dequeueNextReplicaToPersist. It seems to me that the number of replicas not persisted *is* all the replicas in RAM. So perhaps the function needs to be renamed. Can you clarify what this should count? bq. Colin Patrick McCabe, any comments and updates to the patch? Let me repost a patch fixing the first two issues pointing out. I will wait for clarification on any other API issues. I'm going to mark this as targetting 2.7. 
Implement a 2Q eviction strategy for HDFS-6581 -- Key: HDFS-7142 URL: https://issues.apache.org/jira/browse/HDFS-7142 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: 0002-Add-RamDiskReplica2QTracker.patch We should implement a 2Q or approximate 2Q eviction strategy for HDFS-6581. It is well known that LRU is a poor fit for scanning workloads, which HDFS may often encounter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7142) Implement a 2Q eviction strategy for HDFS-6581
[ https://issues.apache.org/jira/browse/HDFS-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7142: --- Attachment: HDFS-7142.003.patch Implement a 2Q eviction strategy for HDFS-6581 -- Key: HDFS-7142 URL: https://issues.apache.org/jira/browse/HDFS-7142 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: 0002-Add-RamDiskReplica2QTracker.patch, HDFS-7142.003.patch We should implement a 2Q or approximate 2Q eviction strategy for HDFS-6581. It is well known that LRU is a poor fit for scanning workloads, which HDFS may often encounter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7238) TestOpenFilesWithSnapshot fails periodically with test timeout
[ https://issues.apache.org/jira/browse/HDFS-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170138#comment-14170138 ] Hadoop QA commented on HDFS-7238: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674573/HDFS-7238.001.patch against trunk revision a56ea01. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8407//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8407//console This message is automatically generated. TestOpenFilesWithSnapshot fails periodically with test timeout -- Key: HDFS-7238 URL: https://issues.apache.org/jira/browse/HDFS-7238 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7238.001.patch TestOpenFilesWithSnapshot fails periodically with this: {noformat} Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7142) Implement a 2Q eviction strategy for HDFS-6581
[ https://issues.apache.org/jira/browse/HDFS-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7142: --- Status: Patch Available (was: In Progress) Implement a 2Q eviction strategy for HDFS-6581 -- Key: HDFS-7142 URL: https://issues.apache.org/jira/browse/HDFS-7142 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: 0002-Add-RamDiskReplica2QTracker.patch, HDFS-7142.003.patch We should implement a 2Q or approximate 2Q eviction strategy for HDFS-6581. It is well known that LRU is a poor fit for scanning workloads, which HDFS may often encounter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7142) Implement a 2Q eviction strategy for HDFS-6581
[ https://issues.apache.org/jira/browse/HDFS-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7142: --- Target Version/s: 2.7.0 Affects Version/s: 2.7.0 Implement a 2Q eviction strategy for HDFS-6581 -- Key: HDFS-7142 URL: https://issues.apache.org/jira/browse/HDFS-7142 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: 0002-Add-RamDiskReplica2QTracker.patch, HDFS-7142.003.patch We should implement a 2Q or approximate 2Q eviction strategy for HDFS-6581. It is well known that LRU is a poor fit for scanning workloads, which HDFS may often encounter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6744) Improve decommissioning nodes and dead nodes access on the new NN webUI
[ https://issues.apache.org/jira/browse/HDFS-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170174#comment-14170174 ] Hadoop QA commented on HDFS-6744: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674616/HDFS-6744.v1.patch against trunk revision 178bc50. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-hdfs-project/hadoop-hdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8411//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8411//console This message is automatically generated. Improve decommissioning nodes and dead nodes access on the new NN webUI --- Key: HDFS-6744 URL: https://issues.apache.org/jira/browse/HDFS-6744 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ming Ma Assignee: Siqi Li Attachments: HDFS-6744.v1.patch The new NN webUI lists live node at the top of the page, followed by dead node and decommissioning node. From admins point of view: 1. Decommissioning nodes and dead nodes are more interesting. It is better to move decommissioning nodes to the top of the page, followed by dead nodes and decommissioning nodes. 2. To find decommissioning nodes or dead nodes, the whole page that includes all nodes needs to be loaded. That could take some time for big clusters. The legacy web UI filters out the type of nodes dynamically. That seems to work well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7208) NN doesn't schedule replication when a DN storage fails
[ https://issues.apache.org/jira/browse/HDFS-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7208: -- Assignee: Ming Ma Status: Patch Available (was: Open) NN doesn't schedule replication when a DN storage fails --- Key: HDFS-7208 URL: https://issues.apache.org/jira/browse/HDFS-7208 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7208.patch We found the following problem. When a storage device on a DN fails, NN continues to believe replicas of those blocks on that storage are valid and doesn't schedule replication. A DN has 12 storage disks. So there is one blockReport for each storage. When a disk fails, # of blockReport from that DN is reduced from 12 to 11. Given dfs.datanode.failed.volumes.tolerated is configured to be 0, NN still considers that DN healthy. 1. A disk failed. All blocks of that disk are removed from DN dataset. {noformat} 2014-10-04 02:11:12,626 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing replica BP-1748500278-xx.xx.xx.xxx-1377803467793:1121568886 on failed volume /data/disk6/dfs/current {noformat} 2. NN receives DatanodeProtocol.DISK_ERROR. But that isn't enough to have NN remove the DN and the replicas from the BlocksMap. In addition, blockReport doesn't provide the diff given that is done per storage. {noformat} 2014-10-04 02:11:12,681 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: Disk error on DatanodeRegistration(xx.xx.xx.xxx, datanodeUuid=f3b8a30b-e715-40d6-8348-3c766f9ba9ab, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-e3c38355-fde5-4e3a-b7ce-edacebdfa7a1;nsid=420527250;c=1410283484939): DataNode failed volumes:/data/disk6/dfs/current {noformat} 3. Run fsck on the file and confirm the NN's BlocksMap still has that replica. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7208) NN doesn't schedule replication when a DN storage fails
[ https://issues.apache.org/jira/browse/HDFS-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7208: -- Attachment: HDFS-7208.patch Here is the initial patch based on heartbeat notification approach, the assumption is DN will report all healthy storages in the heartbeat. This approach is simpler than the blockReport approach which needs to have DN persist the info to cover some failure scenarios. It also makes storage failure detection faster. 1. NN detects failed storages during HB processing based on the delta between DN's reported healthy storages and the storages NN has. Marked the state of those missing storages DatanodeStorage.State.FAILED. 2. HeartbeatManager will remove blocks on those DatanodeStorage.State.FAILED storages. This will cover some corner scenarios where new replicas might be added to BlocksMap afterwards. 3. It also covers the case where admins reduce the number of healthy volumes on DN and restart DN. NN doesn't schedule replication when a DN storage fails --- Key: HDFS-7208 URL: https://issues.apache.org/jira/browse/HDFS-7208 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Attachments: HDFS-7208.patch We found the following problem. When a storage device on a DN fails, NN continues to believe replicas of those blocks on that storage are valid and doesn't schedule replication. A DN has 12 storage disks. So there is one blockReport for each storage. When a disk fails, # of blockReport from that DN is reduced from 12 to 11. Given dfs.datanode.failed.volumes.tolerated is configured to be 0, NN still considers that DN healthy. 1. A disk failed. All blocks of that disk are removed from DN dataset. {noformat} 2014-10-04 02:11:12,626 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing replica BP-1748500278-xx.xx.xx.xxx-1377803467793:1121568886 on failed volume /data/disk6/dfs/current {noformat} 2. NN receives DatanodeProtocol.DISK_ERROR. But that isn't enough to have NN remove the DN and the replicas from the BlocksMap. In addition, blockReport doesn't provide the diff given that is done per storage. {noformat} 2014-10-04 02:11:12,681 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: Disk error on DatanodeRegistration(xx.xx.xx.xxx, datanodeUuid=f3b8a30b-e715-40d6-8348-3c766f9ba9ab, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-e3c38355-fde5-4e3a-b7ce-edacebdfa7a1;nsid=420527250;c=1410283484939): DataNode failed volumes:/data/disk6/dfs/current {noformat} 3. Run fsck on the file and confirm the NN's BlocksMap still has that replica. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
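A compressed, illustrative sketch of the delta computation in point 1 above; the map and method names are simplified stand-ins, not the DatanodeDescriptor/HeartbeatManager code touched by the attached patch.
{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model: compare the storage IDs a DN reports in a heartbeat against the
// storages the NN already tracks, and mark the missing ones FAILED so their
// blocks can be removed and re-replicated.
public class FailedStorageDetection {
  enum State { NORMAL, FAILED }

  private final Map<String, State> knownStorages = new HashMap<>();

  void processHeartbeat(Set<String> reportedStorageIds) {
    for (Map.Entry<String, State> e : knownStorages.entrySet()) {
      if (e.getValue() == State.NORMAL && !reportedStorageIds.contains(e.getKey())) {
        e.setValue(State.FAILED);
        System.out.println("storage " + e.getKey() + " marked FAILED");
      }
    }
    for (String id : reportedStorageIds) {
      knownStorages.putIfAbsent(id, State.NORMAL);
    }
  }

  public static void main(String[] args) {
    FailedStorageDetection nn = new FailedStorageDetection();
    nn.processHeartbeat(new HashSet<>(Arrays.asList("DS-1", "DS-2", "DS-3")));
    nn.processHeartbeat(new HashSet<>(Arrays.asList("DS-1", "DS-3"))); // DS-2 disappeared
  }
}
{code}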
[jira] [Commented] (HDFS-7237) namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HDFS-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170184#comment-14170184 ] Hadoop QA commented on HDFS-7237: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674581/h7237_20141013b.patch against trunk revision a56ea01. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8408//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8408//console This message is automatically generated. namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException -- Key: HDFS-7237 URL: https://issues.apache.org/jira/browse/HDFS-7237 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7237_20141013.patch, h7237_20141013b.patch Run hdfs namenode -rollingUpgrade {noformat} 14/10/13 11:30:50 INFO namenode.NameNode: createNameNode [-rollingUpgrade] 14/10/13 11:30:50 FATAL namenode.NameNode: Exception in namenode join java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hdfs.server.namenode.NameNode.parseArguments(NameNode.java:1252) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1367) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1501) 14/10/13 11:30:50 INFO util.ExitUtil: Exiting with status 1 {noformat} Although the command is illegal (missing rolling upgrade startup option), it should print a better error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6745) Display the list of very-under-replicated blocks as well as the files on NN webUI
[ https://issues.apache.org/jira/browse/HDFS-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170203#comment-14170203 ] Ming Ma commented on HDFS-6745: --- At the RPC layer, we can add a new method similar to ClientProtocol.listCorruptFileBlocks. Maybe the new method can take a replication threshold as a parameter to retrieve all blocks below that threshold. Then ClientProtocol.listCorruptFileBlocks could become a special case of the new method. Display the list of very-under-replicated blocks as well as the files on NN webUI --- Key: HDFS-6745 URL: https://issues.apache.org/jira/browse/HDFS-6745 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ming Ma Sometimes admins want to know the list of very-under-replicated blocks before major actions such as decommission, as these blocks are more likely to turn into missing blocks. Very-under-replicated blocks are those blocks with a live replica count of 1 and a replication factor of >= 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
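To sketch what the generalized RPC could look like (hedged: {{listCorruptFileBlocks}} is the existing ClientProtocol method, but the method name, parameters, and result type below are hypothetical and not part of any patch):
{code}
import java.io.IOException;

// Hypothetical interface; only listCorruptFileBlocks exists in ClientProtocol today.
public interface UnderReplicatedReporting {
  /**
   * Return files that have at least one block whose live replica count is
   * strictly below the given threshold, resuming from the given cookie.
   * A threshold of 1 reduces to the existing listCorruptFileBlocks behavior.
   */
  UnderReplicatedFileBlocks listUnderReplicatedFileBlocks(
      String path, int liveReplicaThreshold, String cookie) throws IOException;

  /** Placeholder result type, mirroring the shape of CorruptFileBlocks. */
  final class UnderReplicatedFileBlocks {
    public final String[] files;
    public final String cookie;
    public UnderReplicatedFileBlocks(String[] files, String cookie) {
      this.files = files;
      this.cookie = cookie;
    }
  }
}
{code}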
[jira] [Updated] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7235: Attachment: HDFS-7235.002.patch Can not decommission DN which has invalid block due to bad disk --- Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch When to decommission a DN, the process hangs. What happens is, when NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this DN to-be-decommissioned as the source of transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transfered as invalidBlock with the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null replicaInfo.getState() == state replicaInfo.getBlockFile().exists(); } {code} The reason that this method returns false (detecting invalid block) is because the block file doesn't exist due to bad disk in this case. The key issue we found here is, after DN detects an invalid block for the above reason, it doesn't report the invalid block back to NN, thus NN doesn't know that the block is corrupted, and keeps sending the data transfer request to the same DN to be decommissioned, again and again. This caused an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7165) Separate block metrics for files with replication count 1
[ https://issues.apache.org/jira/browse/HDFS-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170237#comment-14170237 ] Andrew Wang commented on HDFS-7165: --- Hi Zhe, generally looks good. A few review comments: * Need javadoc on new MXBean method * For naming, you could try MissingBlocksWithReplOne as a shorter alternative to Phil's suggestion. * One whitespace-only change in TestMissingBlocksAlert * UnderReplicatedBlocks looks like a standalone class, so we happily might be able to write some actual unit tests. TestUnderReplicatedBlockQueues has an example. Would be good to test remove and update in addition to test, this will simulate block deletion and setrep (up and down). * TestUnderReplicatedBlockQueues also does something lazy and extends Assert rather than doing the static imports, it'd be cool to fix this up too if you edit this file. Separate block metrics for files with replication count 1 - Key: HDFS-7165 URL: https://issues.apache.org/jira/browse/HDFS-7165 Project: Hadoop HDFS Issue Type: Improvement Reporter: Andrew Wang Assignee: Zhe Zhang Attachments: HDFS-7165-20141003-v1.patch, HDFS-7165-20141009-v1.patch, HDFS-7165-20141010-v1.patch We see a lot of escalations because someone has written teragen output with a replication factor of 1, a DN goes down, and a bunch of missing blocks show up. These are normally false positives, since teragen output is disposable, and generally speaking, users should understand this is true for all repl=1 files. It'd be nice to be able to separate out these repl=1 missing blocks from missing blocks with higher replication factors.. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7240) Object store in HDFS
Jitendra Nath Pandey created HDFS-7240: -- Summary: Object store in HDFS Key: HDFS-7240 URL: https://issues.apache.org/jira/browse/HDFS-7240 Project: Hadoop HDFS Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey This jira proposes to add object store capabilities into HDFS. As part of the federation work (HDFS-1052) we separated block storage as a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer i.e. datanodes. In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata. I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170238#comment-14170238 ] Yongjun Zhang commented on HDFS-7235: - Hi [~cmccabe], Thanks for your earlier review. I just uploaded a new rev (002) per what you suggested. There is one issue with this approach: the changed code in DataNode now kind of sees into the FsDatasetImpl implementation. But maybe it's fine. BTW, since I need to get the replicaInfo in DataNode, and I need to make sure the replica state is FINALIZED, I simply called the exists() method to check block file existence. Thanks. Can not decommission DN which has invalid block due to bad disk --- Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate the data on the to-be-decommissioned DN to other DNs, it favors choosing this to-be-decommissioned DN as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN detects the source block to be transferred as an invalid block, via the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} The reason this method returns false (detecting an invalid block) is that the block file doesn't exist, due to the bad disk in this case. The key issue we found is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know that the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
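Not the attached patch, just a hedged sketch of the general idea discussed here: when the DN finds the block file missing while preparing the transfer, it reports the replica back to the NN as a bad block instead of failing silently, so the NN stops picking the same source. DataNode#reportBadBlocks(ExtendedBlock) exists in 2.x; the surrounding wiring and the existence-check parameter are illustrative:
{code}
// Hedged sketch, not the committed change. The existence check is passed in
// here as a boolean to stand in for whatever check the patch performs.
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.server.datanode.DataNode;

class TransferSourceCheckSketch {
  static void checkBeforeTransfer(DataNode dn, ExtendedBlock block,
      boolean blockFileExists) throws IOException {
    if (!blockFileExists) {
      // Report the replica as bad so the NN marks it corrupt and stops
      // re-sending the transfer request to this (to-be-decommissioned) DN.
      dn.reportBadBlocks(block);
      throw new IOException("Can't send invalid block " + block);
    }
  }
}
{code}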
[jira] [Reopened] (HDFS-7215) Add gc log to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li reopened HDFS-7215: -- Add gc log to NFS gateway - Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Reporter: Brandon Li Assignee: Brandon Li Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7215) Add gc log to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170252#comment-14170252 ] Brandon Li commented on HDFS-7215: -- Thanks, [~cmccabe]. I reopened the JIRA to add JvmPauseMonitor. Will also update the user guide for HADOOP_NFS3_OPTS. Add gc log to NFS gateway - Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Reporter: Brandon Li Assignee: Brandon Li Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
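For reference, a minimal sketch of wiring a JvmPauseMonitor into a daemon's startup/shutdown path, the way the NN/DN do. The Configuration-taking constructor shown is the 2.x-era API (later releases changed the lifecycle), and the placement inside the NFS gateway is assumed, not taken from the actual change:
{code}
// Hedged sketch: 2.x-era JvmPauseMonitor API; gateway placement assumed.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.JvmPauseMonitor;

class GatewayJvmMonitoringSketch {
  private JvmPauseMonitor pauseMonitor;

  void startMonitoring(Configuration conf) {
    // Logs a warning whenever the JVM pauses longer than the configured threshold
    // (e.g. long GC pauses), which is what makes GC problems visible in the logs.
    pauseMonitor = new JvmPauseMonitor(conf);
    pauseMonitor.start();
  }

  void stopMonitoring() {
    if (pauseMonitor != null) {
      pauseMonitor.stop();
    }
  }
}
{code}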
[jira] [Commented] (HDFS-7228) Add an SSD policy into the default BlockStoragePolicySuite
[ https://issues.apache.org/jira/browse/HDFS-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170304#comment-14170304 ] Hadoop QA commented on HDFS-7228: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674609/HDFS-7228.001.patch against trunk revision 178bc50. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8410//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8410//console This message is automatically generated. Add an SSD policy into the default BlockStoragePolicySuite -- Key: HDFS-7228 URL: https://issues.apache.org/jira/browse/HDFS-7228 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-7228.000.patch, HDFS-7228.001.patch Currently in the default BlockStoragePolicySuite, we've defined 4 storage policies: LAZY_PERSIST, HOT, WARM, and COLD. Since we have already defined the SSD storage type, it will be useful to also include a SSD related storage policy in the default suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7230) Support rolling downgrade
[ https://issues.apache.org/jira/browse/HDFS-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170306#comment-14170306 ] Tsz Wo Nicholas Sze commented on HDFS-7230: --- As with a regular downgrade, a rolling downgrade requires the same NAMENODE_LAYOUT_VERSION and the same DATANODE_LAYOUT_VERSION. Although there is no layout change, a cluster may not be downgradable using the same rolling upgrade procedure, since protocols may change in a backward-compatible but not forward-compatible manner; i.e. old DNs can talk to the new NNs, but new DNs may not be able to talk to the old NNs. Support rolling downgrade - Key: HDFS-7230 URL: https://issues.apache.org/jira/browse/HDFS-7230 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze HDFS-5535 made a lot of improvement on rolling upgrade. It also added the cluster downgrade feature. However, the downgrade described in HDFS-5535 requires cluster downtime. In this JIRA, we discuss how to do rolling downgrade, i.e. downgrade without downtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7228) Add an SSD policy into the default BlockStoragePolicySuite
[ https://issues.apache.org/jira/browse/HDFS-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170321#comment-14170321 ] Tsz Wo Nicholas Sze commented on HDFS-7228: --- - I think we need both all-SSD and one-SSD policies. All-SSD is useful for high-performance applications. - Since the storage policy feature is not released yet, let's renumber the policy IDs. Otherwise, the upper IDs will all be used up. - Do you also want to add constants for the IDs? Add an SSD policy into the default BlockStoragePolicySuite -- Key: HDFS-7228 URL: https://issues.apache.org/jira/browse/HDFS-7228 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-7228.000.patch, HDFS-7228.001.patch Currently in the default BlockStoragePolicySuite, we've defined 4 storage policies: LAZY_PERSIST, HOT, WARM, and COLD. Since we have already defined the SSD storage type, it will be useful to also include a SSD related storage policy in the default suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
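For illustration only, roughly what an all-SSD/one-SSD pair with ID constants could look like. The BlockStoragePolicy constructor arguments (id, name, storage types, creation fallbacks, replication fallbacks) match the 2.6-era class as understood here, but the IDs, names, fallback choices, and the StorageType package should all be taken from the actual patch:
{code}
// Hedged sketch -- IDs and names assumed, not from the committed HDFS-7228 patch.
// (StorageType is under org.apache.hadoop.hdfs in 2.6; it moved in later releases.)
import org.apache.hadoop.hdfs.StorageType;
import org.apache.hadoop.hdfs.protocol.BlockStoragePolicy;

class SsdPolicySketch {
  static final byte ALLSSD_STORAGE_POLICY_ID = 12;  // assumed
  static final byte ONESSD_STORAGE_POLICY_ID = 10;  // assumed

  static BlockStoragePolicy allSsd() {
    // All replicas on SSD; fall back to DISK when SSD space is unavailable.
    return new BlockStoragePolicy(ALLSSD_STORAGE_POLICY_ID, "ALL_SSD",
        new StorageType[]{StorageType.SSD},
        new StorageType[]{StorageType.DISK},
        new StorageType[]{StorageType.DISK});
  }

  static BlockStoragePolicy oneSsd() {
    // One replica on SSD, the remaining replicas on DISK.
    return new BlockStoragePolicy(ONESSD_STORAGE_POLICY_ID, "ONE_SSD",
        new StorageType[]{StorageType.SSD, StorageType.DISK},
        new StorageType[]{StorageType.SSD, StorageType.DISK},
        new StorageType[]{StorageType.SSD, StorageType.DISK});
  }
}
{code}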
[jira] [Commented] (HDFS-6824) Additional user documentation for HDFS encryption.
[ https://issues.apache.org/jira/browse/HDFS-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170323#comment-14170323 ] Yi Liu commented on HDFS-6824: -- Thanks Andrew for updating the patch. You are right; I see that for the second comment you fixed it as {{HDFS user will not have access to unencrypted encryption keys}}. Additional user documentation for HDFS encryption. -- Key: HDFS-6824 URL: https://issues.apache.org/jira/browse/HDFS-6824 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 2.6.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Attachments: TransparentEncryption.html, hdfs-6824.001.patch, hdfs-6824.002.patch We'd like to better document additional things about HDFS encryption: setup and configuration, using alternate access methods (namely WebHDFS and HttpFS), other misc improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7230) Support rolling downgrade
[ https://issues.apache.org/jira/browse/HDFS-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170329#comment-14170329 ] Tsz Wo Nicholas Sze commented on HDFS-7230: --- Here is the Rolling Downgrade procedure. Suppose a rolling upgrade is in progress in an HA cluster. # Downgrade DNs ## Choose a small subset of datanodes (e.g. all datanodes under a particular rack). ### Run hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT upgrade to shut down one of the chosen datanodes. ### Run hdfs dfsadmin -getDatanodeInfo DATANODE_HOST:IPC_PORT to check and wait for the datanode to shut down. ### Downgrade and restart the datanode. ### Perform the above steps for all the chosen datanodes in the subset in parallel. ## Repeat the above steps until all datanodes in the cluster are downgraded. # Downgrade Active and Standby NNs: NN1 is active and NN2 is standby. ## Shut down and downgrade NN2. ## Start NN2 as standby (the “-rollingUpgrade downgrade” option is not needed). ## Fail over from NN1 to NN2 so that NN2 becomes active and NN1 becomes standby. ## Shut down and downgrade NN1. ## Start NN1 as standby (the “-rollingUpgrade downgrade” option is not needed). # Finalize ## Run hdfs dfsadmin -rollingUpgrade finalize to finalize the procedure. Support rolling downgrade - Key: HDFS-7230 URL: https://issues.apache.org/jira/browse/HDFS-7230 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze HDFS-5535 made a lot of improvement on rolling upgrade. It also added the cluster downgrade feature. However, the downgrade described in HDFS-5535 requires cluster downtime. In this JIRA, we discuss how to do rolling downgrade, i.e. downgrade without downtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
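Pulling the commands out of the flattened procedure above for readability (same commands as stated there; DATANODE_HOST:IPC_PORT is a placeholder):
{noformat}
# For each chosen datanode in the subset:
hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT upgrade
hdfs dfsadmin -getDatanodeInfo DATANODE_HOST:IPC_PORT   # repeat until the datanode no longer responds
# ...then downgrade the software on that datanode and restart it.

# After all DNs and both NNs have been downgraded:
hdfs dfsadmin -rollingUpgrade finalize
{noformat}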
[jira] [Updated] (HDFS-7228) Add an SSD policy into the default BlockStoragePolicySuite
[ https://issues.apache.org/jira/browse/HDFS-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7228: Attachment: HDFS-7228.002.patch Thanks for the review, Nicholas! Update the patch to address your comments. Add an SSD policy into the default BlockStoragePolicySuite -- Key: HDFS-7228 URL: https://issues.apache.org/jira/browse/HDFS-7228 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-7228.000.patch, HDFS-7228.001.patch, HDFS-7228.002.patch Currently in the default BlockStoragePolicySuite, we've defined 4 storage policies: LAZY_PERSIST, HOT, WARM, and COLD. Since we have already defined the SSD storage type, it will be useful to also include a SSD related storage policy in the default suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7237) namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HDFS-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170349#comment-14170349 ] Hudson commented on HDFS-7237: -- FAILURE: Integrated in Hadoop-trunk-Commit #6257 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6257/]) HDFS-7237. The command hdfs namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException. (szetszwo: rev f6d0b8892ab116514fd031a61441141ac3bdfeb5) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestHdfsServerConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeOptionParsing.java namenode -rollingUpgrade throws ArrayIndexOutOfBoundsException -- Key: HDFS-7237 URL: https://issues.apache.org/jira/browse/HDFS-7237 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7237_20141013.patch, h7237_20141013b.patch Run hdfs namenode -rollingUpgrade {noformat} 14/10/13 11:30:50 INFO namenode.NameNode: createNameNode [-rollingUpgrade] 14/10/13 11:30:50 FATAL namenode.NameNode: Exception in namenode join java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hdfs.server.namenode.NameNode.parseArguments(NameNode.java:1252) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1367) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1501) 14/10/13 11:30:50 INFO util.ExitUtil: Exiting with status 1 {noformat} Although the command is illegal (missing rolling upgrade startup option), it should print a better error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
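To make the failure mode concrete: the parser reads the value following -rollingUpgrade without first checking that one exists, so a bare -rollingUpgrade walks off the end of the argument array. A self-contained, hedged illustration of the kind of bounds check needed (this is a standalone demo with made-up names, not the actual NameNode.parseArguments change):
{code}
// Illustration only; not the committed fix.
public final class RollingUpgradeArgCheckDemo {
  public static void main(String[] args) {
    for (int i = 0; i < args.length; i++) {
      if ("-rollingUpgrade".equalsIgnoreCase(args[i])) {
        if (i + 1 >= args.length) {
          // Previously the code did the equivalent of args[++i] here,
          // which throws ArrayIndexOutOfBoundsException.
          System.err.println("Must specify a rolling upgrade startup option "
              + "(rollback, downgrade or started)");
          System.exit(1);
        }
        System.out.println("rolling upgrade option: " + args[i + 1]);
        i++;
      }
    }
  }
}
{code}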