[jira] [Commented] (HDFS-7018) Implement C interface for libhdfs3

2014-12-17 Thread Zhanwei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251256#comment-14251256
 ] 

Zhanwei Wang commented on HDFS-7018:


Hi [~wheat9] and [~cmccabe]

Would you please review the new patch? Thanks.

> Implement C interface for libhdfs3
> --
>
> Key: HDFS-7018
> URL: https://issues.apache.org/jira/browse/HDFS-7018
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-7018-pnative.002.patch, HDFS-7018.patch
>
>
> Implement C interface for libhdfs3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7018) Implement C interface for libhdfs3

2014-12-17 Thread Zhanwei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhanwei Wang updated HDFS-7018:
---
Attachment: HDFS-7018-pnative.002.patch

> Implement C interface for libhdfs3
> --
>
> Key: HDFS-7018
> URL: https://issues.apache.org/jira/browse/HDFS-7018
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-7018-pnative.002.patch, HDFS-7018.patch
>
>
> Implement C interface for libhdfs3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk

2014-12-17 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-7527:

Attachment: HDFS-7527.002.patch

> TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
> ---
>
> Key: HDFS-7527
> URL: https://issues.apache.org/jira/browse/HDFS-7527
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, test
>Reporter: Yongjun Zhang
>Assignee: Binglin Chang
> Attachments: HDFS-7527.001.patch, HDFS-7527.002.patch
>
>
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/
> {quote}
> Error Message
> test timed out after 360000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 360000 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957)
> 2014-12-15 12:00:19,958 ERROR datanode.DataNode 
> (BPServiceActor.java:run(836)) - Initialization failed for Block pool 
> BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
> localhost/127.0.0.1:40565 Datanode denied communication with namenode because 
> the host is not in the include-list: DatanodeRegistration(127.0.0.1, 
> datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, 
> infoSecurePort=0, ipcPort=43726, 
> storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)
> 2014-12-15 12:00:29,087 FATAL datanode.DataNode 
> (BPServiceActor.java:run(841)) - Initialization failed for Block pool 
> BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
> localhost/127.0.0.1:40565. Exiting. 
> java.io.IOException: DN shut down before block pool connected
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
>   at java.lang.Thread.run(Thread.java:745)
> {quote}
> Found by tool proposed in HADOOP-11045:
> {quote}
> [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
> Hadoop-Hdfs-trunk -n 5 | tee bt.log
> Recently FAILED builds in url: 
> https://builds.apache.org//job/Hadoop-Hdfs-trunk
> THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, 
> as listed below:
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport 
> (2014-12-15 03:30:01)
> Failed test: 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
> Failed test: 
> org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
> Failed test: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport 
> (2014-12-13 10:32:27)
> Failed test: 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport 
> (2014-12-13 03:30:01)
> Failed test: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport 
> (2014-12-11 03:30:01)
> Failed test: 
> org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
> Failed test: 
> org.apache.hadoop.hdfs.server.n

[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk

2014-12-17 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251244#comment-14251244
 ] 

Binglin Chang commented on HDFS-7527:
-

Makes sense; it looks like the behavior was changed at some point.
Updated the patch to partially support dfs.datanode.hostname (if it is an IP 
address, or the hostname resolves to a proper IP address). 
Also changed the test to properly wait for the excluded datanode to come back 
(using Datanode.isDatanodeFullyStarted rather than checking the ALIVE node 
count).
Note that fully restoring the old behavior requires a lot more changes; for 
now I have made only minimal changes.

> TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
> ---
>
> Key: HDFS-7527
> URL: https://issues.apache.org/jira/browse/HDFS-7527
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, test
>Reporter: Yongjun Zhang
>Assignee: Binglin Chang
> Attachments: HDFS-7527.001.patch
>
>
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/
> {quote}
> Error Message
> test timed out after 360000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 360000 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957)
> 2014-12-15 12:00:19,958 ERROR datanode.DataNode 
> (BPServiceActor.java:run(836)) - Initialization failed for Block pool 
> BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
> localhost/127.0.0.1:40565 Datanode denied communication with namenode because 
> the host is not in the include-list: DatanodeRegistration(127.0.0.1, 
> datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, 
> infoSecurePort=0, ipcPort=43726, 
> storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)
> 2014-12-15 12:00:29,087 FATAL datanode.DataNode 
> (BPServiceActor.java:run(841)) - Initialization failed for Block pool 
> BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
> localhost/127.0.0.1:40565. Exiting. 
> java.io.IOException: DN shut down before block pool connected
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
>   at java.lang.Thread.run(Thread.java:745)
> {quote}
> Found by tool proposed in HADOOP-11045:
> {quote}
> [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
> Hadoop-Hdfs-trunk -n 5 | tee bt.log
> Recently FAILED builds in url: 
> https://builds.apache.org//job/Hadoop-Hdfs-trunk
> THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, 
> as listed below:
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport 
> (2014-12-15 03:30:01)
> Failed test: 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
> Failed test: 
> org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
> Failed test: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport 
> (2014-12-13 10:32:27)
> Failed test: 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegist

[jira] [Commented] (HDFS-7373) Clean up temporary files after fsimage transfer failures

2014-12-17 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251147#comment-14251147
 ] 

Akira AJISAKA commented on HDFS-7373:
-

+1 (binding).

> Clean up temporary files after fsimage transfer failures
> 
>
> Key: HDFS-7373
> URL: https://issues.apache.org/jira/browse/HDFS-7373
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-7373.patch
>
>
> When an fsimage (e.g. checkpoint) transfer fails, a temporary file is left in 
> each storage directory.  If the namespace is large, these files can 
> take up quite a bit of space.
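
For illustration, a minimal sketch of the cleanup-on-failure idea. The helper 
names ({{transferWithCleanup}}, {{doTransfer}}) are hypothetical, not the 
method names in the attached patch:
{code}
import java.io.File;
import java.io.IOException;

class TransferFsImageSketch {
  // Hypothetical wrapper: delete the temporary file if the transfer fails,
  // so failed checkpoints do not leave large files in the storage directory.
  static void transferWithCleanup(File tmpImageFile) throws IOException {
    boolean success = false;
    try {
      doTransfer(tmpImageFile);   // assumed transfer routine
      success = true;
    } finally {
      if (!success) {
        tmpImageFile.delete();    // best-effort cleanup of the partial file
      }
    }
  }

  static void doTransfer(File f) throws IOException { /* ... */ }
}
{code}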



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251112#comment-14251112
 ] 

Hadoop QA commented on HDFS-7530:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687861/HDFS-7530.003.patch
  against trunk revision 9937eef.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDecommission
  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9068//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9068//console

This message is automatically generated.

> Allow renaming an Encryption Zone root
> --
>
> Key: HDFS-7530
> URL: https://issues.apache.org/jira/browse/HDFS-7530
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, 
> HDFS-7530.003.patch
>
>
> It should be possible to do
> hdfs dfs -mv /ezroot /newnameforezroot



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails

2014-12-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251106#comment-14251106
 ] 

Colin Patrick McCabe commented on HDFS-7443:


It appears that the old software could sometimes create a duplicate copy of the 
same block in two different {{subdir}} folders on the same volume.  In all the 
cases in which we've seen this, the block files were identical: two files, 
both for the same block ID, in separate directories.  This appears to be a bug, 
since obviously we don't want to store the same block twice on the same volume. 
 This causes the {{EEXIST}} problem on upgrade, since the new block layout only 
has one place where each block ID can go.  Unfortunately, the hardlink code 
doesn't print the name of the file which caused the problem, making diagnosis 
more difficult than it should be.

One easy way around this is to check for duplicate block IDs on each volume 
before upgrading, and manually remove the duplicates.
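
As a rough illustration of such a pre-upgrade check, here is a sketch that 
walks a volume's block directories and reports block files that appear under 
more than one {{subdir}}. The layout assumption (data files named blk_*, 
metadata ending in .meta) is mine for illustration, not the exact DataNode code:
{code}
import java.io.File;
import java.util.HashMap;
import java.util.Map;

class DuplicateBlockFinder {
  // Walk a block pool's finalized tree and report block files whose
  // name has already been seen in another subdir on the same volume.
  static void scan(File dir, Map<String, File> seen) {
    File[] entries = dir.listFiles();
    if (entries == null) return;
    for (File f : entries) {
      if (f.isDirectory()) {
        scan(f, seen);
      } else if (f.getName().startsWith("blk_") && !f.getName().endsWith(".meta")) {
        File prev = seen.put(f.getName(), f);
        if (prev != null) {
          System.out.println("duplicate: " + prev + " and " + f);
        }
      }
    }
  }

  public static void main(String[] args) {
    // e.g. .../current/BP-xxx/current/finalized
    scan(new File(args[0]), new HashMap<String, File>());
  }
}
{code}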

We should also consider logging an error message and continuing the upgrade 
process when we encounter this.

[~kihwal], I'm not sure why, in your case, the DataNode retried the hard link 
process multiple times.  I'm also not sure why you ended up with a jumbled 
{{previous.tmp}} directory.  When we reproduced this on CDH5.2, we did not have 
that problem, for whatever reason.

> Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails
> 
>
> Key: HDFS-7443
> URL: https://issues.apache.org/jira/browse/HDFS-7443
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kihwal Lee
>Priority: Blocker
>
> When we did an upgrade from 2.5 to 2.6 in a medium size cluster, about 4% of 
> datanodes were not coming up.  They tried the data file layout upgrade for 
> BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
> All failures were caused by {{NativeIO.link()}} throwing IOException saying 
> {{EEXIST}}.  The data nodes didn't die right away, but the upgrade was soon 
> retried when the block pool initialization was retried whenever 
> {{BPServiceActor}} was registering with the namenode.  After many retries, 
> datanodes terminated.  This would leave {{previous.tmp}} and {{current}} with 
> no {{VERSION}} file in the block pool slice storage directory.  
> Although {{previous.tmp}} contained the old {{VERSION}} file, the content was 
> in the new layout and the subdirs were all newly created ones.  This 
> shouldn't have happened because the upgrade-recovery logic in {{Storage}} 
> removes {{current}} and renames {{previous.tmp}} to {{current}} before 
> retrying.  All successfully upgraded volumes had old state preserved in their 
> {{previous}} directory.
> In summary there were two observed issues.
> - Upgrade failure with {{link()}} failing with {{EEXIST}}
> - {{previous.tmp}} contained not the content of the original {{current}}, but a 
> half-upgraded one.
> We did not see this in smaller scale test clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251093#comment-14251093
 ] 

Hadoop QA commented on HDFS-7543:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687868/HDFS-7543.000.patch
  against trunk revision 9937eef.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestPread
  org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9067//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9067//console

This message is automatically generated.

> Avoid path resolution when getting FileStatus for audit logs
> 
>
> Key: HDFS-7543
> URL: https://issues.apache.org/jira/browse/HDFS-7543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7543.000.patch
>
>
> The current API of {{getAuditFileInfo()}} forces parsing the paths again when 
>  generating the {{HdfsFileStatus}} for audit logs. This jira proposes to 
> avoid the repeated parsing by passing the {{INodesInPath}} object instead of 
> the path.
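
A toy model of the idea, resolving once and passing the resolved object along; 
the classes below are illustrative stand-ins, not the actual FSDirectory code:
{code}
// Parse the path string a single time at RPC entry, then reuse the result.
class ResolvedPath {
  final String[] components;
  ResolvedPath(String path) { components = path.split("/"); } // parsed once
  String lastComponent() { return components[components.length - 1]; }
}

class AuditLogSketch {
  // before: takes a String, forcing the callee to re-parse the path
  static String auditFileInfo(String path) {
    return new ResolvedPath(path).lastComponent();
  }
  // after: takes the already-resolved object, so no second parse is needed
  static String auditFileInfo(ResolvedPath resolved) {
    return resolved.lastComponent();
  }

  public static void main(String[] args) {
    ResolvedPath r = new ResolvedPath("/user/alice/file"); // resolved at entry
    System.out.println(auditFileInfo(r));  // reused when building the audit entry
  }
}
{code}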



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7539) Namenode can't leave safemode because of Datanodes' IPC socket timeout

2014-12-17 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251085#comment-14251085
 ] 

Suresh Srinivas commented on HDFS-7539:
---

I think the NN JVM parameters should be configured correctly. That said, the 
DNs continue to reconnect, right?

> Namenode can't leave safemode because of Datanodes' IPC socket timeout
> --
>
> Key: HDFS-7539
> URL: https://issues.apache.org/jira/browse/HDFS-7539
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.5.1
> Environment: 1 master, 1 secondary and 128 slaves, each node has x24 
> cores, 48GB memory. fsimage is 4GB.
>Reporter: hoelog
>
> While the namenode is starting, datanodes seem to wait for the namenode's 
> response through IPC to register block pools.
> Here is the DN's log:
> {code} 
> 2014-12-16 20:28:09,064 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Acknowledging ACTIVE Namenode Block pool 
> BP-877672386-10.114.130.143-1412666752827 (Datanode Uuid 
> 2117395f-e034-4b4a-adec-8a28464f4796) service to NN.x.com/10.x.x143:9000 
> {code}
> But the namenode is too busy to respond, and the datanodes hit a socket 
> timeout - the default is 1 minute.
> {code}
> 2014-12-16 20:29:09,857 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> IOException in offerService
> java.net.SocketTimeoutException: Call From DN1.x.com/10.x.x.84 to 
> NN.x.com:9000 failed on socket timeout exception: 
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/10.x.x.84:57924 remote=NN.x.com/10.x.x.143:9000]; For more details 
> see:  http://wiki.apache.org/hadoop/SocketTimeout 
> {code}
> The same events repeat, and eventually the NN drops most connection attempts 
> from the DNs, so the NN can't leave safemode.
> The DN's log:
> {code}
> 2014-12-16 20:32:25,895 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> IOException in offerService
> java.io.IOException: failed on local exception: java.io.IOException: 
> Connection reset by peer
> {code}
> There are no troubles in the network, configuration, or servers. I think the 
> NN is too busy to respond to the DNs within a minute. 
> I configured "ipc.ping.interval" to 15 mins in core-site.xml, and that 
> was helpful for my cluster. 
> {code}
> <property>
>   <name>ipc.ping.interval</name>
>   <value>900000</value>
> </property>
> {code}
> In my cluster, the namenode took 1 min ~ 5 mins to respond to the DNs' requests.
> It would be helpful if there were a more elegant solution.
> {code}
> 2014-12-16 23:28:16,598 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Acknowledging ACTIVE Namenode Block pool 
> BP-877672386-10.x.x.143-1412666752827 (Datanode Uuid 
> c4f7beea-b8e9-404f-bc81-6e87e37263d2) service to NN/10.x.x.143:9000
> 2014-12-16 23:31:32,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Sent 1 blockreports 2090961 blocks total. Took 1690 msec to generate and 
> 193738 msecs for RPC and NN processing.  Got back commands 
> org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@20e68e11
> 2014-12-16 23:31:32,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Got finalize command for block pool BP-877672386-10.x.x.143-1412666752827
> 2014-12-16 23:31:32,032 INFO org.apache.hadoop.util.GSet: Computing capacity 
> for map BlockMap
> 2014-12-16 23:31:32,032 INFO org.apache.hadoop.util.GSet: VM type   = 
> 64-bit
> 2014-12-16 23:31:32,044 INFO org.apache.hadoop.util.GSet: 0.5% max memory 3.6 
> GB = 18.2 MB
> 2014-12-16 23:31:32,045 INFO org.apache.hadoop.util.GSet: capacity  = 
> 2^21 = 2097152 entries
> 2014-12-16 23:31:32,046 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block 
> Verification Scanner initialized with interval 504 hours for block pool 
> BP-877672386-10.114.130.143-1412666752827
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7539) Namenode can't leave safemode because of Datanodes' IPC socket timeout

2014-12-17 Thread hoelog (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251065#comment-14251065
 ] 

hoelog commented on HDFS-7539:
--

Actually, the NN hangs for 1~2 minutes because of GC.
This problem may not appear when the NN has enough memory.

> Namenode can't leave safemode because of Datanodes' IPC socket timeout
> --
>
> Key: HDFS-7539
> URL: https://issues.apache.org/jira/browse/HDFS-7539
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.5.1
> Environment: 1 master, 1 secondary and 128 slaves, each node has x24 
> cores, 48GB memory. fsimage is 4GB.
>Reporter: hoelog
>
> While the namenode is starting, datanodes seem to wait for the namenode's 
> response through IPC to register block pools.
> Here is the DN's log:
> {code} 
> 2014-12-16 20:28:09,064 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Acknowledging ACTIVE Namenode Block pool 
> BP-877672386-10.114.130.143-1412666752827 (Datanode Uuid 
> 2117395f-e034-4b4a-adec-8a28464f4796) service to NN.x.com/10.x.x143:9000 
> {code}
> But the namenode is too busy to respond, and the datanodes hit a socket 
> timeout - the default is 1 minute.
> {code}
> 2014-12-16 20:29:09,857 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> IOException in offerService
> java.net.SocketTimeoutException: Call From DN1.x.com/10.x.x.84 to 
> NN.x.com:9000 failed on socket timeout exception: 
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/10.x.x.84:57924 remote=NN.x.com/10.x.x.143:9000]; For more details 
> see:  http://wiki.apache.org/hadoop/SocketTimeout 
> {code}
> The same events repeat, and eventually the NN drops most connection attempts 
> from the DNs, so the NN can't leave safemode.
> The DN's log:
> {code}
> 2014-12-16 20:32:25,895 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> IOException in offerService
> java.io.IOException: failed on local exception: java.io.IOException: 
> Connection reset by peer
> {code}
> There are no troubles in the network, configuration, or servers. I think the 
> NN is too busy to respond to the DNs within a minute. 
> I configured "ipc.ping.interval" to 15 mins in core-site.xml, and that 
> was helpful for my cluster. 
> {code}
> <property>
>   <name>ipc.ping.interval</name>
>   <value>900000</value>
> </property>
> {code}
> In my cluster, the namenode took 1 min ~ 5 mins to respond to the DNs' requests.
> It would be helpful if there were a more elegant solution.
> {code}
> 2014-12-16 23:28:16,598 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Acknowledging ACTIVE Namenode Block pool 
> BP-877672386-10.x.x.143-1412666752827 (Datanode Uuid 
> c4f7beea-b8e9-404f-bc81-6e87e37263d2) service to NN/10.x.x.143:9000
> 2014-12-16 23:31:32,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Sent 1 blockreports 2090961 blocks total. Took 1690 msec to generate and 
> 193738 msecs for RPC and NN processing.  Got back commands 
> org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@20e68e11
> 2014-12-16 23:31:32,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Got finalize command for block pool BP-877672386-10.x.x.143-1412666752827
> 2014-12-16 23:31:32,032 INFO org.apache.hadoop.util.GSet: Computing capacity 
> for map BlockMap
> 2014-12-16 23:31:32,032 INFO org.apache.hadoop.util.GSet: VM type   = 
> 64-bit
> 2014-12-16 23:31:32,044 INFO org.apache.hadoop.util.GSet: 0.5% max memory 3.6 
> GB = 18.2 MB
> 2014-12-16 23:31:32,045 INFO org.apache.hadoop.util.GSet: capacity  = 
> 2^21 = 2097152 entries
> 2014-12-16 23:31:32,046 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block 
> Verification Scanner initialized with interval 504 hours for block pool 
> BP-877672386-10.114.130.143-1412666752827
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7529) Consolidate encryption zone related implementation into a single class

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251036#comment-14251036
 ] 

Hadoop QA commented on HDFS-7529:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687849/HDFS-7529.001.patch
  against trunk revision 0da1330.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9065//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9065//console

This message is automatically generated.

> Consolidate encryption zone related implementation into a single class
> --
>
> Key: HDFS-7529
> URL: https://issues.apache.org/jira/browse/HDFS-7529
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7529.000.patch, HDFS-7529.001.patch
>
>
> This jira proposes to consolidate encryption zone related implementation to a 
> single class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250973#comment-14250973
 ] 

Hadoop QA commented on HDFS-7528:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687844/HDFS-7530.003.patch
  against trunk revision 316613b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDecommission

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9064//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9064//console

This message is automatically generated.

> Consolidate symlink-related implementation into a single class
> --
>
> Key: HDFS-7528
> URL: https://issues.apache.org/jira/browse/HDFS-7528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.7.0
>
> Attachments: HDFS-7528.000.patch
>
>
> The jira proposes to consolidate symlink-related implementation into a single 
> class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7544) ChunkedArrayList: fix removal via iterator and implement get

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250970#comment-14250970
 ] 

Hadoop QA commented on HDFS-7544:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687882/HDFS-7544.001.patch
  against trunk revision 3b173d9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common:

  org.apache.hadoop.ha.TestZKFailoverControllerStress

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9069//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9069//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9069//console

This message is automatically generated.

> ChunkedArrayList: fix removal via iterator and implement get
> 
>
> Key: HDFS-7544
> URL: https://issues.apache.org/jira/browse/HDFS-7544
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7544.001.patch
>
>
> ChunkedArrayList: implement removal via iterator and get.  Previously, 
> calling remove on a ChunkedArrayList iterator would cause the returned size 
> to be incorrect later.
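
For context, a minimal sketch of the failure mode and the fix idea: a list that 
caches its size must update that cache when an element is removed through its 
iterator. This is an illustration of the technique, not the actual 
ChunkedArrayList code:
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class CachedSizeList<T> implements Iterable<T> {
  private final List<T> chunk = new ArrayList<T>(); // single chunk for brevity
  private int size;                                 // cached size

  void add(T t) { chunk.add(t); size++; }
  int size() { return size; }

  public Iterator<T> iterator() {
    final Iterator<T> it = chunk.iterator();
    return new Iterator<T>() {
      public boolean hasNext() { return it.hasNext(); }
      public T next() { return it.next(); }
      public void remove() {
        it.remove();
        size--;  // the fix: keep the cached size consistent on iterator removal
      }
    };
  }
}
{code}
Without the {{size--}} in the custom iterator's {{remove()}}, later calls to 
{{size()}} would report the stale, larger value.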



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250967#comment-14250967
 ] 

Hadoop QA commented on HDFS-7542:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687843/HDFS-7542.001.patch
  against trunk revision 316613b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to cause Findbugs 
(version 2.0.3) to fail.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestRollingUpgradeRollback

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.fs.TesTests
org.apache.hadoop.h>

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9063//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9063//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9063//console

This message is automatically generated.

> Add an option to DFSAdmin -safemode wait to ignore connection failures
> --
>
> Key: HDFS-7542
> URL: https://issues.apache.org/jira/browse/HDFS-7542
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.6.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Minor
> Attachments: HDFS-7542.001.patch
>
>
> Currently, the _dfsadmin -safemode wait_ command aborts when connection to 
> the NN fails (network glitch, ConnectException when NN is unreachable, 
> EOFException if network link shut down). 
> In certain situations, users have asked for an option to make the command 
> resilient to connection failures. This is useful so that the admin can 
> initiate the wait command despite the NN not being fully up or survive 
> intermittent network issues. With this option, the admin can rely on the wait 
> command continuing to poll instead of aborting.
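
A rough sketch of what such a resilient wait loop could look like; the probe 
interface, flag, and poll interval below are assumptions for illustration, not 
the actual patch:
{code}
import java.io.IOException;

class SafeModeWaitSketch {
  interface NameNodeProbe { boolean inSafeMode() throws IOException; }

  // Poll until the NN reports it has left safemode; with ignoreFailures set,
  // connection-level errors are logged and retried instead of aborting.
  static void waitForSafeModeExit(NameNodeProbe probe, boolean ignoreFailures)
      throws IOException, InterruptedException {
    while (true) {
      try {
        if (!probe.inSafeMode()) {
          return;
        }
      } catch (IOException e) {
        if (!ignoreFailures) {
          throw e;               // old behavior: abort on connection failure
        }
        System.err.println("connection failed, retrying: " + e);
      }
      Thread.sleep(5000);        // assumed poll interval
    }
  }
}
{code}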



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7431) log message for InvalidMagicNumberException may be incorrect

2014-12-17 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250957#comment-14250957
 ] 

Yi Liu commented on HDFS-7431:
--

The test failure is not related.

> log message for InvalidMagicNumberException may be incorrect
> 
>
> Key: HDFS-7431
> URL: https://issues.apache.org/jira/browse/HDFS-7431
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-7431.001.patch, HDFS-7431.002.patch, 
> HDFS-7431.003.patch
>
>
> In secure mode, HDFS now allows Datanodes to run without root or jsvc if 
> {{dfs.data.transfer.protection}} is configured.
> In the log message for {{InvalidMagicNumberException}}, we miss one case: 
> when the datanodes run on an unprivileged port, 
> {{dfs.data.transfer.protection}} is configured to {{authentication}}, and 
> {{dfs.encrypt.data.transfer}} is not configured. A SASL handshake is then 
> required, and if an old-version DFS client is used, 
> {{InvalidMagicNumberException}} is thrown and we write this log:
> {quote}
> Failed to read expected encryption handshake from client at  Perhaps the 
> client is running an older version of Hadoop which does not support encryption
> {quote}
> Recently I ran HDFS built from trunk with security enabled, but the client was 
> version 2.5.1. Then I got the above log message, even though I had not 
> configured encryption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7461) Reduce impact of laggards on Mover

2014-12-17 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250945#comment-14250945
 ] 

Arpit Agarwal commented on HDFS-7461:
-

Hi Benoy, do you have a prototype/rough patch?

> Reduce impact of laggards on Mover
> --
>
> Key: HDFS-7461
> URL: https://issues.apache.org/jira/browse/HDFS-7461
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: continuousmovement.pdf
>
>
> The current Mover logic is as follows :
> {code}
> for (Path target : targetPaths) {
>   hasRemaining |= processPath(target.toUri().getPath());
> }
> // wait for pending move to finish and retry the failed migration
> hasRemaining |= Dispatcher.waitForMoveCompletion(storages.targets.values());
> {code}
> The _processPath_ will schedule moves, but it is bounded by the number of 
> concurrent moves (default is 5 per node).  Once block moves are scheduled, 
> it will wait for ALL scheduled moves to finish in _waitForMoveCompletion_.
> One slow move could keep the Mover idle for a long time. 
> It would be a performance improvement to schedule the next moves as soon as 
> any (source, target) slot is available instead of waiting for all the 
> scheduled moves to finish. 
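
One way to picture the proposed improvement is a bounded slot pool where the 
next move is dispatched as soon as any slot frees up. A sketch under that 
assumption, with illustrative names rather than the Mover's actual classes:
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

class ContinuousSchedulerSketch {
  private static final int SLOTS_PER_NODE = 5;  // matches the default above
  private final Semaphore slots = new Semaphore(SLOTS_PER_NODE);
  private final ExecutorService pool = Executors.newCachedThreadPool();

  // Instead of scheduling a batch and waiting for the whole batch to finish,
  // block only until one slot is free, then dispatch the next move.
  void scheduleMove(final Runnable move) throws InterruptedException {
    slots.acquire();                            // waits for any free slot
    pool.execute(new Runnable() {
      public void run() {
        try {
          move.run();
        } finally {
          slots.release();                      // a laggard holds one slot, not all
        }
      }
    });
  }
}
{code}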



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList

2014-12-17 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250936#comment-14250936
 ] 

Lei (Eddy) Xu commented on HDFS-7531:
-

Thanks for the reviews, [~cmccabe] and [~wheat9]!

> Improve the concurrent access on FsVolumeList
> -
>
> Key: HDFS-7531
> URL: https://issues.apache.org/jira/browse/HDFS-7531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Fix For: 2.7.0
>
> Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, 
> HDFS-7531.002.patch
>
>
> {{FsVolumeList}} uses {{synchronized}} to protect the update on 
> {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
> {{getAvailable()}}) iterate {{volumes}} without protection.
> This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to 
> provide better concurrent access.
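
For illustration, a minimal copy-on-write pattern along these lines; this is a 
sketch of the technique, not the committed FsVolumeList code:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class VolumeListSketch<V> {
  private final AtomicReference<List<V>> volumes =
      new AtomicReference<List<V>>(Collections.<V>emptyList());

  // Readers iterate a stable snapshot without locking.
  List<V> snapshot() { return volumes.get(); }

  // Writers copy, modify, and atomically swap in the new list.
  void add(V v) {
    while (true) {
      List<V> cur = volumes.get();
      List<V> next = new ArrayList<V>(cur);
      next.add(v);
      if (volumes.compareAndSet(cur, Collections.unmodifiableList(next))) {
        return;
      }
    }
  }
}
{code}
Readers never block writers here; each reader simply keeps iterating the list 
reference it obtained, which is never mutated after publication.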



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250912#comment-14250912
 ] 

Hudson commented on HDFS-7531:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6743 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6743/])
HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. 
McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java


> Improve the concurrent access on FsVolumeList
> -
>
> Key: HDFS-7531
> URL: https://issues.apache.org/jira/browse/HDFS-7531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Fix For: 2.7.0
>
> Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, 
> HDFS-7531.002.patch
>
>
> {{FsVolumeList}} uses {{synchronized}} to protect the update on 
> {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
> {{getAvailable()}}) iterate {{volumes}} without protection.
> This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to 
> provide better concurrent access.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7545) Data striping support in HDFS client

2014-12-17 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-7545:
---

 Summary: Data striping support in HDFS client
 Key: HDFS-7545
 URL: https://issues.apache.org/jira/browse/HDFS-7545
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
 Attachments: DataStripingSupportinHDFSClient.pdf

Data striping is a commonly used data layout with critical benefits in the 
context of erasure coding. This JIRA aims to extend HDFS client to work with 
striped blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7545) Data striping support in HDFS client

2014-12-17 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7545:

Attachment: DataStripingSupportinHDFSClient.pdf

> Data striping support in HDFS client
> 
>
> Key: HDFS-7545
> URL: https://issues.apache.org/jira/browse/HDFS-7545
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
> Attachments: DataStripingSupportinHDFSClient.pdf
>
>
> Data striping is a commonly used data layout with critical benefits in the 
> context of erasure coding. This JIRA aims to extend HDFS client to work with 
> striped blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7531) Improve the concurrent access on FsVolumeList

2014-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7531:
---
   Resolution: Fixed
Fix Version/s: 2.7.0
   Status: Resolved  (was: Patch Available)

+1.  Thanks, Eddy

> Improve the concurrent access on FsVolumeList
> -
>
> Key: HDFS-7531
> URL: https://issues.apache.org/jira/browse/HDFS-7531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Fix For: 2.7.0
>
> Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, 
> HDFS-7531.002.patch
>
>
> {{FsVolumeList}} uses {{synchronized}} to protect the update on 
> {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
> {{getAvailable()}}) iterate {{volumes}} without protection.
> This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to 
> provide better concurrent access.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7544) ChunkedArrayList: fix removal via iterator and implement get

2014-12-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250898#comment-14250898
 ] 

Andrew Wang commented on HDFS-7544:
---

Also looks like a nice change, +1 pending

> ChunkedArrayList: fix removal via iterator and implement get
> 
>
> Key: HDFS-7544
> URL: https://issues.apache.org/jira/browse/HDFS-7544
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7544.001.patch
>
>
> ChunkedArrayList: implement removal via iterator and get.  Previously, 
> calling remove on a ChunkedArrayList iterator would cause the returned size 
> to be incorrect later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7544) ChunkedArrayList: fix removal via iterator and implement get

2014-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7544:
---
Status: Patch Available  (was: Open)

> ChunkedArrayList: fix removal via iterator and implement get
> 
>
> Key: HDFS-7544
> URL: https://issues.apache.org/jira/browse/HDFS-7544
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7544.001.patch
>
>
> ChunkedArrayList: implement removal via iterator and get.  Previously, 
> calling remove on a ChunkedArrayList iterator would cause the returned size 
> to be incorrect later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7544) ChunkedArrayList: fix removal via iterator and implement get

2014-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7544:
---
Attachment: HDFS-7544.001.patch

> ChunkedArrayList: fix removal via iterator and implement get
> 
>
> Key: HDFS-7544
> URL: https://issues.apache.org/jira/browse/HDFS-7544
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7544.001.patch
>
>
> ChunkedArrayList: implement removal via iterator and get.  Previously, 
> calling remove on a ChunkedArrayList iterator would cause the returned size 
> to be incorrect later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7544) ChunkedArrayList: fix removal via iterator and implement get

2014-12-17 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-7544:
--

 Summary: ChunkedArrayList: fix removal via iterator and implement 
get
 Key: HDFS-7544
 URL: https://issues.apache.org/jira/browse/HDFS-7544
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


ChunkedArrayList: implement removal via iterator and get.  Previously, calling 
remove on a ChunkedArrayList iterator would cause the returned size to be 
incorrect later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250854#comment-14250854
 ] 

Colin Patrick McCabe commented on HDFS-7411:


* Can you rebase this on trunk?  ChunkedArrayList has moved and this caused a
patch application failure.

* I would really prefer that size stay an O(1) operation for ChunkedArrayList.
We should be able to do this by hooking into the iterator's remove() method,
creating a custom iterator if needed.  If that's too complex to do in this
jira, then let's at least file a follow-on.

{code}
<property>
  <name>dfs.namenode.decommission.blocks.per.node</name>
  <value>40</value>
  <description>The approximate number of blocks per node.
    This affects the number of blocks processed per
    decommission interval, as defined in
    dfs.namenode.decommission.interval.
    This is multiplied by dfs.namenode.decommission.nodes.per.interval
    to define the actual processing rate.
  </description>
</property>
{code}

* Why do we need this parameter?  The NameNode already tracks how many blocks
each DataNode has in each storage.  That information is in
DatanodeStorageInfo#size.

{code}
<property>
  <name>dfs.namenode.decommission.max.concurrent.tracked.nodes</name>
  <value>100</value>
  <description>
    The maximum number of decommission-in-progress datanodes that will be
    tracked at one time by the namenode. Tracking a decommission-in-progress
    datanode consumes additional NN memory proportional to the number of blocks
    on the datanode. Having a conservative limit reduces the potential impact
    of decommissioning a large number of nodes at once.
  </description>
</property>
{code}

* Should this be called something like
dfs.namenode.decommission.max.concurrent.nodes?  I'm confused by the mention of
"tracking" here.  It seems to imply that setting this too low would allow more
nodes to be decommissioned, but we'd stop tracking the decommissioning?

{code}
-  static final Log LOG = LogFactory.getLog(BlockManager.class);
+  static final Logger LOG = LoggerFactory.getLogger(BlockManager.class);
{code}
If you're going to change this, you need to change all the unit tests that are
changing the log level, so that they use the correct function to do so.

{code}
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java:
  ((Log4JLogger)LogFactory.getLog(BlockManager.class)).getLogger().setLevel(Level.ALL);
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java:
  ((Log4JLogger)LogFactory.getLog(BlockManager.class)).getLogger().setLevel(Level.ALL);
and many more...
and many more...
{code}
I can see that you fixed it in TestPendingInvalidateBlock.java, but there are a 
lot more locations.

You probably need something like the GenericTestUtils#disableLog function I
created in the HDFS-7430 patch.  I guess we could split that off into a
separate patch if it's important enough.  Or perhaps we could just put off
changing this until a follow-on JIRA?

{code}
if (node.isAlive) {
  return true;
} else {
  ... long block ...
}
{code}
We can reduce the indentation by getting rid of the else block here.  Similarly
for the other nested 'else'.

{code}
-  LOG.fatal("ReplicationMonitor thread received Runtime exception. ", 
t);
+  LOG.error("ReplicationMonitor thread received Runtime exception. ",
+  t);
{code}
What's the rationale for changing the log level here?

{code}
   /**
-   * Decommission the node if it is in exclude list.
+   * Decommission the node if it is in the host exclude list.
+   *
+   * @param nodeReg datanode
*/
-  private void checkDecommissioning(DatanodeDescriptor nodeReg) { 
+  void checkDecommissioning(DatanodeDescriptor nodeReg) {
{code}

I realize this isn't introduced by this patch, but this function seems
misleadingly named.  Perhaps it should be named something like
"startDecommissioningIfExcluded"?  It's definitely not just a "check."

more comments coming...

> Refactor and improve decommissioning logic into DecommissionManager
> ---
>
> Key: HDFS-7411
> URL: https://issues.apache.org/jira/browse/HDFS-7411
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.5.1
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
> hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch
>
>
> Would be nice to split out decommission logic from DatanodeManager to 
> DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root

2014-12-17 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250855#comment-14250855
 ] 

Charles Lamb commented on HDFS-7530:


Thanks for the review and the kick in the head to Mr. Jenkins. 

> Allow renaming an Encryption Zone root
> --
>
> Key: HDFS-7530
> URL: https://issues.apache.org/jira/browse/HDFS-7530
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, 
> HDFS-7530.003.patch
>
>
> It should be possible to do
> hdfs dfs -mv /ezroot /newnameforezroot



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7529) Consolidate encryption zone related implementation into a single class

2014-12-17 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250848#comment-14250848
 ] 

Charles Lamb commented on HDFS-7529:


Hi @wheat9,

While the .001 patch fixes the formatting issues, the larger problem is that by 
calling provider.getMetadata() inside the lock, you're doing an RPC while 
holding the lock. While it is true that you may have been able to contact the 
KMS during ensureKeysAreInitialized, that may not be true when you try later, 
and there can be an arbitrarily long delay. BTW, there's a plurality mismatch 
between ensureKeysAreInitialized (plural) and the method it calls 
(generateEncryptedDataEncryptionKey, which is singular).

Charles
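
A sketch of the restructuring Charles is suggesting: do the RPC first, then 
take the lock for the in-memory work only. The names below are placeholders, 
not the actual FSNamesystem code:
{code}
import java.util.concurrent.locks.ReentrantLock;

class LockAndRpcSketch {
  private final ReentrantLock nsLock = new ReentrantLock();

  interface KeyProviderStub { String getMetadata(String key); } // stands in for the RPC

  // Fetch over the (potentially slow) RPC first, then do only in-memory
  // work while the lock is held.
  String createZone(KeyProviderStub provider, String keyName) {
    String metadata = provider.getMetadata(keyName); // RPC, lock not held
    nsLock.lock();
    try {
      return metadata;     // placeholder for the in-memory update
    } finally {
      nsLock.unlock();
    }
  }
}
{code}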


> Consolidate encryption zone related implementation into a single class
> --
>
> Key: HDFS-7529
> URL: https://issues.apache.org/jira/browse/HDFS-7529
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7529.000.patch, HDFS-7529.001.patch
>
>
> This jira proposes to consolidate encryption zone related implementation to a 
> single class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root

2014-12-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250845#comment-14250845
 ] 

Andrew Wang commented on HDFS-7530:
---

The patch changes look good; I'll re-kick Jenkins and commit if it comes back clean.

> Allow renaming an Encryption Zone root
> --
>
> Key: HDFS-7530
> URL: https://issues.apache.org/jira/browse/HDFS-7530
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, 
> HDFS-7530.003.patch
>
>
> It should be possible to do
> hdfs dfs -mv /ezroot /newnameforezroot



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs

2014-12-17 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7543:
-
Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-7508

> Avoid path resolution when getting FileStatus for audit logs
> 
>
> Key: HDFS-7543
> URL: https://issues.apache.org/jira/browse/HDFS-7543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7543.000.patch
>
>
> The current API of {{getAuditFileInfo()}} forces parsing the paths again when 
>  generating the {{HdfsFileStatus}} for audit logs. This jira proposes to 
> avoid the repeated parsing by passing the {{INodesInPath}} object instead of 
> the path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode

2014-12-17 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250839#comment-14250839
 ] 

Yongjun Zhang commented on HDFS-6833:
-

Hi [~sinchii],

Thanks for your new rev, and sorry for the late response. It looks good to me 
except for two minor things that you can take care of after getting a 
committer's review:

*  {{public int getNumDeletingBlocks(String bpid)}} is not used anywhere. 
Consider removing it. We might need such a utility in the future; if so, the 
method needs to be implemented in the ReplicaMap, protected with the internal 
mutex.

*   About {{if (m != null) {}} in {{void removeBlocks(String bpid, Set 
blockIds)}}: it's better to check whether m is null and return right after 
getting m, instead of repeating the check on every loop iteration. Or you can 
put the loop within {{if (m != null) {...}}} (see the sketch below).
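
A minimal sketch of that early-return shape (the exact types in the patch may 
differ):

{code}
void removeBlocks(String bpid, Set<Long> blockIds) {
  Map<Long, ReplicaInfo> m = map.get(bpid); // hypothetical backing map
  if (m == null) {
    return; // nothing registered for this block pool; no per-iteration check
  }
  for (Long blockId : blockIds) {
    m.remove(blockId);
  }
}
{code}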

Hi [~cnauroth] and [~szetszwo], thanks for your earlier reviews. I wonder if 
either of you would have time to take a look at the latest revision? Thanks much.





> DirectoryScanner should not register a deleting block with memory of DataNode
> -
>
> Key: HDFS-6833
> URL: https://issues.apache.org/jira/browse/HDFS-6833
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.5.0, 2.5.1
>Reporter: Shinichi Yamashita
>Assignee: Shinichi Yamashita
>Priority: Critical
> Attachments: HDFS-6833-10.patch, HDFS-6833-11.patch, 
> HDFS-6833-12.patch, HDFS-6833-6-2.patch, HDFS-6833-6-3.patch, 
> HDFS-6833-6.patch, HDFS-6833-7-2.patch, HDFS-6833-7.patch, HDFS-6833.8.patch, 
> HDFS-6833.9.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, 
> HDFS-6833.patch, HDFS-6833.patch
>
>
> When a block is deleted in DataNode, the following messages are usually 
> output.
> {code}
> 2014-08-07 17:53:11,606 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
>  Scheduling blk_1073741825_1001 file 
> /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
>  for deletion
> 2014-08-07 17:53:11,617 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
>  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
> /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> {code}
> However, in the current implementation, DirectoryScanner may run while the 
> DataNode is deleting the block, in which case the following messages are output.
> {code}
> 2014-08-07 17:53:30,519 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
>  Scheduling blk_1073741825_1001 file 
> /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
>  for deletion
> 2014-08-07 17:53:31,426 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata 
> files:0, missing block files:0, missing blocks in memory:1, mismatched 
> blocks:0
> 2014-08-07 17:53:31,426 WARN 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
> missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
>   getNumBytes() = 21230663
>   getBytesOnDisk()  = 21230663
>   getVisibleLength()= 21230663
>   getVolume()   = /hadoop/data1/dfs/data/current
>   getBlockFile()= 
> /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
>   unlinked  =false
> 2014-08-07 17:53:31,531 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
>  Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
> /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> {code}
> As a result, the block that is being deleted is re-registered in the 
> DataNode's memory, and when the DataNode sends a block report, the NameNode 
> receives wrong block information.
> For example, when we execute a recommission or change the number of 
> replicas, the NameNode may delete a valid block as "ExcessReplicate" because 
> of this problem, and "Under-Replicated Blocks" and "Missing Blocks" occur.
> When the DataNode runs DirectoryScanner, it should not register a block that 
> is being deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7509) Avoid resolving path multiple times

2014-12-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250835#comment-14250835
 ] 

Konstantin Shvachko commented on HDFS-7509:
---

Hey Jing.
Looks like your commit message for this issue incorrectly references HDFS-7059 
on both commits.

> Avoid resolving path multiple times
> ---
>
> Key: HDFS-7509
> URL: https://issues.apache.org/jira/browse/HDFS-7509
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.7.0
>
> Attachments: HDFS-7509.000.patch, HDFS-7509.001.patch, 
> HDFS-7509.002.patch, HDFS-7509.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs

2014-12-17 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7543:
-
Status: Patch Available  (was: Open)

> Avoid path resolution when getting FileStatus for audit logs
> 
>
> Key: HDFS-7543
> URL: https://issues.apache.org/jira/browse/HDFS-7543
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7543.000.patch
>
>
> The current API of {{getAuditFileInfo()}} forces parsing the paths again when 
>  generating the {{HdfsFileStatus}} for audit logs. This jira proposes to 
> avoid the repeated parsing by passing the {{INodesInPath}} object instead of 
> the path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs

2014-12-17 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7543:
-
Attachment: HDFS-7543.000.patch

> Avoid path resolution when getting FileStatus for audit logs
> 
>
> Key: HDFS-7543
> URL: https://issues.apache.org/jira/browse/HDFS-7543
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7543.000.patch
>
>
> The current API of {{getAuditFileInfo()}} forces parsing the paths again when 
>  generating the {{HdfsFileStatus}} for audit logs. This jira proposes to 
> avoid the repeated parsing by passing the {{INodesInPath}} object instead of 
> the path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs

2014-12-17 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-7543:


 Summary: Avoid path resolution when getting FileStatus for audit 
logs
 Key: HDFS-7543
 URL: https://issues.apache.org/jira/browse/HDFS-7543
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


The current API of {{getAuditFileInfo()}} forces parsing the paths again when  
generating the {{HdfsFileStatus}} for audit logs. This jira proposes to avoid 
the repeated parsing by passing the {{INodesInPath}} object instead of the path.
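
An illustrative sketch of the change (signatures are hypothetical, not from the 
patch):

{code}
// Before: the path string is resolved again just for the audit log entry.
private HdfsFileStatus getAuditFileInfo(String src) throws IOException {
  return getFileInfo(dir.getINodesInPath(src, true)); // repeated resolution
}

// After: the caller passes the INodesInPath it has already resolved.
private HdfsFileStatus getAuditFileInfo(INodesInPath iip) throws IOException {
  return getFileInfo(iip); // no repeated path parsing
}
{code}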



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads

2014-12-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250832#comment-14250832
 ] 

Andrew Wang commented on HDFS-7430:
---

Cool, looks like you hit a lot of these. I did another review pass:

Nits: 
* DFSConfigKeys: I agree the spacing is erratic in this file, but adding some 
spaces to line up the variable names would match the variables immediately 
around the new keys.
* Still need javadoc {{}} tags in a lot of places. It's not a big deal, so 
if you do another pass and think it looks fine we can leave it.
* TestFsDatasetImpl, FsVolumeImpl, FsDatasetSpi, FsDatasetImpl unused imports
* @VisibleForTesting could be added to 
BlockScanner#Conf#INTERNAL_DFS_BLOCK_SCANNER_THRESHOLD...
* Still some lines longer than 80 chars

Some more time conversions that could be done with TimeUnit (see the sketch 
after this list):
* VolumeScanner#positiveMsToHours, the else case
* testScanRateImpl
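
A quick sketch of the TimeUnit style being suggested (values are illustrative):

{code}
import java.util.concurrent.TimeUnit;

long hours  = TimeUnit.MILLISECONDS.toHours(7200000L); // 2, instead of ms / 3600000
long millis = TimeUnit.HOURS.toMillis(1L);             // 3600000, instead of 60 * 60 * 1000
{code}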

FsDatasetImpl
* I'd still like to use JSON to save the iterator :) Pretty sure Jackson can 
pretty-print it for you; see the sketch after this list.
* I also still like the iterator-of-iterators idea a lot, since we could 
probably use the same iterator implementation at each level. Iterating would be 
simpler and the serde would be harder, but overall I think it means simpler 
code that is more friendly for Java programmers.
* BlockIterator still implements Closeable, unnecessary?
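
The Jackson pretty-printing mentioned above, as a hedged sketch 
({{iteratorState}} is a hypothetical POJO holding the saved cursor):

{code}
import com.fasterxml.jackson.databind.ObjectMapper;

ObjectMapper mapper = new ObjectMapper();
// writeValueAsString throws JsonProcessingException; handle or declare it
String json = mapper.writerWithDefaultPrettyPrinter()
    .writeValueAsString(iteratorState);
{code}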

VolumeScanner

{code}
  // Find out how many bytes per second we should scan.
  long neededBytesPerSec =
conf.targetBytesPerSec - (scannedBytesSum / MINUTES_PER_HOUR);
{code}

Still mismatched?

* Guessing the JDK7 file listing goodness is coming in the next patch, since 
it's still using File#list

Tests:
* Did you look into the failed test I posted earlier? Any RCA?
* The bugs found in my previous review seem worth unit testing, e.g. the 
off-by-one with the binarySearch index, the neededBytesPerSec calculation that 
still looks off, and the {{<=}} in place of {{<}} that affected continuous 
scans. It might be fun trying to write some actual stripped-down unit tests, 
rather than poking at a full minicluster.

> Refactor the BlockScanner to use O(1) memory and use multiple threads
> -
>
> Key: HDFS-7430
> URL: https://issues.apache.org/jira/browse/HDFS-7430
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch, 
> HDFS-7430.004.patch, HDFS-7430.005.patch, HDFS-7430.006.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by 
> keeping track of what block was scanned last, rather than by tracking the 
> scan status of all blocks in memory.  Also, instead of having just one 
> thread, we should have a verification thread per hard disk (or other volume), 
> scanning at a configurable rate of bytes per second.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250828#comment-14250828
 ] 

Hadoop QA commented on HDFS-7531:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687810/HDFS-7531.002.patch
  against trunk revision f2d150e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9061//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9061//console

This message is automatically generated.

> Improve the concurrent access on FsVolumeList
> -
>
> Key: HDFS-7531
> URL: https://issues.apache.org/jira/browse/HDFS-7531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, 
> HDFS-7531.002.patch
>
>
> {{FsVolumeList}} uses {{synchronized}} to protect the update on 
> {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
> {{getAvailable()}}) iterate {{volumes}} without protection.
> This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to 
> provide safer concurrent access.
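
A hedged sketch of the copy-on-write idea from the description; field and 
method names are illustrative, not from the patch:

{code}
private final AtomicReference<List<FsVolumeImpl>> volumes =
    new AtomicReference<List<FsVolumeImpl>>(
        Collections.<FsVolumeImpl>emptyList());

void addVolume(FsVolumeImpl v) {
  while (true) {
    List<FsVolumeImpl> cur = volumes.get();
    List<FsVolumeImpl> next = new ArrayList<FsVolumeImpl>(cur);
    next.add(v);
    if (volumes.compareAndSet(cur, next)) {
      return; // readers iterate an immutable snapshot without locking
    }
  }
}
{code}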



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6662) [ UI ] Not able to open file from UI if file path contains "%"

2014-12-17 Thread Gerson Carlos (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerson Carlos updated HDFS-6662:

Attachment: hdfs-6662.001.patch

> [ UI ] Not able to open file from UI if file path contains "%"
> --
>
> Key: HDFS-6662
> URL: https://issues.apache.org/jira/browse/HDFS-6662
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.1
>Reporter: Brahma Reddy Battula
>Priority: Critical
> Attachments: hdfs-6662.001.patch, hdfs-6662.patch
>
>
> 1. Write a file into HDFS in such a way that the file name is like 1%2%3%4.
> 2. Browse the file using the NameNode UI.
> The following exception is thrown:
> "Path does not exist on HDFS or WebHDFS is disabled. Please check your path 
> or enable WebHDFS"
> HBase writes its WAL file data in HDFS with % contained in the file name,
> e.g.: 
> /hbase/WALs/HOST-,60020,1404731504691/HOST-***-130%2C60020%2C1404731504691.1404812663950.meta
>  
> The above file cannot be opened in the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6662) [ UI ] Not able to open file from UI if file path contains "%"

2014-12-17 Thread Gerson Carlos (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250805#comment-14250805
 ] 

Gerson Carlos commented on HDFS-6662:
-

Thanks, Haohui, for noticing it.

In fact, I had to add {{encodeURIComponent()}} with some adjustments, because 
it encodes even the separator {{/}}, thus breaking the URI. Now it handles the 
slash and other reserved characters (&, =, and +, for example) as well.

This update is in the second patch version. I intend to add the unit test soon 
as well.

> [ UI ] Not able to open file from UI if file path contains "%"
> --
>
> Key: HDFS-6662
> URL: https://issues.apache.org/jira/browse/HDFS-6662
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.1
>Reporter: Brahma Reddy Battula
>Priority: Critical
> Attachments: hdfs-6662.001.patch, hdfs-6662.patch
>
>
> 1. Write a file into HDFS in such a way that the file name is like 1%2%3%4.
> 2. Browse the file using the NameNode UI.
> The following exception is thrown:
> "Path does not exist on HDFS or WebHDFS is disabled. Please check your path 
> or enable WebHDFS"
> HBase writes its WAL file data in HDFS with % contained in the file name,
> e.g.: 
> /hbase/WALs/HOST-,60020,1404731504691/HOST-***-130%2C60020%2C1404731504691.1404812663950.meta
>  
> The above file cannot be opened in the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250799#comment-14250799
 ] 

Hadoop QA commented on HDFS-7530:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687861/HDFS-7530.003.patch
  against trunk revision 9937eef.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9066//console

This message is automatically generated.

> Allow renaming an Encryption Zone root
> --
>
> Key: HDFS-7530
> URL: https://issues.apache.org/jira/browse/HDFS-7530
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, 
> HDFS-7530.003.patch
>
>
> It should be possible to do
> hdfs dfs -mv /ezroot /newnameforezroot



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7530) Allow renaming an Encryption Zone root

2014-12-17 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7530:
---
Attachment: HDFS-7530.003.patch

[~andrew.wang],

Good points. I think that .003 addresses them.

Charles


> Allow renaming an Encryption Zone root
> --
>
> Key: HDFS-7530
> URL: https://issues.apache.org/jira/browse/HDFS-7530
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, 
> HDFS-7530.003.patch
>
>
> It should be possible to do
> hdfs dfs -mv /ezroot /newnameforezroot



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-17 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7528:
---
Attachment: (was: HDFS-7530.003.patch)

> Consolidate symlink-related implementation into a single class
> --
>
> Key: HDFS-7528
> URL: https://issues.apache.org/jira/browse/HDFS-7528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.7.0
>
> Attachments: HDFS-7528.000.patch
>
>
> The jira proposes to consolidate symlink-related implementation into a single 
> class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250769#comment-14250769
 ] 

Hudson commented on HDFS-7528:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6738 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6738/])
HDFS-7528. Consolidate symlink-related implementation into a single class. 
Contributed by Haohui Mai. (wheat9: rev 
0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java


> Consolidate symlink-related implementation into a single class
> --
>
> Key: HDFS-7528
> URL: https://issues.apache.org/jira/browse/HDFS-7528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.7.0
>
> Attachments: HDFS-7528.000.patch, HDFS-7530.003.patch
>
>
> The jira proposes to consolidate symlink-related implementation into a single 
> class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk

2014-12-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250767#comment-14250767
 ] 

Colin Patrick McCabe commented on HDFS-7527:


I am -1 on removing this test right now, until we understand this issue 
better.  Putting "registration names" in the host include and exclude files 
used to work.  If it stopped working, then that's a bug that we should fix.  
Alternatively, we should file a JIRA to remove registration names entirely.  
Last time we proposed that, it got rejected, though.  See HDFS-5237.

One example of where you might want to set registration names is if you're on 
an AWS instance with internal and external IP interfaces.  On each datanode, 
you would set {{dfs.datanode.hostname}} to the internal IP address to ensure 
that traffic flowed over the internal interface, rather than the (expensive) 
external interfaces.  In this case, you should be able to specify what nodes 
are in the cluster using these same registration names, even if doing reverse 
DNS on the datanode hostnames returns another IP address as the first entry.
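
A hedged sketch of the kind of pinning described above (the address is 
illustrative; {{dfs.datanode.hostname}} is the real key):

{code}
Configuration conf = new HdfsConfiguration();
conf.set("dfs.datanode.hostname", "10.0.0.12"); // internal interface address
{code}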

> TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
> ---
>
> Key: HDFS-7527
> URL: https://issues.apache.org/jira/browse/HDFS-7527
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, test
>Reporter: Yongjun Zhang
>Assignee: Binglin Chang
> Attachments: HDFS-7527.001.patch
>
>
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/
> {quote}
> Error Message
> test timed out after 360000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 360000 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957)
> 2014-12-15 12:00:19,958 ERROR datanode.DataNode 
> (BPServiceActor.java:run(836)) - Initialization failed for Block pool 
> BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
> localhost/127.0.0.1:40565 Datanode denied communication with namenode because 
> the host is not in the include-list: DatanodeRegistration(127.0.0.1, 
> datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, 
> infoSecurePort=0, ipcPort=43726, 
> storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)
> 2014-12-15 12:00:29,087 FATAL datanode.DataNode 
> (BPServiceActor.java:run(841)) - Initialization failed for Block pool 
> BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
> localhost/127.0.0.1:40565. Exiting. 
> java.io.IOException: DN shut down before block pool connected
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
>   at java.lang.Thread.run(Thread.java:745)
> {quote}
> Found by tool proposed in HADOOP-11045:
> {quote}
> [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
> Hadoop-Hdfs-trunk -n 5 | tee bt.log
> Recently FAILED builds in url: 
> https://builds.apache.org//job/Hadoop-Hdfs-trunk
> THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, 
> as listed below:
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport 
> (2014-12-15 03:30:01)
> Failed test: 
> org.a

[jira] [Updated] (HDFS-7539) Namenode can't leave safemode because of Datanodes' IPC socket timeout

2014-12-17 Thread hoelog (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hoelog updated HDFS-7539:
-
Description: 
While the namenode is starting, datanodes seem to be waiting for the namenode's 
response through IPC to register their block pools.

Here is the DN's log:
{code} 
2014-12-16 20:28:09,064 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Acknowledging ACTIVE Namenode Block pool 
BP-877672386-10.114.130.143-1412666752827 (Datanode Uuid 
2117395f-e034-4b4a-adec-8a28464f4796) service to NN.x.com/10.x.x143:9000 
{code}
But the namenode is too busy to respond, and the datanodes hit a socket timeout 
(the default is 1 minute).
{code}
2014-12-16 20:29:09,857 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
IOException in offerService
java.net.SocketTimeoutException: Call From DN1.x.com/10.x.x.84 to NN.x.com:9000 
failed on socket timeout exception: java.net.SocketTimeoutException: 60000 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/10.x.x.84:57924 
remote=NN.x.com/10.x.x.143:9000]; For more details see:  
http://wiki.apache.org/hadoop/SocketTimeout 
{code}
The same events repeat, and eventually the NN drops most connection attempts 
from the DNs, so the NN can't leave safemode.

The DN's log:
{code}
2014-12-16 20:32:25,895 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
IOException in offerService
java.io.IOException: Failed on local exception: java.io.IOException: Connection 
reset by peer
{code}
There are no problems with the network, the configuration, or the servers. I 
think the NN is simply too busy to respond to the DNs within a minute. 

I configured "ipc.ping.interval" to 15 mins In the core-site.xml, and that was 
helpful for my cluster. 
{code}

<property>
  <name>ipc.ping.interval</name>
  <value>900000</value>
</property>

{code}
In my cluster, the namenode took 1 to 5 minutes to respond to the DNs' requests.
It would be helpful if there were a more elegant solution.
{code}
2014-12-16 23:28:16,598 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Acknowledging ACTIVE Namenode Block pool BP-877672386-10.x.x.143-1412666752827 
(Datanode Uuid c4f7beea-b8e9-404f-bc81-6e87e37263d2) service to 
NN/10.x.x.143:9000
2014-12-16 23:31:32,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Sent 1 blockreports 2090961 blocks total. Took 1690 msec to generate and 193738 
msecs for RPC and NN processing.  Got back commands 
org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@20e68e11
2014-12-16 23:31:32,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Got finalize command for block pool BP-877672386-10.x.x.143-1412666752827
2014-12-16 23:31:32,032 INFO org.apache.hadoop.util.GSet: Computing capacity 
for map BlockMap
2014-12-16 23:31:32,032 INFO org.apache.hadoop.util.GSet: VM type   = 64-bit
2014-12-16 23:31:32,044 INFO org.apache.hadoop.util.GSet: 0.5% max memory 3.6 
GB = 18.2 MB
2014-12-16 23:31:32,045 INFO org.apache.hadoop.util.GSet: capacity  = 2^21 
= 2097152 entries
2014-12-16 23:31:32,046 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block 
Verification Scanner initialized with interval 504 hours for block pool 
BP-877672386-10.114.130.143-1412666752827
{code}

  was:
While the namenode is starting, datanodes seem to be waiting for the namenode's 
response through IPC to register their block pools.

Here is the DN's log:
 
2014-12-16 20:28:09,064 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Acknowledging ACTIVE Namenode Block pool 
BP-877672386-10.114.130.143-1412666752827 (Datanode Uuid 
2117395f-e034-4b4a-adec-8a28464f4796) service to NN.x.com/10.x.x143:9000 

But the namenode is too busy to respond, and the datanodes hit a socket timeout 
(the default is 1 minute).

2014-12-16 20:29:09,857 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
IOException in offerService
java.net.SocketTimeoutException: Call From DN1.x.com/10.x.x.84 to NN.x.com:9000 
failed on socket timeout exception: java.net.SocketTimeoutException: 60000 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/10.x.x.84:57924 
remote=NN.x.com/10.x.x.143:9000]; For more details see:  
http://wiki.apache.org/hadoop/SocketTimeout 

The same events repeat, and eventually the NN drops most connection attempts 
from the DNs, so the NN can't leave safemode.

The DN's log:

2014-12-16 20:32:25,895 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
IOException in offerService
java.io.IOException: Failed on local exception: java.io.IOException: Connection 
reset by peer

There are no problems with the network, the configuration, or the servers. I 
think the NN is simply too busy to respond to the DNs within a minute. 

I configured "ipc.ping.interval" to 15 mins In the core-site.xml, and that was 
helpful for my cluster. 


<property>
  <name>ipc.ping.interval</name>
  <value>900000</value>
</property>


In my cluster, the namenode took 1 to 5 minutes to respond to the DNs' requests.
It would be helpful if there were a more elegant solution.

2014-12-16 23:28:16,598 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 

[jira] [Updated] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-17 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7528:
-
   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk and branch-2. Thanks Brandon for the review.

> Consolidate symlink-related implementation into a single class
> --
>
> Key: HDFS-7528
> URL: https://issues.apache.org/jira/browse/HDFS-7528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.7.0
>
> Attachments: HDFS-7528.000.patch, HDFS-7530.003.patch
>
>
> The jira proposes to consolidate symlink-related implementation into a single 
> class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7529) Consolidate encryption zone related implementation into a single class

2014-12-17 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250732#comment-14250732
 ] 

Haohui Mai commented on HDFS-7529:
--

The v1 patch fixes various formatting issues in the v0 patch.

> Consolidate encryption zone related implementation into a single class
> --
>
> Key: HDFS-7529
> URL: https://issues.apache.org/jira/browse/HDFS-7529
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7529.000.patch, HDFS-7529.001.patch
>
>
> This jira proposes to consolidate encryption zone related implementation to a 
> single class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-17 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250726#comment-14250726
 ] 

Brandon Li commented on HDFS-7528:
--

+1 to Haohui's patch.

> Consolidate symlink-related implementation into a single class
> --
>
> Key: HDFS-7528
> URL: https://issues.apache.org/jira/browse/HDFS-7528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7528.000.patch, HDFS-7530.003.patch
>
>
> The jira proposes to consolidate symlink-related implementation into a single 
> class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7521) Refactor DN state management

2014-12-17 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250719#comment-14250719
 ] 

Zhe Zhang commented on HDFS-7521:
-

[~mingma] OK, I missed the other arrow going up. Is this the only case where 
the two state machines are not independent? If so, how does this corner case 
affect potential formal verification?

> Refactor DN state management
> 
>
> Key: HDFS-7521
> URL: https://issues.apache.org/jira/browse/HDFS-7521
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
> Attachments: DNStateMachines.png, HDFS-7521.patch
>
>
> There are two aspects w.r.t. DN state management in NN.
> * State machine management within active NN
> NN maintains states of each data node regarding whether it is running or 
> being decommissioned. But the state machine isn’t well defined. We have dealt 
> with some corner-case bugs in this area. It would be useful to refactor the 
> code to use a clear state machine definition that defines the events, the 
> available states, and the actions for state transitions. It has these benefits.
> ** Make it easy to define the correctness of DN state management. Currently 
> some of the state transitions aren't defined in the code. For example, if 
> admins remove a node from the include host file while the node is being 
> decommissioned, it will be transitioned to DEAD and DECOMM_IN_PROGRESS. That 
> might not be the intention. If we have a state machine definition, we can 
> identify this case.
> ** Make it easy to add a new state for DN later. For example, people have 
> discussed a new “maintenance” state for DN to support the scenario where 
> admins need to take the machine/rack down for 30 minutes for repair.
> We can refactor DN with a clear state machine definition based on YARN's 
> state-related components.
> * State machine consistency between active and standby NN
> Another dimension of state machine management is consistency across NN pairs. 
> We have dealt with bugs due to different live nodes between the active NN and 
> the standby NN. The current design is to have each NN manage its own state 
> based on the events it receives. For example, DNs will send heartbeats to both 
> NNs; admins will issue decommission commands to both NNs. An alternative 
> design approach could be to have ZK manage the state.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7521) Refactor DN state management

2014-12-17 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250713#comment-14250713
 ] 

Ming Ma commented on HDFS-7521:
---

Folks, thanks for the comments.

[~wheat9], I agree with you that a simpler solution is better. This state 
machine lib has been used in YARN and MR and has proven quite useful for 
debugging, especially when a new state needs to be added. When we fixed corner 
cases in DN state management, we actually wanted to investigate ways to do 
formal checking on the NN, but there is no good way to do that without a state 
machine, as you mentioned. I definitely want to hear what others have to say 
about the need for a state machine lib.

[~zhz], the main reason to have two state machines is to reduce the overall 
number of possible states. For the most part, liveness and admin are 
independent. The case you mentioned is specified in the diagram: In_Service can 
transition to either the Decommission_In_Progress or the Decommissioned state 
upon receiving a DECOMMISSION_REQUESTED event. You can't tell from the diagram 
what the decision is based on, though; only the source code has the answer. (A 
sketch of the lib's API follows below.)
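
For reference, a hedged sketch of how the YARN state machine lib expresses 
transitions; the operand, state, and event names here are illustrative, not 
from the patch:

{code}
StateMachineFactory<DatanodeAdminInfo, AdminState, AdminEventType, AdminEvent>
    factory =
  new StateMachineFactory<DatanodeAdminInfo, AdminState, AdminEventType,
      AdminEvent>(AdminState.IN_SERVICE)
    .addTransition(AdminState.IN_SERVICE,
        AdminState.DECOMMISSION_IN_PROGRESS,
        AdminEventType.DECOMMISSION_REQUESTED)
    .addTransition(AdminState.DECOMMISSION_IN_PROGRESS,
        AdminState.DECOMMISSIONED,
        AdminEventType.DECOMMISSION_COMPLETED)
    .installTopology();
{code}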


> Refactor DN state management
> 
>
> Key: HDFS-7521
> URL: https://issues.apache.org/jira/browse/HDFS-7521
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
> Attachments: DNStateMachines.png, HDFS-7521.patch
>
>
> There are two aspects w.r.t. DN state management in NN.
> * State machine management within active NN
> NN maintains states of each data node regarding whether it is running or 
> being decommissioned. But the state machine isn’t well defined. We have dealt 
> with some corner-case bugs in this area. It would be useful to refactor the 
> code to use a clear state machine definition that defines the events, the 
> available states, and the actions for state transitions. It has these benefits.
> ** Make it easy to define the correctness of DN state management. Currently 
> some of the state transitions aren't defined in the code. For example, if 
> admins remove a node from the include host file while the node is being 
> decommissioned, it will be transitioned to DEAD and DECOMM_IN_PROGRESS. That 
> might not be the intention. If we have a state machine definition, we can 
> identify this case.
> ** Make it easy to add a new state for DN later. For example, people have 
> discussed a new “maintenance” state for DN to support the scenario where 
> admins need to take the machine/rack down for 30 minutes for repair.
> We can refactor DN with a clear state machine definition based on YARN's 
> state-related components.
> * State machine consistency between active and standby NN
> Another dimension of state machine management is consistency across NN pairs. 
> We have dealt with bugs due to different live nodes between the active NN and 
> the standby NN. The current design is to have each NN manage its own state 
> based on the events it receives. For example, DNs will send heartbeats to both 
> NNs; admins will issue decommission commands to both NNs. An alternative 
> design approach could be to have ZK manage the state.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7529) Consolidate encryption zone related implementation into a single class

2014-12-17 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250698#comment-14250698
 ] 

Haohui Mai commented on HDFS-7529:
--

bq. Wouldn't it be better to fail fast in this case? Did you copy the wrong 
code to #ensureKeysAreInitialized?
Likewise, I think that the checks for nullness of provider, keyName, and 
metadata can be removed from #createEncryptionZoneInt, right?

Duplicating the checks is intentional: it defines well-formed steps, as implied 
by the name {{ensureKeysAreInitialized()}}.

bq. are now inside the FSN#writeLock(). I suppose that's not the end of the 
world, but every little bit of extra code inside the writeLock() hurts. 

The performance benefit is minimal, as {{getPermissionChecker()}} eventually 
synchronizes in {{UserGroupInformation#getCurrentUser()}}. Making it consistent 
with the other operations allows further refactoring.

> Consolidate encryption zone related implementation into a single class
> --
>
> Key: HDFS-7529
> URL: https://issues.apache.org/jira/browse/HDFS-7529
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7529.000.patch, HDFS-7529.001.patch
>
>
> This jira proposes to consolidate encryption zone related implementation to a 
> single class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7529) Consolidate encryption zone related implementation into a single class

2014-12-17 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7529:
-
Attachment: HDFS-7529.001.patch

> Consolidate encryption zone related implementation into a single class
> --
>
> Key: HDFS-7529
> URL: https://issues.apache.org/jira/browse/HDFS-7529
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7529.000.patch, HDFS-7529.001.patch
>
>
> This jira proposes to consolidate encryption zone related implementation to a 
> single class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250667#comment-14250667
 ] 

Andrew Wang commented on HDFS-7528:
---

Charles, I think this was posted on the wrong JIRA :)

> Consolidate symlink-related implementation into a single class
> --
>
> Key: HDFS-7528
> URL: https://issues.apache.org/jira/browse/HDFS-7528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7528.000.patch, HDFS-7530.003.patch
>
>
> The jira proposes to consolidate symlink-related implementation into a single 
> class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-17 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250668#comment-14250668
 ] 

Brandon Li commented on HDFS-7528:
--

[~clamb], I guess you posted the patch on the wrong JIRA. :-)

> Consolidate symlink-related implementation into a single class
> --
>
> Key: HDFS-7528
> URL: https://issues.apache.org/jira/browse/HDFS-7528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7528.000.patch, HDFS-7530.003.patch
>
>
> The jira proposes to consolidate symlink-related implementation into a single 
> class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7521) Refactor DN state management

2014-12-17 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250661#comment-14250661
 ] 

Zhe Zhang commented on HDFS-7521:
-

Explicitly defining DN states sounds like a great idea to me. It would be very 
useful in supporting the increasingly complex management tasks.

I'm not entirely sure that _liveness_ and _admin_ should be two independent 
state machines. For example, in the current transition diagram, upon receiving 
{{DECOMMISSION_REQUESTED}}, {{In_Service}} always transitions to 
{{Decommission_In_Progress}} (let me know if I'm understanding it wrong). I 
think it should rather depend on whether the DN is {{Running}} or {{Dead}}. 

> Refactor DN state management
> 
>
> Key: HDFS-7521
> URL: https://issues.apache.org/jira/browse/HDFS-7521
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
> Attachments: DNStateMachines.png, HDFS-7521.patch
>
>
> There are two aspects w.r.t. DN state management in NN.
> * State machine management within active NN
> NN maintains states of each data node regarding whether it is running or 
> being decommissioned. But the state machine isn’t well defined. We have dealt 
> with some corner-case bugs in this area. It would be useful to refactor the 
> code to use a clear state machine definition that defines the events, the 
> available states, and the actions for state transitions. It has these benefits.
> ** Make it easy to define the correctness of DN state management. Currently 
> some of the state transitions aren't defined in the code. For example, if 
> admins remove a node from the include host file while the node is being 
> decommissioned, it will be transitioned to DEAD and DECOMM_IN_PROGRESS. That 
> might not be the intention. If we have a state machine definition, we can 
> identify this case.
> ** Make it easy to add a new state for DN later. For example, people have 
> discussed a new “maintenance” state for DN to support the scenario where 
> admins need to take the machine/rack down for 30 minutes for repair.
> We can refactor DN with a clear state machine definition based on YARN's 
> state-related components.
> * State machine consistency between active and standby NN
> Another dimension of state machine management is consistency across NN pairs. 
> We have dealt with bugs due to different live nodes between the active NN and 
> the standby NN. The current design is to have each NN manage its own state 
> based on the events it receives. For example, DNs will send heartbeats to both 
> NNs; admins will issue decommission commands to both NNs. An alternative 
> design approach could be to have ZK manage the state.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-17 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7528:
---
Attachment: HDFS-7530.003.patch

[~andrew.wang],

Good points. New diffs address them.


> Consolidate symlink-related implementation into a single class
> --
>
> Key: HDFS-7528
> URL: https://issues.apache.org/jira/browse/HDFS-7528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7528.000.patch, HDFS-7530.003.patch
>
>
> The jira proposes to consolidate symlink-related implementation into a single 
> class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures

2014-12-17 Thread Stephen Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Chu updated HDFS-7542:
--
Status: Patch Available  (was: Open)

> Add an option to DFSAdmin -safemode wait to ignore connection failures
> --
>
> Key: HDFS-7542
> URL: https://issues.apache.org/jira/browse/HDFS-7542
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.6.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Minor
> Attachments: HDFS-7542.001.patch
>
>
> Currently, the _dfsadmin -safemode wait_ command aborts when the connection 
> to the NN fails (a network glitch, a ConnectException when the NN is 
> unreachable, an EOFException if the network link is shut down). 
> In certain situations, users have asked for an option to make the command 
> resilient to connection failures. This is useful so that the admin can 
> initiate the wait command even though the NN is not fully up, or survive 
> intermittent network issues. With this option, the admin can rely on the wait 
> command continuing to poll instead of aborting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures

2014-12-17 Thread Stephen Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Chu updated HDFS-7542:
--
Attachment: HDFS-7542.001.patch

> Add an option to DFSAdmin -safemode wait to ignore connection failures
> --
>
> Key: HDFS-7542
> URL: https://issues.apache.org/jira/browse/HDFS-7542
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.6.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Minor
> Attachments: HDFS-7542.001.patch
>
>
> Currently, the _dfsadmin -safemode wait_ command aborts when the connection 
> to the NN fails (a network glitch, a ConnectException when the NN is 
> unreachable, an EOFException if the network link is shut down). 
> In certain situations, users have asked for an option to make the command 
> resilient to connection failures. This is useful so that the admin can 
> initiate the wait command even though the NN is not fully up, or survive 
> intermittent network issues. With this option, the admin can rely on the wait 
> command continuing to poll instead of aborting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures

2014-12-17 Thread Stephen Chu (JIRA)
Stephen Chu created HDFS-7542:
-

 Summary: Add an option to DFSAdmin -safemode wait to ignore 
connection failures
 Key: HDFS-7542
 URL: https://issues.apache.org/jira/browse/HDFS-7542
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.6.0
Reporter: Stephen Chu
Assignee: Stephen Chu
Priority: Minor


Currently, the _dfsadmin -safemode wait_ command aborts when the connection to 
the NN fails (a network glitch, a ConnectException when the NN is unreachable, 
an EOFException if the network link is shut down). 

In certain situations, users have asked for an option to make the command 
resilient to connection failures. This is useful so that the admin can initiate 
the wait command even though the NN is not fully up, or survive intermittent 
network issues. With this option, the admin can rely on the wait command 
continuing to poll instead of aborting.
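
A hedged sketch of the polling loop such an option would enable (the flag 
wiring is hypothetical; only {{setSafeMode(SAFEMODE_GET)}} is the existing API):

{code}
boolean waitForSafeModeExit(DistributedFileSystem dfs)
    throws InterruptedException {
  while (true) {
    try {
      if (!dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_GET)) {
        return true; // NN reachable and out of safemode
      }
    } catch (IOException e) {
      // with the proposed option: keep polling instead of aborting
    }
    Thread.sleep(5000);
  }
}
{code}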



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root

2014-12-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250593#comment-14250593
 ] 

Andrew Wang commented on HDFS-7530:
---

Hey Charles, note that Colin actually fixed binary diff application in 
HADOOP-10926. You just need to generate the diff with git and without 
--no-prefix. It doesn't matter here, though, since XML is text.

For the CLI test, can we add an "ls" at the end so we can check the rename? An 
empty substring comparator is never going to trigger. I think the existing test 
is supposed to test renaming an EZ file to a non-EZ location; could we add that 
too?

Thanks!

> Allow renaming an Encryption Zone root
> --
>
> Key: HDFS-7530
> URL: https://issues.apache.org/jira/browse/HDFS-7530
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch
>
>
> It should be possible to do
> hdfs dfs -mv /ezroot /newnameforezroot



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7521) Refactor DN state management

2014-12-17 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250563#comment-14250563
 ] 

Haohui Mai commented on HDFS-7521:
--

bq. Regarding the state machine abstraction, the patch has DN's state modified 
asynchronously by the dispatcher thread instead of synchronously by the caller 
thread. The motivation is to have one thread modify DN's state and might help 
to simplify the lock management in NN. But not sure if that is really 
worthwhile. Changing from async to sync is pretty straightforward.

This part of the code has been quite complex, and I'm concerned about the 
additional complexity. For example, what are the principles that ensure the 
system is properly synchronized? I suggest starting from a simple 
implementation, stabilizing it, and moving towards a more sophisticated 
solution if required. 

bq. IMO, reusing existing state machine lib is beneficial. It declares how 
states transition and any actions required. If you look at the state machine 
lib's internal implementation, it is similar to we would have implemented. ... 
Another nice thing of reusing existing state machine lib is you can generate 
the state machine diagram easily.

I'm yet to be convinced that a dedicated library is required for this jira. For 
this use case, a well-formed state machine is so simple that there should be no 
need for a library. A dedicated state machine library is hugely beneficial if 
you want to (1) write declarative programs, or (2) run some formal checking 
(see P2 and MACE). I think this is out of the scope of this jira.

I see a lot of value in simplifying DN state management using explicit state 
machines. Given the current complexity we have in the code, however, my 
suggestion is to start simple. It would be great to start refactoring the code 
to make it closer to a state machine (which is required anyway); then we can 
explore additional issues once we get there.

> Refactor DN state management
> 
>
> Key: HDFS-7521
> URL: https://issues.apache.org/jira/browse/HDFS-7521
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
> Attachments: DNStateMachines.png, HDFS-7521.patch
>
>
> There are two aspects w.r.t. DN state management in NN.
> * State machine management within active NN
> NN maintains states of each data node regarding whether it is running or 
> being decommissioned. But the state machine isn’t well defined. We have dealt 
> with some corner-case bugs in this area. It would be useful to refactor the 
> code to use a clear state machine definition that defines the events, the 
> available states, and the actions for state transitions. It has these benefits.
> ** Make it easy to define the correctness of DN state management. Currently 
> some of the state transitions aren't defined in the code. For example, if 
> admins remove a node from the include host file while the node is being 
> decommissioned, it will be transitioned to DEAD and DECOMM_IN_PROGRESS. That 
> might not be the intention. If we have a state machine definition, we can 
> identify this case.
> ** Make it easy to add a new state for DN later. For example, people have 
> discussed a new “maintenance” state for DN to support the scenario where 
> admins need to take the machine/rack down for 30 minutes for repair.
> We can refactor DN with a clear state machine definition based on YARN's 
> state-related components.
> * State machine consistency between active and standby NN
> Another dimension of state machine management is consistency across NN pairs. 
> We have dealt with bugs due to different live nodes between the active NN and 
> the standby NN. The current design is to have each NN manage its own state 
> based on the events it receives. For example, DNs will send heartbeats to both 
> NNs; admins will issue decommission commands to both NNs. An alternative 
> design approach could be to have ZK manage the state.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7540) Add IOUtils#listDirectory

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250562#comment-14250562
 ] 

Hadoop QA commented on HDFS-7540:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687812/HDFS-7540.002.patch
  against trunk revision f2d150e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9062//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9062//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9062//console

This message is automatically generated.

> Add IOUtils#listDirectory
> -
>
> Key: HDFS-7540
> URL: https://issues.apache.org/jira/browse/HDFS-7540
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7540.001.patch, HDFS-7540.002.patch
>
>
> We should have a drop-in replacement for File#listDir that doesn't hide 
> IOExceptions, and which returns a ChunkedArrayList rather than a single large 
> array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root

2014-12-17 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250523#comment-14250523
 ] 

Charles Lamb commented on HDFS-7530:


The test failures appear to be unrelated.

> Allow renaming an Encryption Zone root
> --
>
> Key: HDFS-7530
> URL: https://issues.apache.org/jira/browse/HDFS-7530
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch
>
>
> It should be possible to do
> hdfs dfs -mv /ezroot /newnameforezroot



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7529) Consolidate encryption zone related implementation into a single class

2014-12-17 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250501#comment-14250501
 ] 

Charles Lamb commented on HDFS-7529:


Hi [~wheat9],

Thanks for looking into this. I have a few comments and then a bunch of 
formatting nits that are introduced as part of this patch.

FSDirEncryptionZoneOp.java:

In #ensureKeysAreInitialized, why do you return if provider, keyName, or 
metadata are null? The existing code would throw an exception, which the new 
code eventually does, but not before it has waited around to take the 
writeLock(). Wouldn't it be better to fail fast in this case? Did you copy the 
wrong code to #ensureKeysAreInitialized?

Likewise, I think that the checks for nullness of provider, keyName, and 
metadata can be removed from #createEncryptionZoneInt, right?

These two lines:

{code}
+final byte[][] pathComponents =
+  FSDirectory.getPathComponentsForReservedPath(src);
+FSPermissionChecker pc = fsn.getPermissionChecker();
{code}

are now inside the FSN#writeLock(). I suppose that's not the end of the world, 
but every little bit of extra code inside the writeLock() hurts. Same issue 
with the call to #logAuditEvent (for the failure case only) being inside the 
writeLock() now. IWBNI that call could be moved out of the scope of the lock.

The same general comment applies to #getEZForPath. auditStat can be made final 
in that method. A sketch of the intended shape is below.
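
To spell out the shape being asked for (a fragment-level sketch using names 
from the patch context, not the actual patch):

{code}
// Lock-free precondition work first, failing fast before taking the lock:
final byte[][] pathComponents =
    FSDirectory.getPathComponentsForReservedPath(src);
final FSPermissionChecker pc = fsn.getPermissionChecker();
if (provider == null || keyName == null || metadata == null) {
  throw new IOException("...");  // fail fast; don't wait for the writeLock()
}
fsn.writeLock();
try {
  // only the namespace mutation itself happens under the lock
} finally {
  fsn.writeUnlock();
}
logAuditEvent(success, "createEncryptionZone", src);  // outside the lock
{code}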

Formatting issues:

You introduced a newline at the end of #createEncryptionZone.

#getFileEncryptionInfo. The formatting for the call to getEZForPath is weird. 
Bump the 'iip);' up a line. In that same method, the call to 
unprotectedGetXAttrByName busts the 80 char limit. I realize that this was 
already in the codebase before this patch, but it was introduced in the 
previous Jira (the one which introduced FSDirXAttrOp) so we might as well fix 
it now for cleanliness purposes.

In #createEncryptionZoneInt, the block comment did not get re-indented -2 when 
you moved it, so it's out of alignment now.

FSNamesystem.java:

The call to FSDirEncryptionZoneOp.getFileEncryptionInfo could use some 
formatting; it exceeds the 80 char limit. Ditto the call to 
#generateEncryptedDataEncryptionKey.

FSDirStatAndListingOp.java:

Lines 204 and 423 exceed the 80 char limit.


> Consolidate encryption zone related implementation into a single class
> --
>
> Key: HDFS-7529
> URL: https://issues.apache.org/jira/browse/HDFS-7529
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7529.000.patch
>
>
> This jira proposes to consolidate encryption zone related implementation to a 
> single class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250500#comment-14250500
 ] 

Hadoop QA commented on HDFS-7530:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687763/HDFS-7530.002.patch
  against trunk revision e996a1b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
  org.apache.hadoop.hdfs.TestDecommission

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9058//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9058//console

This message is automatically generated.

> Allow renaming an Encryption Zone root
> --
>
> Key: HDFS-7530
> URL: https://issues.apache.org/jira/browse/HDFS-7530
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch
>
>
> It should be possible to do
> hdfs dfs -mv /ezroot /newnameforezroot



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7540) Add IOUtils#listDirectory

2014-12-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250463#comment-14250463
 ] 

Andrew Wang commented on HDFS-7540:
---

LGTM +1 pending Jenkins, thanks Colin

> Add IOUtils#listDirectory
> -
>
> Key: HDFS-7540
> URL: https://issues.apache.org/jira/browse/HDFS-7540
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7540.001.patch, HDFS-7540.002.patch
>
>
> We should have a drop-in replacement for File#listDir that doesn't hide 
> IOExceptions, and which returns a ChunkedArrayList rather than a single large 
> array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7540) Add IOUtils#listDirectory

2014-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7540:
---
Attachment: HDFS-7540.002.patch

> Add IOUtils#listDirectory
> -
>
> Key: HDFS-7540
> URL: https://issues.apache.org/jira/browse/HDFS-7540
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7540.001.patch, HDFS-7540.002.patch
>
>
> We should have a drop-in replacement for File#listDir that doesn't hide 
> IOExceptions, and which returns a ChunkedArrayList rather than a single large 
> array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7540) Add IOUtils#listDirectory

2014-12-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250450#comment-14250450
 ] 

Colin Patrick McCabe commented on HDFS-7540:


bq. I wonder if we should really return a ChunkedArrayList here. It only 
implements a subset of the AbstractList interface, and this is a pretty 
general-purpose method. For huge dirs, we should probably just be using the 
DirectoryStream iterator directly. I do see the use of these helper functions 
for quick-and-dirty listings though.

I think maybe later {{ChunkedArrayList}} will become more general-purpose.  But 
you're right; for now, we'd better use {{ArrayList}}.

bq. Need {{<p>}} tag for javadoc linebreak

ok

bq. I read the docs at 
http://docs.oracle.com/javase/7/docs/api/java/nio/file/DirectoryStream.html and 
it'd be nice to do as the example does and unwrap the DirectoryIteratorException 
into an IOException.

Yeah, that's important... io errors should result in io exceptions.  Looks like 
{{DirectoryIteratorException}} is a {{RuntimeException}}... probably in order 
to conform to the {{Iterator}} interface.

I removed the variant that returns a list of File, since I found that the JDK6 
file listing interfaces actually returned an array of String, so returning a 
list of String is compatible-ish.
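
For reference, a sketch of the helper along those lines (close to, but not 
necessarily identical to, the final patch):

{code}
// Uses java.nio.file.{Files, Path, DirectoryStream, DirectoryIteratorException}
// plus java.util.{List, ArrayList} and java.io.{File, IOException}.
public static List<String> listDirectory(File dir) throws IOException {
  List<String> list = new ArrayList<String>();
  try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir.toPath())) {
    for (Path entry : stream) {
      list.add(entry.getFileName().toString());
    }
  } catch (DirectoryIteratorException e) {
    // The iterator wraps I/O failures in a RuntimeException; rethrow the
    // underlying IOException so callers see a real I/O error.
    throw e.getCause();
  }
  return list;
}
{code}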

> Add IOUtils#listDirectory
> -
>
> Key: HDFS-7540
> URL: https://issues.apache.org/jira/browse/HDFS-7540
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7540.001.patch, HDFS-7540.002.patch
>
>
> We should have a drop-in replacement for File#listDir that doesn't hide 
> IOExceptions, and which returns a ChunkedArrayList rather than a single large 
> array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250452#comment-14250452
 ] 

Hadoop QA commented on HDFS-7430:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687802/HDFS-7430.006.patch
  against trunk revision 9b4ba40.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 13 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9060//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9060//console

This message is automatically generated.

> Refactor the BlockScanner to use O(1) memory and use multiple threads
> -
>
> Key: HDFS-7430
> URL: https://issues.apache.org/jira/browse/HDFS-7430
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch, 
> HDFS-7430.004.patch, HDFS-7430.005.patch, HDFS-7430.006.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by 
> keeping track of what block was scanned last, rather than by tracking the 
> scan status of all blocks in memory.  Also, instead of having just one 
> thread, we should have a verification thread per hard disk (or other volume), 
> scanning at a configurable rate of bytes per second.
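
A rough sketch of the description above (class and helper names here are 
hypothetical, not the patch's API):

{code}
// One scanner thread per volume; the only persistent state is a cursor,
// so memory use is O(1) regardless of how many blocks the volume holds.
class VolumeScanner extends Thread {
  private final FsVolumeSpi volume;   // the disk this thread owns
  private final long bytesPerSec;     // configurable scan rate

  VolumeScanner(FsVolumeSpi volume, long bytesPerSec) {
    this.volume = volume;
    this.bytesPerSec = bytesPerSec;
  }

  @Override
  public void run() {
    ExtendedBlock cursor = null;      // last block scanned
    while (!isInterrupted()) {
      ExtendedBlock block = nextBlockAfter(volume, cursor); // hypothetical helper
      long bytesScanned = verifyBlock(block);   // read and checksum the block
      cursor = block;                           // remember position only
      throttle(bytesScanned, bytesPerSec);      // sleep to average bytesPerSec
    }
  }
}
{code}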



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7540) Add IOUtils#listDirectory

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250441#comment-14250441
 ] 

Hadoop QA commented on HDFS-7540:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687794/HDFS-7540.001.patch
  against trunk revision 4281c96.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common:

  org.apache.hadoop.ha.TestZKFailoverController

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9059//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9059//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9059//console

This message is automatically generated.

> Add IOUtils#listDirectory
> -
>
> Key: HDFS-7540
> URL: https://issues.apache.org/jira/browse/HDFS-7540
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7540.001.patch
>
>
> We should have a drop-in replacement for File#listDir that doesn't hide 
> IOExceptions, and which returns a ChunkedArrayList rather than a single large 
> array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7531) Improve the concurrent access on FsVolumeList

2014-12-17 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-7531:

Attachment: HDFS-7531.002.patch

Thanks for the suggestions [~cmccabe].

I updated the patch based on your comments.

> Improve the concurrent access on FsVolumeList
> -
>
> Key: HDFS-7531
> URL: https://issues.apache.org/jira/browse/HDFS-7531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, 
> HDFS-7531.002.patch
>
>
> {{FsVolumeList}} uses {{synchronized}} to protect the update on 
> {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
> {{getAvailable()}}) iterate {{volumes}} without protection.
> This JIRA proposes to use {{AtomicReference}} to encapture {{volumes}} to 
> provide better concurrent access.
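
For context, the {{AtomicReference}} approach amounts to copy-on-write (a 
sketch of the idea, not the patch itself):

{code}
private final AtomicReference<FsVolumeImpl[]> volumes =
    new AtomicReference<FsVolumeImpl[]>(new FsVolumeImpl[0]);

// Readers iterate a stable snapshot with no lock at all:
//   for (FsVolumeImpl v : volumes.get()) { ... }

// Writers atomically swap in a new copy, retrying if they lose a race:
void addVolume(FsVolumeImpl v) {
  while (true) {
    FsVolumeImpl[] cur = volumes.get();
    FsVolumeImpl[] next = java.util.Arrays.copyOf(cur, cur.length + 1);
    next[cur.length] = v;
    if (volumes.compareAndSet(cur, next)) {
      return;
    }
  }
}
{code}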



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7541) Support for fast HDFS datanode rolling upgrade

2014-12-17 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-7541:
--
Attachment: SupportforfastHDFSdatanoderollingupgrade.pdf

We ([~ctrezzo], [~jmeagher], [~lohit], [~l201514], [~kihwal], and others) 
discussed ways to address this. Attached is the initial high-level design 
document.

* Upgrade domain support. HDFS-3566 outlines the idea, but it isn't applicable 
to hadoop 2, and it uses network topology to store the upgrade domain definition. 
We can make the load balancer more extensible to support different policies.

* Have NN support for a new "maintenance" datanode state. In this state, the 
DN won't process read/write requests, but its replicas will remain in the BlockMap 
and thus are still considered valid from the block replication point of view.

Appreciate any input.

> Support for fast HDFS datanode rolling upgrade
> --
>
> Key: HDFS-7541
> URL: https://issues.apache.org/jira/browse/HDFS-7541
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
> Attachments: SupportforfastHDFSdatanoderollingupgrade.pdf
>
>
> The current HDFS DN rolling upgrade procedure requires sequential DN restarts to 
> minimize the impact on data availability and read/write operations. The side 
> effect is a longer upgrade duration for large clusters. This might be 
> acceptable for a quick DN JVM restart to update Hadoop code/configuration. 
> However, for an OS upgrade that requires a machine reboot, the overall upgrade 
> duration will be too long if we continue to do sequential DN rolling restarts.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7285) Erasure Coding Support inside HDFS

2014-12-17 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7285:

Attachment: HDFSErasureCodingDesign-20141217.pdf

Based on feedback from the meetup and a deeper study of file size distributions 
(detailed report to be posted later), *Data Striping* has been added to this updated 
design, mainly to support EC on small files. A few highlights compared to the 
first version:
# _Client_: extended with striping and codec logic
# _NameNode_: {{INodeFile}} extended to store both block and {{BlockGroup}} 
information; optimizations are proposed to reduce memory usage caused by 
striping and parity data
# _DataNode_ remains mostly unchanged from the original EC design
# Prioritizing _EC with striping_ as the focus of the initial phase, and 
putting _EC with contiguous (non-striping) layout_ in a 2nd phase (a small 
illustration of the striping layout follows)
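
To illustrate the striping idea with concrete numbers (the schema and cell size 
below are examples only, not the design's final parameters):

{code}
// With 6 data blocks per BlockGroup and 64KB cells, logical file bytes are
// round-robined across the data blocks cell by cell.
static final int DATA_BLOCKS = 6;        // example schema (plus parity blocks)
static final int CELL_SIZE = 64 * 1024;  // example cell size

// Which data block of the group holds a given logical offset:
static int blockIndex(long logicalOffset) {
  return (int) ((logicalOffset / CELL_SIZE) % DATA_BLOCKS);
}

// And where inside that block the byte lands:
static long offsetInBlock(long logicalOffset) {
  long stripe = logicalOffset / ((long) CELL_SIZE * DATA_BLOCKS);
  return stripe * CELL_SIZE + logicalOffset % CELL_SIZE;
}
{code}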

> Erasure Coding Support inside HDFS
> --
>
> Key: HDFS-7285
> URL: https://issues.apache.org/jira/browse/HDFS-7285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Weihua Jiang
>Assignee: Zhe Zhang
> Attachments: HDFSErasureCodingDesign-20141028.pdf, 
> HDFSErasureCodingDesign-20141217.pdf
>
>
> Erasure Coding (EC) can greatly reduce the storage overhead without sacrificing 
> data reliability, compared to the existing HDFS 3-replica approach. For 
> example, if we use a 10+4 Reed-Solomon coding, we can tolerate the loss of 4 
> blocks, with the storage overhead being only 40%. This makes EC a quite attractive 
> alternative for big data storage, particularly for cold data. 
> Facebook had a related open source project called HDFS-RAID. It used to be 
> one of the contrib packages in HDFS but was removed in Hadoop 2.0 
> for maintenance reasons. The drawbacks are: 1) it sits on top of HDFS and depends 
> on MapReduce to do encoding and decoding tasks; 2) it can only be used for 
> cold files that are not intended to be appended to anymore; 3) the pure Java EC 
> coding implementation is extremely slow in practical use. For these reasons, it 
> might not be a good idea to just bring HDFS-RAID back.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that 
> gets rid of any external dependencies, making it self-contained and 
> independently maintained. This design builds the EC feature on top of the 
> storage type support and aims to be compatible with existing HDFS features 
> like caching, snapshots, encryption, and high availability. This design will 
> also support different EC coding schemes, implementations, and policies for 
> different deployment scenarios. By utilizing advanced libraries (e.g. the Intel 
> ISA-L library), an implementation can greatly improve the performance of EC 
> encoding/decoding, making the EC solution even more attractive. We will 
> post the design document soon. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6403) Add metrics for log warnings reported by JVM pauses

2014-12-17 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-6403:

Labels: supportability  (was: )

> Add metrics for log warnings reported by JVM pauses
> ---
>
> Key: HDFS-6403
> URL: https://issues.apache.org/jira/browse/HDFS-6403
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>  Labels: supportability
> Fix For: 2.5.0
>
> Attachments: HDFS-6403.001.patch, HDFS-6403.002.patch, 
> HDFS-6403.003.patch, HDFS-6403.004.patch, HDFS-6403.005.patch
>
>
> HADOOP-9618 logs warnings when there are long GC pauses. If this is exposed 
> as a metric, then they can be monitored.
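
The detection loop behind those warnings is essentially the following (a 
simplified sketch; the counter is the kind of thing this JIRA proposes to 
expose as a metric):

{code}
// Sleep a fixed interval and measure how much longer we actually slept;
// the excess is time the JVM (or host) was paused, e.g. by GC.
long sleepMs = 500L;
long warnThresholdMs = 10000L;
long numPauseWarnings = 0;  // sketch of the metric to expose
while (shouldRun) {
  long start = System.nanoTime();
  Thread.sleep(sleepMs);    // caller handles InterruptedException
  long extraMs = (System.nanoTime() - start) / 1000000L - sleepMs;
  if (extraMs > warnThresholdMs) {
    LOG.warn("Detected pause in JVM or host machine of approximately "
        + extraMs + "ms");
    numPauseWarnings++;
  }
}
{code}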



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6959) Make the HDFS home directory location customizable.

2014-12-17 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-6959:

Labels: supportability  (was: )

> Make the HDFS home directory location customizable.
> ---
>
> Key: HDFS-6959
> URL: https://issues.apache.org/jira/browse/HDFS-6959
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Kevin Odell
>Assignee: Yongjun Zhang
>Priority: Minor
>  Labels: supportability
> Fix For: 2.6.0
>
> Attachments: HADOOP-10334.001.patch, HADOOP-10334.002.patch, 
> HADOOP-10334.002.patch, HDFS-6959.001.patch, HDFS-6959.002.patch
>
>
> The path is currently hardcoded:
> public Path getHomeDirectory() {
> return makeQualified(new Path("/user/" + dfs.ugi.getShortUserName()));
>   }
> It would be nice to have that as a customizable value.  
> Thank you
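
A minimal sketch of the requested change (treat the config key name as an 
assumption here; check DFSConfigKeys in the committed patch):

{code}
public Path getHomeDirectory() {
  // "dfs.user.home.dir.prefix" is assumed; "/user" stays the default so
  // existing deployments keep their current behavior.
  String prefix = getConf().get("dfs.user.home.dir.prefix", "/user");
  return makeQualified(new Path(prefix + "/" + dfs.ugi.getShortUserName()));
}
{code}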



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7056) Snapshot support for truncate

2014-12-17 Thread Plamen Jeliazkov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250393#comment-14250393
 ] 

Plamen Jeliazkov commented on HDFS-7056:


I just checked all the failures reported by the Jenkins bot.

# The FindBugs warning is unrelated -- we don't modify getBlockLocations; the 
issue comes from HDFS-7463.
# The JavaDoc warnings are from stuff we've never touched.
# The Java warnings are from files we've never touched.
# All the "failed / timed out" tests passed on my local machine 
(except TestOfflineEditsViewer, which passes once you have the correct edits 
files loaded).

> Snapshot support for truncate
> -
>
> Key: HDFS-7056
> URL: https://issues.apache.org/jira/browse/HDFS-7056
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, 
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
> HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, 
> HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
> HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx
>
>
> The implementation of truncate in HDFS-3107 does not allow truncating files that 
> are in a snapshot. It is desirable to be able to truncate and still keep the 
> old state of the file in the snapshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7540) Add IOUtils#listDirectory

2014-12-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250388#comment-14250388
 ] 

Andrew Wang commented on HDFS-7540:
---

Thanks for working on this Colin. It'll be nice to swap this in where we can, 
JDK7 does a much better job at exposing filesystem APIs.

I wonder if we should really return a ChunkedArrayList here. It only implements 
a subset of the AbstractList interface, and this is a pretty general-purpose 
method. For huge dirs, we should probably just be using the DirectoryStream 
iterator directly. I do see the use of these helper functions for 
quick-and-dirty listings though.

I'd be okay providing variants of these functions that return a 
ChunkedArrayList, but it seems like the default should just be a normal 
ArrayList.

Couple other things:

* Need {{<p>}} tag for javadoc linebreak
* I read the docs at 
http://docs.oracle.com/javase/7/docs/api/java/nio/file/DirectoryStream.html and 
it'd be nice to do as the example does and unwrap the DirectoryIteratorException 
into an IOException.

> Add IOUtils#listDirectory
> -
>
> Key: HDFS-7540
> URL: https://issues.apache.org/jira/browse/HDFS-7540
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7540.001.patch
>
>
> We should have a drop-in replacement for File#listDir that doesn't hide 
> IOExceptions, and which returns a ChunkedArrayList rather than a single large 
> array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads

2014-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7430:
---
Attachment: HDFS-7430.006.patch

updated

> Refactor the BlockScanner to use O(1) memory and use multiple threads
> -
>
> Key: HDFS-7430
> URL: https://issues.apache.org/jira/browse/HDFS-7430
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch, 
> HDFS-7430.004.patch, HDFS-7430.005.patch, HDFS-7430.006.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by 
> keeping track of what block was scanned last, rather than by tracking the 
> scan status of all blocks in memory.  Also, instead of having just one 
> thread, we should have a verification thread per hard disk (or other volume), 
> scanning at a configurable rate of bytes per second.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7541) Support for fast HDFS datanode rolling upgrade

2014-12-17 Thread Ming Ma (JIRA)
Ming Ma created HDFS-7541:
-

 Summary: Support for fast HDFS datanode rolling upgrade
 Key: HDFS-7541
 URL: https://issues.apache.org/jira/browse/HDFS-7541
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma


The current HDFS DN rolling upgrade procedure requires sequential DN restarts to 
minimize the impact on data availability and read/write operations. The side 
effect is a longer upgrade duration for large clusters. This might be acceptable 
for a quick DN JVM restart to update Hadoop code/configuration. However, for an 
OS upgrade that requires a machine reboot, the overall upgrade duration will be 
too long if we continue to do sequential DN rolling restarts.

 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7540) Add IOUtils#listDirectory

2014-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7540:
---
Attachment: HDFS-7540.001.patch

> Add IOUtils#listDirectory
> -
>
> Key: HDFS-7540
> URL: https://issues.apache.org/jira/browse/HDFS-7540
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7540.001.patch
>
>
> We should have a drop-in replacement for File#listDir that doesn't hide 
> IOExceptions, and which returns a ChunkedArrayList rather than a single large 
> array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7540) Add IOUtils#listDirectory

2014-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7540:
---
Status: Patch Available  (was: Open)

> Add IOUtils#listDirectory
> -
>
> Key: HDFS-7540
> URL: https://issues.apache.org/jira/browse/HDFS-7540
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7540.001.patch
>
>
> We should have a drop-in replacement for File#listDir that doesn't hide 
> IOExceptions, and which returns a ChunkedArrayList rather than a single large 
> array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7540) Add IOUtils#listDirectory

2014-12-17 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-7540:
--

 Summary: Add IOUtils#listDirectory
 Key: HDFS-7540
 URL: https://issues.apache.org/jira/browse/HDFS-7540
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


We should have a drop-in replacement for File#listDir that doesn't hide 
IOExceptions, and which returns a ChunkedArrayList rather than a single large 
array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList

2014-12-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250291#comment-14250291
 ] 

Colin Patrick McCabe commented on HDFS-7531:


Oops, just found two minor comment issues when I was going to commit this.

{code}
  /**
   * Read access to this atomic reference array is not synchronized.
   * This list is replaced on modification holding "this" lock.
   */
{code}
Can you remove this comment?  It is no longer accurate because we don't hold 
the "this" lock when replacing the atomic reference.  I don't think we need the 
first part, either... it's assumed that objects in {{AtomicReference}} are 
accessed locklessly unless otherwise noted.

{code}
  /**
   * Returns a unmodifiable list view of all the volumes.
   * Note that this list is unmodifiable.
   */
{code}
This comment is a bit redundant.  If it's an "unmodifiable list" then we don't 
need to also note that it is unmodifiable.  Let's get rid of the second line, 
and change "unmodifiable" to "immutable" since that's more idiomatic.

thanks

> Improve the concurrent access on FsVolumeList
> -
>
> Key: HDFS-7531
> URL: https://issues.apache.org/jira/browse/HDFS-7531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch
>
>
> {{FsVolumeList}} uses {{synchronized}} to protect the update on 
> {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
> {{getAvailable()}}) iterate {{volumes}} without protection.
> This JIRA proposes to use {{AtomicReference}} to encapture {{volumes}} to 
> provide better concurrent access.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever

2014-12-17 Thread Frantisek Vacek (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250253#comment-14250253
 ] 

Frantisek Vacek commented on HDFS-7392:
---

I'm too busy to implement the promised patch, so I'm adding part of the log to 
show what is wrong with the connection timeout. Please let me know if it helps.

Fanda

Everlasting attempt to open the non-existent HDFS URI 
hdfs://share.merck.com/OneLevelHeader.xlsx

opening path: /OneLevelHeader.xlsx ...
DEBUG [main] (Client.java:426) - The ping interval is 6 ms.
DEBUG [main] (Client.java:695) - Connecting to share.merck.com/54.40.29.223:8020
 INFO [main] (Client.java:814) - Retrying connect to server: 
share.merck.com/54.40.29.223:8020. Already tried 0 time(s); maxRetries=45
 WARN [main] (Client.java:568) - Address change detected. Old: 
share.merck.com/54.40.29.223:8020 New: share.merck.com/54.40.29.65:8020
 INFO [main] (Client.java:814) - Retrying connect to server: 
share.merck.com/54.40.29.65:8020. Already tried 0 time(s); maxRetries=45
 INFO [main] (Client.java:814) - Retrying connect to server: 
share.merck.com/54.40.29.65:8020. Already tried 1 time(s); maxRetries=45
 WARN [main] (Client.java:568) - Address change detected. Old: 
share.merck.com/54.40.29.65:8020 New: share.merck.com/54.40.29.223:8020
 INFO [main] (Client.java:814) - Retrying connect to server: 
share.merck.com/54.40.29.223:8020. Already tried 0 time(s); maxRetries=45
 INFO [main] (Client.java:814) - Retrying connect to server: 
share.merck.com/54.40.29.223:8020. Already tried 1 time(s); maxRetries=45

> org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
> -
>
> Key: HDFS-7392
> URL: https://issues.apache.org/jira/browse/HDFS-7392
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Frantisek Vacek
>Assignee: Yi Liu
> Attachments: 1.png, 2.png
>
>
> In some specific circumstances, 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times out 
> and lasts forever.
> The specific circumstances are:
> 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) should point 
> to a valid IP address but without a name node service running on it.
> 2) There should be at least 2 IP addresses for such a URI. See the output below:
> {quote}
> [~/proj/quickbox]$ nslookup share.example.com
> Server: 127.0.1.1
> Address:127.0.1.1#53
> share.example.com canonical name = 
> internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com.
> Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
> Address: 192.168.1.223
> Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
> Address: 192.168.1.65
> {quote}
> In such a case the org.apache.hadoop.ipc.Client.Connection.updateAddress() 
> sometimes returns true (even if the address didn't actually change, see img. 1) 
> and the timeoutFailures counter is set to 0 (see img. 2). The 
> maxRetriesOnSocketTimeouts (45) is never reached and the connection attempt is 
> repeated forever.
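
In other words, the failure mode in the retry logic looks roughly like this (a 
simplified illustration of {{org.apache.hadoop.ipc.Client}}, not the exact 
code):

{code}
// On each socket timeout the client re-resolves the address. With two IPs
// behind the ELB name, updateAddress() keeps flipping between them and
// returning true, so the counter is reset and maxRetries is never reached.
private void handleConnectionTimeout(IOException e) throws IOException {
  if (updateAddress()) {
    timeoutFailures = 0;   // "address changed" -> start counting from scratch
  }
  if (timeoutFailures++ >= maxRetriesOnSocketTimeouts) {
    throw e;               // unreachable while DNS keeps alternating
  }
  LOG.info("Retrying connect to server: " + server);
}
{code}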



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7521) Refactor DN state management

2014-12-17 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250228#comment-14250228
 ] 

Yongjun Zhang commented on HDFS-7521:
-

HI [~mingma],

Thanks for the info. I checked with [~andrew.wang]; he stated that it's 
intentional to leave a DN in the DEAD and DECOMM_IN_PROGRESS state, so that when 
the DN is revived, decommissioning can continue, which makes sense to me. Thanks 
Andrew.



> Refactor DN state management
> 
>
> Key: HDFS-7521
> URL: https://issues.apache.org/jira/browse/HDFS-7521
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
> Attachments: DNStateMachines.png, HDFS-7521.patch
>
>
> There are two aspects w.r.t. DN state management in NN.
> * State machine management within active NN
> NN maintains states of each data node regarding whether it is running or 
> being decommissioned. But the state machine isn’t well defined. We have dealt 
> with some corner case bugs in this area. It will be useful if we can refactor 
> the code to use a clear state machine definition that defines events, available 
> states, and actions for state transitions. It has these benefits.
> ** Make it easy to define correctness of DN state management. Currently some 
> of the state transitions aren't defined in the code. For example, if admins 
> remove a node from include host file while the node is being decommissioned, 
> it will be transitioned to DEAD and DECOMM_IN_PROGRESS. That might not be the 
> intention. If we have state machine definition, we can identify this case.
> ** Make it easy to add a new state for the DN later. For example, people have 
> discussed a new “maintenance” state for the DN to support the scenario where admins 
> need to take the machine/rack down for 30 minutes for repair.
> We can refactor the DN with a clear state machine definition based on YARN's 
> state machine components.
> * State machine consistency between active and standby NN
> Another dimension of state machine management is consistency across NN pairs. 
> We have dealt with bugs due to different live node sets between the active NN 
> and the standby NN. The current design is to have each NN manage its own state 
> based on the events it receives. For example, DNs will send heartbeats to both 
> NNs; admins will issue decommission commands to both NNs. An alternative 
> design approach could be to have ZK manage the state.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7373) Clean up temporary files after fsimage transfer failures

2014-12-17 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250144#comment-14250144
 ] 

Kihwal Lee commented on HDFS-7373:
--

[~ajisakaa] Can I get an official binding +1?  :)

> Clean up temporary files after fsimage transfer failures
> 
>
> Key: HDFS-7373
> URL: https://issues.apache.org/jira/browse/HDFS-7373
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-7373.patch
>
>
> When an fsimage (e.g. checkpoint) transfer fails, a temporary file is left in 
> each storage directory.  If the namespace is large, these files can 
> take up quite a bit of space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7530) Allow renaming an Encryption Zone root

2014-12-17 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7530:
---
Attachment: HDFS-7530.002.patch

[~andrew.wang],

Thank you for the review. The .002 patch fixes the failing test. It turns out 
that the CLI test was testing exactly what this patch is for, so I just 
modified the description and the expected output to reflect success rather than 
failure. I also added the message to the assertTrue. As you know, Jenkins will 
continue to fail TestCryptoAdminCLI since test-patch.sh will not apply the 
testCryptoConf.xml file.


> Allow renaming an Encryption Zone root
> --
>
> Key: HDFS-7530
> URL: https://issues.apache.org/jira/browse/HDFS-7530
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch
>
>
> It should be possible to do
> hdfs dfs -mv /ezroot /newnameforezroot



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7536) Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250007#comment-14250007
 ] 

Hudson commented on HDFS-7536:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1995 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1995/])
HDFS-7536. Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs. Contributed 
by Haohui Mai. (wheat9: rev 565d72fe6e15fa104a623defe7f96446a13d268c)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs
> --
>
> Key: HDFS-7536
> URL: https://issues.apache.org/jira/browse/HDFS-7536
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: HADOOP-11413.001.patch
>
>
> In org.apache.hadoop.fs.Hdfs, the {{CryptoCodec}} is unused, and we can 
> remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7494) Checking of closed in DFSInputStream#pread() should be protected by synchronization

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250005#comment-14250005
 ] 

Hudson commented on HDFS-7494:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1995 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1995/])
HDFS-7494. Checking of closed in DFSInputStream#pread() should be protected by 
synchronization (Ted Yu via Colin P. McCabe) (cmccabe: rev 
a97a1e73177974cff8afafad6ca43a96563f3c61)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Checking of closed in DFSInputStream#pread() should be protected by 
> synchronization
> ---
>
> Key: HDFS-7494
> URL: https://issues.apache.org/jira/browse/HDFS-7494
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: hdfs-7494-001.patch, hdfs-7494-002.patch
>
>
> {code}
>   private int pread(long position, byte[] buffer, int offset, int length)
>   throws IOException {
> // sanity checks
> dfsClient.checkOpen();
> if (closed) {
> {code}
> Checking of closed should be protected by holding lock on 
> "DFSInputStream.this"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250006#comment-14250006
 ] 

Hudson commented on HDFS-6425:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1995 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1995/])
HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport 
latency. Contributed by Ming Ma. (kihwal: rev 
b7923a356e9f111619375b94d12749d634069347)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.7.0
>
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have a large number of over-replicated blocks when the NN fails 
> over. When the new active NN takes over, over-replicated blocks will be put 
> into postponedMisreplicatedBlocks until all DNs for each block are no longer 
> stale.
> We have a case where the NNs flip-flop: before postponedMisreplicatedBlocks 
> became empty, the NN failed over again and again, so 
> postponedMisreplicatedBlocks just kept increasing until the cluster 
> stabilized.
> In addition, a large postponedMisreplicatedBlocks set can make 
> rescanPostponedMisreplicatedBlocks slow; it takes the write lock, so it can 
> slow down block report processing.
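
A sketch of how the write-lock cost can be bounded per rescan (the config key 
name and helper are assumptions, not the exact committed code):

{code}
// Process at most 'limit' postponed blocks per rescan so the write lock is
// held only briefly; leftovers are picked up by the next rescan.
int limit = conf.getInt("dfs.namenode.blocks.per.postponedblocks.rescan", 10000);
namesystem.writeLock();
try {
  int processed = 0;
  Iterator<Block> it = postponedMisreplicatedBlocks.iterator();
  while (it.hasNext() && processed < limit) {
    Block b = it.next();
    if (rescanBlock(b)) {  // hypothetical helper: true if no longer postponed
      it.remove();
    }
    processed++;
  }
} finally {
  namesystem.writeUnlock();
}
{code}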



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249951#comment-14249951
 ] 

Hudson commented on HDFS-6425:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/45/])
HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport 
latency. Contributed by Ming Ma. (kihwal: rev 
b7923a356e9f111619375b94d12749d634069347)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java


> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.7.0
>
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have a large number of over-replicated blocks when the NN fails 
> over. When the new active NN takes over, over-replicated blocks will be put 
> into postponedMisreplicatedBlocks until all DNs for each block are no longer 
> stale.
> We have a case where the NNs flip-flop: before postponedMisreplicatedBlocks 
> became empty, the NN failed over again and again, so 
> postponedMisreplicatedBlocks just kept increasing until the cluster 
> stabilized.
> In addition, a large postponedMisreplicatedBlocks set can make 
> rescanPostponedMisreplicatedBlocks slow; it takes the write lock, so it can 
> slow down block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7494) Checking of closed in DFSInputStream#pread() should be protected by synchronization

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249950#comment-14249950
 ] 

Hudson commented on HDFS-7494:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/45/])
HDFS-7494. Checking of closed in DFSInputStream#pread() should be protected by 
synchronization (Ted Yu via Colin P. McCabe) (cmccabe: rev 
a97a1e73177974cff8afafad6ca43a96563f3c61)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Checking of closed in DFSInputStream#pread() should be protected by 
> synchronization
> ---
>
> Key: HDFS-7494
> URL: https://issues.apache.org/jira/browse/HDFS-7494
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: hdfs-7494-001.patch, hdfs-7494-002.patch
>
>
> {code}
>   private int pread(long position, byte[] buffer, int offset, int length)
>   throws IOException {
> // sanity checks
> dfsClient.checkOpen();
> if (closed) {
> {code}
> Checking of closed should be protected by holding lock on 
> "DFSInputStream.this"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7536) Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249952#comment-14249952
 ] 

Hudson commented on HDFS-7536:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/45/])
HDFS-7536. Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs. Contributed 
by Haohui Mai. (wheat9: rev 565d72fe6e15fa104a623defe7f96446a13d268c)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs
> --
>
> Key: HDFS-7536
> URL: https://issues.apache.org/jira/browse/HDFS-7536
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: HADOOP-11413.001.patch
>
>
> In org.apache.hadoop.fs.Hdfs, the {{CryptoCodec}} is unused, and we can 
> remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249908#comment-14249908
 ] 

Hudson commented on HDFS-6425:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #41 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/41/])
HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport 
latency. Contributed by Ming Ma. (kihwal: rev 
b7923a356e9f111619375b94d12749d634069347)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java


> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.7.0
>
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have a large number of over-replicated blocks when the NN fails 
> over. When the new active NN takes over, over-replicated blocks will be put 
> into postponedMisreplicatedBlocks until all DNs for each block are no longer 
> stale.
> We have a case where the NNs flip-flop: before postponedMisreplicatedBlocks 
> became empty, the NN failed over again and again, so 
> postponedMisreplicatedBlocks just kept increasing until the cluster 
> stabilized.
> In addition, a large postponedMisreplicatedBlocks set can make 
> rescanPostponedMisreplicatedBlocks slow; it takes the write lock, so it can 
> slow down block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7536) Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249909#comment-14249909
 ] 

Hudson commented on HDFS-7536:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #41 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/41/])
HDFS-7536. Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs. Contributed 
by Haohui Mai. (wheat9: rev 565d72fe6e15fa104a623defe7f96446a13d268c)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java


> Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs
> --
>
> Key: HDFS-7536
> URL: https://issues.apache.org/jira/browse/HDFS-7536
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: HADOOP-11413.001.patch
>
>
> In org.apache.hadoop.fs.Hdfs, the {{CryptoCodec}} is unused, and we can 
> remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7494) Checking of closed in DFSInputStream#pread() should be protected by synchronization

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249907#comment-14249907
 ] 

Hudson commented on HDFS-7494:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #41 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/41/])
HDFS-7494. Checking of closed in DFSInputStream#pread() should be protected by 
synchronization (Ted Yu via Colin P. McCabe) (cmccabe: rev 
a97a1e73177974cff8afafad6ca43a96563f3c61)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java


> Checking of closed in DFSInputStream#pread() should be protected by 
> synchronization
> ---
>
> Key: HDFS-7494
> URL: https://issues.apache.org/jira/browse/HDFS-7494
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: hdfs-7494-001.patch, hdfs-7494-002.patch
>
>
> {code}
>   private int pread(long position, byte[] buffer, int offset, int length)
>   throws IOException {
> // sanity checks
> dfsClient.checkOpen();
> if (closed) {
> {code}
> Checking of closed should be protected by holding lock on 
> "DFSInputStream.this"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7536) Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249899#comment-14249899
 ] 

Hudson commented on HDFS-7536:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1976 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1976/])
HDFS-7536. Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs. Contributed 
by Haohui Mai. (wheat9: rev 565d72fe6e15fa104a623defe7f96446a13d268c)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs
> --
>
> Key: HDFS-7536
> URL: https://issues.apache.org/jira/browse/HDFS-7536
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: HADOOP-11413.001.patch
>
>
> In org.apache.hadoop.fs.Hdfs, the {{CryptoCodec}} is unused, so we can 
> remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7494) Checking of closed in DFSInputStream#pread() should be protected by synchronization

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249897#comment-14249897
 ] 

Hudson commented on HDFS-7494:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1976 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1976/])
HDFS-7494. Checking of closed in DFSInputStream#pread() should be protected by 
synchronization (Ted Yu via Colin P. McCabe) (cmccabe: rev 
a97a1e73177974cff8afafad6ca43a96563f3c61)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java


> Checking of closed in DFSInputStream#pread() should be protected by 
> synchronization
> ---
>
> Key: HDFS-7494
> URL: https://issues.apache.org/jira/browse/HDFS-7494
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: hdfs-7494-001.patch, hdfs-7494-002.patch
>
>
> {code}
>   private int pread(long position, byte[] buffer, int offset, int length)
>   throws IOException {
> // sanity checks
> dfsClient.checkOpen();
> if (closed) {
> {code}
> The check of closed should be protected by holding the lock on 
> "DFSInputStream.this".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249898#comment-14249898
 ] 

Hudson commented on HDFS-6425:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1976 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1976/])
HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport 
latency. Contributed by Ming Ma. (kihwal: rev 
b7923a356e9f111619375b94d12749d634069347)
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java


> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.7.0
>
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have a large number of over-replicated blocks when the NN fails 
> over. When the new active NN takes over, over-replicated blocks are put into 
> postponedMisreplicatedBlocks until all DNs for each block are no longer 
> stale.
> We have a case where the NNs flip-flop: before postponedMisreplicatedBlocks 
> could become empty, the NN failed over again and again, so 
> postponedMisreplicatedBlocks just kept growing until the cluster stabilized. 
> In addition, a large postponedMisreplicatedBlocks set can make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> holds the write lock, so it can slow down block report processing.
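Judging from the touched files (a new key in hdfs-default.xml and DFSConfigKeys), the fix appears to bound how much rescan work happens per write-lock acquisition. A minimal Java sketch of that idea follows, with hypothetical names and a hypothetical limit, not the actual patch.

{code}
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: re-examine at most 'blocksPerRescan' postponed blocks per
// pass, so the write lock is released between passes and block reports
// can get in.
class PostponedBlocksRescanner {
  private final Set<Long> postponedMisreplicatedBlocks = new LinkedHashSet<>();
  private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();
  private final int blocksPerRescan = 10_000;   // hypothetical config value

  void rescanPostponedMisreplicatedBlocks() {
    nsLock.writeLock().lock();
    try {
      int processed = 0;
      Iterator<Long> it = postponedMisreplicatedBlocks.iterator();
      while (it.hasNext() && processed < blocksPerRescan) {
        if (noLongerPostponed(it.next())) {
          it.remove();
        }
        processed++;
      }
      // Anything left over is picked up by the next scheduled rescan.
    } finally {
      nsLock.writeLock().unlock();
    }
  }

  private boolean noLongerPostponed(long blockId) {
    return true;   // placeholder for the real staleness/misreplication check
  }
}
{code}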



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7501) TransactionsSinceLastCheckpoint can be negative on SBNs

2014-12-17 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-7501:
--
Status: In Progress  (was: Patch Available)

Resetting state while awaiting an improved test.

> TransactionsSinceLastCheckpoint can be negative on SBNs
> ---
>
> Key: HDFS-7501
> URL: https://issues.apache.org/jira/browse/HDFS-7501
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.5.0
>Reporter: Harsh J
>Assignee: Gautam Gopalakrishnan
>Priority: Trivial
> Attachments: HDFS-7501.patch
>
>
> The metric TransactionsSinceLastCheckpoint is derived as FSEditLog.txid minus 
> NNStorage.mostRecentCheckpointTxId.
> In Standby mode, the former does not increment beyond the loaded or 
> last-when-active value, but the latter keeps changing because checkpoints are 
> done regularly in this mode. As a result, the SBN will eventually end up 
> showing negative values for TransactionsSinceLastCheckpoint.
> This is not a real issue, as the metric is only meaningful when monitored on 
> the Active NameNode, but we should perhaps just report 0 when we detect that 
> the NN is in standby state, since a negative number is confusing to see in a 
> chart that tracks it.
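As a sketch of that suggestion (names here are illustrative; the real code lives in the NameNode's metric getters), the getter could simply clamp to zero while in standby:

{code}
// Illustrative only: report 0 on a standby NN instead of the negative
// difference described above.
class CheckpointMetrics {
  static long transactionsSinceLastCheckpoint(
      boolean inStandbyState, long editLogTxId, long lastCheckpointTxId) {
    return inStandbyState ? 0 : editLogTxId - lastCheckpointTxId;
  }
}
{code}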



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

