[jira] [Updated] (HDFS-7112) LazyWriter should use either async IO or one thread per physical disk

2014-10-06 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-7112:
-
Attachment: HDFS-7112.2.patch

Can't repro the build failure from Jenkins. Updated the patch. 

> LazyWriter should use either async IO or one thread per physical disk
> -
>
> Key: HDFS-7112
> URL: https://issues.apache.org/jira/browse/HDFS-7112
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-6581
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
> Fix For: 2.6.0
>
> Attachments: HDFS-7112.0.patch, HDFS-7112.1.patch, HDFS-7112.2.patch
>
>
> The LazyWriter currently uses synchronous IO and a single thread. This limits 
> the throughput to that of a single disk. Using either async overlapped IO or 
> one thread per physical disk will improve the write throughput.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-10-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161454#comment-14161454
 ] 

Haohui Mai commented on HDFS-6994:
--

bq. I'm going to get rid of the code in hadoop-native-core (I guess I should 
put up a change for that) since this directory isn't needed any longer.

I should have clarified before. By "merge" I meant "merging to trunk" :-)

Are you suggesting deprecating hadoop-native-core before it gets merged? If so, 
given that there are far fewer resolved tasks in HDFS-6994 compared to 
HADOOP-10388 (5 vs 22), maybe it is more appropriate to start a new branch with 
the five resolved tasks we have so far?

Having a clean branch allows us to preserve the development history more easily 
when the branch is merged into trunk, thanks to the recent switch to git.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports 
> both Hadoop RPC versions 8 and 9, as well as Namenode HA and Kerberos 
> authentication.
> libhdfs3 is currently used by Pivotal's HAWQ.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on GitHub:
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7014) Implement input and output streams to DataNode for native client

2014-10-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161434#comment-14161434
 ] 

Haohui Mai commented on HDFS-7014:
--

I would appreciate it if the patch could be further split into smaller pieces. Is it 
possible to take out FileSystem / BlockReader / LeaseRenewer into separate patches?

> Implement input and output streams to DataNode for native client
> 
>
> Key: HDFS-7014
> URL: https://issues.apache.org/jira/browse/HDFS-7014
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: 0001-HDFS-7014-001.patch, HDFS-7014.patch
>
>
> Implement Client - Namenode RPC protocol and support Namenode HA.
> Implement Client - Datanode RPC protocol.
> Implement some basic server side class such as ExtendedBlock and LocatedBlock



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7193) value of "dfs.webhdfs.enabled" in user doc is incorrect.

2014-10-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161431#comment-14161431
 ] 

Haohui Mai commented on HDFS-7193:
--

+1

> value of "dfs.webhdfs.enabled" in user doc is incorrect.
> 
>
> Key: HDFS-7193
> URL: https://issues.apache.org/jira/browse/HDFS-7193
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation, webhdfs
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Trivial
> Attachments: HDFS-7193.001.patch, HDFS-7193.002.patch, 
> HDFS-7193.003.patch
>
>
> The default value for {{dfs.webhdfs.enabled}} should be {{true}}, not 
> _http/_HOST@REALM.TLD_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7198) Fix or suppress findbugs "unchecked conversion" warning in DFSClient#getPathTraceScope

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161343#comment-14161343
 ] 

Hadoop QA commented on HDFS-7198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673194/HDFS-7198.001.patch
  against trunk revision 8dc6abf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8336//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8336//console

This message is automatically generated.

> Fix or suppress findbugs "unchecked conversion" warning in 
> DFSClient#getPathTraceScope
> --
>
> Key: HDFS-7198
> URL: https://issues.apache.org/jira/browse/HDFS-7198
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Trivial
> Attachments: HDFS-7198.001.patch
>
>
> Fix or suppress the findbugs "unchecked conversion" warning in 
> {{DFSClient#getPathTraceScope}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161322#comment-14161322
 ] 

Colin Patrick McCabe commented on HDFS-6994:


bq. Agree. However, I think when the time of call for merge comes, it requires 
the reviewers to look at both sides of the code. Separating it into another 
branch would make things much easier and allow ensuring better code quality.

I think there is some misunderstanding here.  There isn't going to be a single 
"time of merge."  Zhanwei, Abe and I are fixing up libhdfs3 as we go to give it 
the functionality it needs.  In other words, we are merging the functionality 
now, not later.  I'm going to get rid of the code in hadoop-native-core (I guess 
I should put up a change for that) since this directory isn't needed any longer.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports 
> both Hadoop RPC versions 8 and 9, as well as Namenode HA and Kerberos 
> authentication.
> libhdfs3 is currently used by Pivotal's HAWQ.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on GitHub:
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7014) Implement input and output streams to DataNode for native client

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161316#comment-14161316
 ] 

Colin Patrick McCabe commented on HDFS-7014:


I kept around the base class for NamenodeProxy.  I can see now why it's needed 
in the failover logic.  I think that the InputStreamInter, etc. things are not 
needed, though, so I removed those.  Fixed a bunch more super-long lines.  Give 
this a review when you can!  It would be nice to get all this code in so we can 
start tackling things like HDFS-7023.

> Implement input and output streams to DataNode for native client
> 
>
> Key: HDFS-7014
> URL: https://issues.apache.org/jira/browse/HDFS-7014
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: 0001-HDFS-7014-001.patch, HDFS-7014.patch
>
>
> Implement Client - Namenode RPC protocol and support Namenode HA.
> Implement Client - Datanode RPC protocol.
> Implement some basic server side class such as ExtendedBlock and LocatedBlock



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HDFS-7014) Implement input and output streams to DataNode for native client

2014-10-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-7014 started by Colin Patrick McCabe.
--
> Implement input and output streams to DataNode for native client
> 
>
> Key: HDFS-7014
> URL: https://issues.apache.org/jira/browse/HDFS-7014
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: 0001-HDFS-7014-001.patch, HDFS-7014.patch
>
>
> Implement Client - Namenode RPC protocol and support Namenode HA.
> Implement Client - Datanode RPC protocol.
> Implement some basic server side class such as ExtendedBlock and LocatedBlock



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7014) Implement input and output streams to DataNode for native client

2014-10-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7014:
---
Attachment: 0001-HDFS-7014-001.patch

> Implement input and output streams to DataNode for native client
> 
>
> Key: HDFS-7014
> URL: https://issues.apache.org/jira/browse/HDFS-7014
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: 0001-HDFS-7014-001.patch, HDFS-7014.patch
>
>
> Implement Client - Namenode RPC protocol and support Namenode HA.
> Implement Client - Datanode RPC protocol.
> Implement some basic server side class such as ExtendedBlock and LocatedBlock



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7174) Support for more efficient large directories

2014-10-06 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161304#comment-14161304
 ] 

Yi Liu commented on HDFS-7174:
--

{quote}
Konstantin Shvachko wrote: I am probably late to the party, but for whatever it 
worth. Did you consider using balanced trees for inode lists, something like 
B-trees?
B-trees would be an excellent solution here. Since they generally use arrays in 
the leaf nodes, this also gets you the benefits of tighter packing in memory. I 
guess the tricky part is writing the code.
{quote}
Good point, agree. We should be careful about the memory usage during 
implementation.
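
To make the cost difference concrete, here is a minimal standalone sketch of why 
per-insert cost matters for very large child lists: keeping children sorted in an 
ArrayList pays an O(n) shift on every insert, while a balanced tree pays O(log n). 
The class and numbers are illustrative only, not the INodeDirectory code; a real 
B-tree would additionally pack entries into arrays at the leaves for tighter 
memory use.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.TreeMap;

// Hypothetical illustration: cost of keeping a directory's children sorted in an
// ArrayList (O(n) per insert) versus a balanced tree. Not the INodeDirectory code.
public class ChildInsertCost {
  public static void main(String[] args) {
    final int n = 200_000;

    // Sorted ArrayList: each insert binary-searches, then shifts the tail.
    ArrayList<String> list = new ArrayList<>();
    long t0 = System.nanoTime();
    for (int i = 0; i < n; i++) {
      String name = "file-" + Integer.toHexString(i);   // pseudo-random order
      int pos = Collections.binarySearch(list, name);
      list.add(pos < 0 ? -(pos + 1) : pos, name);       // O(n) shift on average
    }
    long listMs = (System.nanoTime() - t0) / 1_000_000;

    // Balanced tree (red-black here; a B-tree adds tighter packing in leaf arrays).
    TreeMap<String, String> tree = new TreeMap<>();
    t0 = System.nanoTime();
    for (int i = 0; i < n; i++) {
      String name = "file-" + Integer.toHexString(i);
      tree.put(name, name);                             // O(log n) insert
    }
    long treeMs = (System.nanoTime() - t0) / 1_000_000;

    System.out.println("sorted ArrayList: " + listMs + " ms, TreeMap: " + treeMs + " ms");
  }
}
{code}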

> Support for more efficient large directories
> 
>
> Key: HDFS-7174
> URL: https://issues.apache.org/jira/browse/HDFS-7174
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-7174.new.patch, HDFS-7174.patch, HDFS-7174.patch
>
>
> When the number of children under a directory grows very large, insertion 
> becomes very costly.  E.g. creating 1M entries takes 10s of minutes.  This is 
> because the complexity of an insertion is O\(n\). As the size of the list 
> grows, the total cost grows as n^2 (the integral of a linear function).  It 
> also causes allocations and copies of big arrays.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3107) HDFS truncate

2014-10-06 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov updated HDFS-3107:
---
Attachment: HDFS-3107.patch

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3107) HDFS truncate

2014-10-06 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov updated HDFS-3107:
---
Attachment: editsStored.xml
editsStored

Refreshing the patch.

Made a couple of changes:
# TruncateOp now acts like an AddOp (it opens the file for write).
# Added an updateSpaceConsumed() call in FSDirectory.unprotectedTruncate().
# Replaced asserts in tests with assertThat() calls (see the sketch below).
# Also attaching new editsStored and editsStored.xml files.
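
For item 3, a minimal sketch of the assertion-style change, assuming JUnit 4 with 
its bundled Hamcrest core matchers; the helper method and values below are 
hypothetical, not taken from the actual test:

{code}
import static org.hamcrest.CoreMatchers.is;
import static org.junit.Assert.assertThat;

import org.junit.Test;

// Minimal illustration: assertThat() gives a descriptive failure message and is
// always evaluated, unlike the assert keyword which needs -ea to run.
public class TruncateAssertionStyleTest {
  @Test
  public void truncatedLengthIsReported() {
    long expectedNewLength = 512L;
    long actualNewLength = truncateToBlockBoundary(700L, 512L); // hypothetical helper

    // Before: assert actualNewLength == expectedNewLength;
    assertThat(actualNewLength, is(expectedNewLength));
  }

  // Hypothetical helper used only for this example.
  private static long truncateToBlockBoundary(long length, long blockSize) {
    return (length / blockSize) * blockSize;
  }
}
{code}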

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7193) value of "dfs.webhdfs.enabled" in user doc is incorrect.

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161298#comment-14161298
 ] 

Hadoop QA commented on HDFS-7193:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673245/HDFS-7193.003.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8339//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8339//console

This message is automatically generated.

> value of "dfs.webhdfs.enabled" in user doc is incorrect.
> 
>
> Key: HDFS-7193
> URL: https://issues.apache.org/jira/browse/HDFS-7193
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation, webhdfs
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Trivial
> Attachments: HDFS-7193.001.patch, HDFS-7193.002.patch, 
> HDFS-7193.003.patch
>
>
> The default value for {{dfs.webhdfs.enabled}} should be {{true}}, not 
> _http/_HOST@REALM.TLD_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7185) The active NameNode will not accept an fsimage sent from the standby during rolling upgrade

2014-10-06 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161286#comment-14161286
 ] 

Jing Zhao commented on HDFS-7185:
-

Hi Colin, one question: is the only scenario where we hit this exception the one 
where we have upgraded the SBN to the new version of the software while still 
leaving the ANN running with the old bits? If that is the case, hitting this 
exception should be the correct behavior: if we allowed a checkpoint to happen at 
this time, an fsimage written by the new bits would be uploaded to the ANN, which 
may not be understood by the old software, and then we could not normally restart 
the original ANN until we also upgrade it to the new version.
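
For readers following along, a simplified sketch of the kind of check that 
produces the exception quoted below; the real ImageServlet.validateRequest 
compares full storage information and does more, so treat this only as an 
illustration of the layout-version mismatch:

{code}
import java.io.IOException;

// Simplified sketch: reject an image upload when the uploader's storage info
// (which embeds the layout version, e.g. -59 vs -55) does not match the active
// NameNode's. Not the real ImageServlet code; values below are from the report.
public class ImageUploadCheck {
  static void validateRequest(String myStorageInfo, String remoteStorageInfo)
      throws IOException {
    if (!myStorageInfo.equals(remoteStorageInfo)) {
      throw new IOException("This namenode has storage info " + myStorageInfo
          + " but the secondary expected " + remoteStorageInfo);
    }
  }

  public static void main(String[] args) {
    try {
      // Old-software ANN (layout -55) vs upgraded SBN (layout -59): rejected.
      validateRequest("-55:65195028:0:CID-385de4d7", "-59:65195028:0:CID-385de4d7");
    } catch (IOException e) {
      System.out.println("Rejected upload: " + e.getMessage());
    }
  }
}
{code}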

> The active NameNode will not accept an fsimage sent from the standby during 
> rolling upgrade
> ---
>
> Key: HDFS-7185
> URL: https://issues.apache.org/jira/browse/HDFS-7185
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>
> The active NameNode will not accept an fsimage sent from the standby during 
> rolling upgrade.  The active fails with the exception:
> {code}
> 18:25:07,620  WARN ImageServlet:198 - Received an invalid request file 
> transfer request from a secondary with storage info 
> -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
> 18:25:07,620  WARN log:76 - Committed before 410 PutImage failed. 
> java.io.IOException: This namenode has storage info 
> -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary 
> expected -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-
> 0a6e431987f6
> at 
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.validateRequest(ImageServlet.java:200)
> at 
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:443)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:730)
> {code}
> On the standby, the exception is:
> {code}
> java.io.IOException: Exception during image upload: 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
>  This namenode has storage info 
> -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary 
> expected
>  -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:218)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:62)
> {code}
> This seems to be a consequence of the fact that the VERSION file still is at 
> -55 (the old version) even after the rolling upgrade has started.  When the 
> rolling upgrade is finalized with {{hdfs dfsadmin -rollingUpgrade finalize}}, 
> both VERSION files get set to the new version, and the problem goes away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7112) LazyWriter should use either async IO or one thread per physical disk

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161257#comment-14161257
 ] 

Hadoop QA commented on HDFS-7112:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673243/HDFS-7112.1.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8338//console

This message is automatically generated.

> LazyWriter should use either async IO or one thread per physical disk
> -
>
> Key: HDFS-7112
> URL: https://issues.apache.org/jira/browse/HDFS-7112
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-6581
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
> Fix For: 2.6.0
>
> Attachments: HDFS-7112.0.patch, HDFS-7112.1.patch
>
>
> The LazyWriter currently uses synchronous IO and a single thread. This limits 
> the throughput to that of a single disk. Using either async overlapped IO or 
> one thread per physical disk will improve the write throughput.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7193) value of "dfs.webhdfs.enabled" in user doc is incorrect.

2014-10-06 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-7193:
-
Attachment: HDFS-7193.003.patch

Haohui, I agree that we can remove it since it already exists in the WebHDFS 
doc, thanks. Just updated the patch.

> value of "dfs.webhdfs.enabled" in user doc is incorrect.
> 
>
> Key: HDFS-7193
> URL: https://issues.apache.org/jira/browse/HDFS-7193
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation, webhdfs
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Trivial
> Attachments: HDFS-7193.001.patch, HDFS-7193.002.patch, 
> HDFS-7193.003.patch
>
>
> The default value for {{dfs.webhdfs.enabled}} should be {{true}}, not 
> _http/_HOST@REALM.TLD_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7112) LazyWriter should use either async IO or one thread per physical disk

2014-10-06 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-7112:
-
Attachment: HDFS-7112.1.patch

> LazyWriter should use either async IO or one thread per physical disk
> -
>
> Key: HDFS-7112
> URL: https://issues.apache.org/jira/browse/HDFS-7112
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-6581
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
> Fix For: 2.6.0
>
> Attachments: HDFS-7112.0.patch, HDFS-7112.1.patch
>
>
> The LazyWriter currently uses synchronous IO and a single thread. This limits 
> the throughput to that of a single disk. Using either async overlapped IO or 
> one thread per physical disk will improve the write throughput.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7128) Decommission slows way down when it gets towards the end

2014-10-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161219#comment-14161219
 ] 

Chris Nauroth commented on HDFS-7128:
-

That's some great analysis, [~mingma].  +1 for the patch.  The findbugs and 
test failures look unrelated.

[~kihwal], I'll hold off committing until tomorrow in case you have further 
feedback.

> Decommission slows way down when it gets towards the end
> 
>
> Key: HDFS-7128
> URL: https://issues.apache.org/jira/browse/HDFS-7128
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7128-2.patch, HDFS-7128.patch
>
>
> When we decommission nodes across different racks, the decommission process 
> becomes really slow at the end, hardly making any progress. The problem is that 
> some blocks are on 3 decomm-in-progress DNs and the way replications are 
> scheduled causes unnecessary delay. Here is the analysis.
> When BlockManager schedules the replication work from neededReplication, it 
> first needs to pick the source node for replication via chooseSourceDatanode. 
> The core policies to pick the source node are:
> 1. Prefer decomm-in-progress node.
> 2. Only pick the nodes whose outstanding replication counts are below 
> thresholds dfs.namenode.replication.max-streams or 
> dfs.namenode.replication.max-streams-hard-limit, based on the replication 
> priority.
> When we decommission nodes,
> 1. All the decommission nodes' blocks will be added to neededReplication.
> 2. BM will pick X number of blocks from neededReplication in each iteration. 
> X is based on cluster size and some configurable multiplier. So if the 
> cluster has 2000 nodes, X will be around 4000.
> 3. Given these 4000 blocks are on the same decomm-in-progress node A, A ends up 
> being chosen as the source node for all these 4000 blocks. The reason the 
> outstanding replication thresholds don't kick in is the implementation of 
> BlockManager.computeReplicationWorkForBlocks; 
> node.getNumberOfBlocksToBeReplicated() remains zero because 
> node.addBlockToBeReplicated is called after the source node iteration.
> {noformat}
> ...
>   synchronized (neededReplications) {
> for (int priority = 0; priority < blocksToReplicate.size(); 
> priority++) {
> ...
> chooseSourceDatanode
> ...
> }
>   for(ReplicationWork rw : work){
> ...
>   rw.srcNode.addBlockToBeReplicated(block, targets);
> ...
>   }
> {noformat}
>  
> 4. So several decomm-in-progress nodes A, B, C end up with 4000 blocks each in 
> node.getNumberOfBlocksToBeReplicated().
> 5. If we assume each node can replicate 5 blocks per minute, it is going to 
> take 800 minutes to finish replication of these blocks.
> 6. The pending replication timeout kicks in after 5 minutes. The items will be 
> removed from the pending replication queue and added back to 
> neededReplication. The replications will then be handled by other source 
> nodes of these blocks. But the blocks still remain in nodes A, B, C's pending 
> replication queue, DatanodeDescriptor.replicateBlocks, so A, B, C continue 
> the replications of these blocks, although these blocks might have been 
> replicated by other DNs after the replication timeout.
> 7. Some blocks' replicas exist on A, B, C and sit at the end of A's pending 
> replication queue. Even though such a block's replication times out, no source 
> node can be chosen given that A, B, C all have high pending replication counts. 
> So we have to wait until A drains its pending replication queue. Meanwhile, the 
> items in A's pending replication queue have been taken care of by other nodes 
> and are no longer under-replicated.
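
The scheduling behavior described in the quoted analysis can be illustrated with a 
small standalone model; the classes below are deliberately simplified stand-ins, 
not the real BlockManager or DatanodeDescriptor:

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model: the per-node pending counter is only incremented after all sources
// are chosen, so the max-streams threshold never fires within one iteration and a
// single decommissioning node is picked as the source for every block.
public class DecommissionSchedulingModel {
  static class DatanodeInfo {
    final String name;
    int blocksToBeReplicated;          // analogous to getNumberOfBlocksToBeReplicated()
    final boolean decommissionInProgress;
    DatanodeInfo(String name, boolean decomm) {
      this.name = name;
      this.decommissionInProgress = decomm;
    }
  }

  static final int MAX_STREAMS = 2;    // stand-in for dfs.namenode.replication.max-streams

  static DatanodeInfo chooseSource(List<DatanodeInfo> replicas) {
    for (DatanodeInfo dn : replicas) {
      // Threshold check uses the counter *at selection time*, which is still 0.
      if (dn.decommissionInProgress && dn.blocksToBeReplicated < MAX_STREAMS) {
        return dn;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    DatanodeInfo a = new DatanodeInfo("A", true);
    DatanodeInfo b = new DatanodeInfo("B", false);
    List<DatanodeInfo> replicas = Arrays.asList(a, b);

    List<DatanodeInfo> chosen = new ArrayList<>();
    for (int block = 0; block < 4000; block++) {
      chosen.add(chooseSource(replicas));   // A is chosen every time
    }
    // Counters are bumped only after the loop, mirroring addBlockToBeReplicated().
    for (DatanodeInfo src : chosen) {
      src.blocksToBeReplicated++;
    }
    System.out.println("Blocks queued on " + a.name + ": " + a.blocksToBeReplicated);  // 4000
  }
}
{code}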



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-10-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161202#comment-14161202
 ] 

Haohui Mai commented on HDFS-6994:
--

bq. I think "mixing with the earlier effort" is exactly what we need to do 
here, and duplicating effort is exactly what we shouldn't do.

Agreed. However, I think when the call for a merge comes, it requires the 
reviewers to look at both sides of the code. Separating it into another branch 
would make things much easier and help ensure better code quality.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports 
> both Hadoop RPC versions 8 and 9, as well as Namenode HA and Kerberos 
> authentication.
> libhdfs3 is currently used by Pivotal's HAWQ.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on GitHub:
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7112) LazyWriter should use either async IO or one thread per physical disk

2014-10-06 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161190#comment-14161190
 ] 

Xiaoyu Yao commented on HDFS-7112:
--

Thanks [~arpitagarwal] for reviewing the patch. 

# bq. FsDatasetImpl.java: We should avoid adding volumes to asyncDiskService unless 
there is a RAM_DISK volume configured. This avoids creating unnecessary thread 
pools on most deployments which will not have RAM_DISK.
Do you mean adding volumes to asyncLazyPersistService? For 
asyncLazyPersistService, we will only call asyncLazyPersistService#addVolume 
when the storage type of the volume is RAM_DISK.

# bq. We don't need the counter and logging in {{addExecutorForVolume}}. We will 
never add more than one thread per volume.
Fixed.

# bq. {{onStartLazyPersist}} should be called by {{saveNextReplica}} before it 
calls {{submitLazyPersistTask}}.
Fixed.

# bq. {{onFailLazyPersist}} should never be called unless {{onStartLazyPersist}} 
has been called. I think it can be called on some failure paths even if 
submitLazyPersistTask was not called.
onFailLazyPersist should be called upon any failure after the block is dequeued 
with the following call:
 block = ramDiskReplicaTracker.dequeueNextReplicaToPersist();

The patch calls onFailLazyPersist in two error paths:
1) Failure to submit the request to the thread pool when calling 
submitLazyPersistTask.
2) Failure during the thread pool thread execution.

# bq. {{BlockPoolSlice#lazyPersistReplica}} is unused, can be removed.
Removed.

# bq. {{RamDiskAsyncLazyPersistService#countPendingTasks}} is unused.
Removed.

# bq. It would be good if RamDiskAsyncLazyPersistService did not have a dependency 
on DataNode/FsDatasetImpl. It can accept the success and failure callbacks as 
parameters. But it's okay to fix it later in a separate Jira.
That's a good idea. I will file a separate JIRA for that.
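
For the record, a minimal sketch of the "one single-thread executor per volume, 
driven by success/failure callbacks" shape discussed above; names are illustrative 
and this is not the actual RamDiskAsyncLazyPersistService:

{code}
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal sketch of a per-volume lazy-persist service with callbacks. Illustrative
// only; not the actual RamDiskAsyncLazyPersistService implementation.
public class PerVolumeLazyPersistService {
  public interface PersistCallback {
    void onSuccess(String blockId, File target);
    void onFailure(String blockId, Throwable cause);
  }

  // One single-thread executor per volume root, created lazily, so deployments
  // without RAM_DISK replicas never pay for extra threads.
  private final Map<File, ExecutorService> volumeExecutors = new ConcurrentHashMap<>();
  private final PersistCallback callback;

  public PerVolumeLazyPersistService(PersistCallback callback) {
    this.callback = callback;
  }

  public void addVolume(File volumeRoot) {
    volumeExecutors.computeIfAbsent(volumeRoot,
        v -> Executors.newSingleThreadExecutor(
            r -> new Thread(r, "lazyPersist-" + v.getName())));
  }

  public void submitLazyPersistTask(File volumeRoot, String blockId, File target) {
    ExecutorService exec = volumeExecutors.get(volumeRoot);
    if (exec == null) {
      // Failure to submit: report it through the failure callback.
      callback.onFailure(blockId, new IllegalStateException("No executor for " + volumeRoot));
      return;
    }
    exec.execute(() -> {
      try {
        // Real code would copy the replica from RAM_DISK to this volume here.
        callback.onSuccess(blockId, target);
      } catch (Throwable t) {
        callback.onFailure(blockId, t);   // failure during execution
      }
    });
  }

  public void shutdown() {
    volumeExecutors.values().forEach(ExecutorService::shutdown);
  }
}
{code}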




> LazyWriter should use either async IO or one thread per physical disk
> -
>
> Key: HDFS-7112
> URL: https://issues.apache.org/jira/browse/HDFS-7112
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-6581
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
> Fix For: 2.6.0
>
> Attachments: HDFS-7112.0.patch
>
>
> The LazyWriter currently uses synchronous IO and a single thread. This limits 
> the throughput to that of a single disk. Using either async overlapped IO or 
> one thread per physical disk will improve the write throughput.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7198) Fix or suppress findbugs "unchecked conversion" warning in DFSClient#getPathTraceScope

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161156#comment-14161156
 ] 

Colin Patrick McCabe commented on HDFS-7198:


The "new" findbugs warning here is just a repeat of HDFS-7194, which is 
interesting.  Since that warning was just fixed, I re-kicked the build.

> Fix or suppress findbugs "unchecked conversion" warning in 
> DFSClient#getPathTraceScope
> --
>
> Key: HDFS-7198
> URL: https://issues.apache.org/jira/browse/HDFS-7198
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Trivial
> Attachments: HDFS-7198.001.patch
>
>
> Fix or suppress the findbugs "unchecked conversion" warning in 
> {{DFSClient#getPathTraceScope}}.
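
For context, a small self-contained illustration of this warning class and the two 
options in the summary (fix or suppress); the methods below are hypothetical and 
not the actual DFSClient#getPathTraceScope:

{code}
import java.util.ArrayList;
import java.util.List;

// Illustration of the "unchecked conversion" warning and the two ways to address
// it. Hypothetical example, not the actual DFSClient code.
public class UncheckedConversionExample {

  // Warned: assigning a raw List where List<String> is expected is an unchecked conversion.
  static List<String> rawAssignment() {
    List raw = new ArrayList();        // raw type
    return raw;                        // unchecked conversion warning here
  }

  // Option 1: fix it by using the parameterized type throughout.
  static List<String> fixed() {
    return new ArrayList<String>();
  }

  // Option 2: suppress it, when the conversion is known to be safe.
  @SuppressWarnings("unchecked")
  static List<String> suppressed(List raw) {
    return raw;
  }

  public static void main(String[] args) {
    System.out.println(rawAssignment().size() + " " + fixed().size()
        + " " + suppressed(new ArrayList()).size());
  }
}
{code}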



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7189) Add trace spans for DFSClient metadata operations

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161145#comment-14161145
 ] 

Hadoop QA commented on HDFS-7189:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673205/HDFS-7189.003.patch
  against trunk revision 8dc6abf.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8337//console

This message is automatically generated.

> Add trace spans for DFSClient metadata operations
> -
>
> Key: HDFS-7189
> URL: https://issues.apache.org/jira/browse/HDFS-7189
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7189.001.patch, HDFS-7189.003.patch
>
>
> We should add trace spans for DFSClient metadata operations.  For example, 
> {{DFSClient#rename}} should have a trace span, etc. etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7189) Add trace spans for DFSClient metadata operations

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161137#comment-14161137
 ] 

Colin Patrick McCabe commented on HDFS-7189:


Build failed due to BUILDS-26, retriggering

> Add trace spans for DFSClient metadata operations
> -
>
> Key: HDFS-7189
> URL: https://issues.apache.org/jira/browse/HDFS-7189
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7189.001.patch, HDFS-7189.003.patch
>
>
> We should add trace spans for DFSClient metadata operations.  For example, 
> {{DFSClient#rename}} should have a trace span, etc. etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH

2014-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161136#comment-14161136
 ] 

Hudson commented on HDFS-7194:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6201 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6201/])
HDFS-7194 Fix findbugs "inefficient new String constructor" warning in 
DFSClient#PATH (yzhang via cmccabe) (cmccabe: rev 
8dc6abf2f4218b2d84b2c2dc7d18623d897c362d)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java


> Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
> ---
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Fix For: 2.7.0
>
> Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs 
> warning:
> {code}
> Code  Warning
> Dm  org.apache.hadoop.hdfs.DFSClient.<clinit>() 
> invokes inefficient new String(String) constructor
> {code}
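
For context, the Dm warning flags the redundant String(String) copy constructor; a 
minimal before/after sketch (the constant name here is hypothetical, not the 
actual DFSClient field):

{code}
// Minimal before/after for the Dm findbugs warning. The constant name is
// hypothetical; the actual DFSClient field is not shown in this thread.
public class StringConstructorExample {
  // Flagged: new String(String) copies an immutable value for no benefit.
  private static final String PATH_SEPARATOR_BAD = new String("/");

  // Fix: reference the literal (or the existing String) directly.
  private static final String PATH_SEPARATOR = "/";

  public static void main(String[] args) {
    System.out.println(PATH_SEPARATOR_BAD.equals(PATH_SEPARATOR));  // true
  }
}
{code}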



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH

2014-10-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7194:
---
  Resolution: Fixed
   Fix Version/s: 2.7.0
Target Version/s: 2.7.0
  Status: Resolved  (was: Patch Available)

thanks, Yongjun.

> Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
> ---
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Fix For: 2.7.0
>
> Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs 
> warning:
> {code}
> Code  Warning
> Dm  org.apache.hadoop.hdfs.DFSClient.<clinit>() 
> invokes inefficient new String(String) constructor
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server

2014-10-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161105#comment-14161105
 ] 

Yongjun Zhang commented on HDFS-7146:
-

Hi [~aw],

About the username patterns allowed on different platforms, there was 
discussion in HDFS-4983 and HDFS-4733:
{quote}
Alejandro Abdelnur added a comment - 04/Dec/13 17:01
Allowed usernames are the OS allowed user names. Different versions of 
Unix/Linux have different restrictions by default. This was discussed when this 
was done for httpfs. Refer to HDFS-4733 for details.
{quote}
I agree with you that ideally all allowed usernames would comply with the same 
convention; that would make our life much easier. However, if users already have 
numerical usernames, we probably have to support them. Asking them to change 
their user names is going to be much harder than for us to support it :-) That's 
what HDFS-4983 and HDFS-4733 are about.

Would you please also address the questions I asked in the "Another thought Allen 
Wittenauer" comment above? 

Thanks a lot.


> NFS ID/Group lookup requires SSSD enumeration on the server
> ---
>
> Key: HDFS-7146
> URL: https://issues.apache.org/jira/browse/HDFS-7146
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, 
> HDFS-7146.003.patch
>
>
> The current implementation of the NFS UID and GID lookup works by running 
> 'getent passwd' with an assumption that it will return the entire list of 
> users available on the OS, local and remote (AD/etc.).
> In most secure setups, administrators are advised to prevent (and do prevent) 
> this behaviour of the command, to avoid excessive load on the ADs 
> involved, as the # of users to be listed may be too large, and the repeated 
> requests for ALL users not present in the cache would be too much for the AD 
> infrastructure to bear.
> The NFS server should likely do lookups based on a specific UID request, via 
> 'getent passwd ', if the UID does not match a cached value. This reduces 
> load on the LDAP backed infrastructure.
> Thanks [~qwertymaniac] for reporting the issue.
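
A minimal sketch of the suggested on-demand lookup, assuming a Linux host with 
getent on the PATH; this is illustrative only and not the actual NFS gateway code:

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Look a single UID up on demand with "getent passwd <uid>" and cache the result,
// instead of enumerating all users. Error handling is simplified.
public class OnDemandUidLookup {
  private final Map<Integer, String> uidToName = new ConcurrentHashMap<>();

  public String getUserName(int uid) throws IOException, InterruptedException {
    String cached = uidToName.get(uid);
    if (cached != null) {
      return cached;
    }
    Process p = new ProcessBuilder("getent", "passwd", Integer.toString(uid)).start();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
      String line = r.readLine();          // e.g. "alice:x:1000:1000::/home/alice:/bin/bash"
      p.waitFor();
      if (line == null) {
        return null;                       // unknown UID
      }
      String name = line.split(":", 2)[0];
      uidToName.put(uid, name);
      return name;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(new OnDemandUidLookup().getUserName(0));  // typically "root"
  }
}
{code}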



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7201) Fix typos in hdfs-default.xml

2014-10-06 Thread Konstantin Shvachko (JIRA)
Konstantin Shvachko created HDFS-7201:
-

 Summary: Fix typos in hdfs-default.xml
 Key: HDFS-7201
 URL: https://issues.apache.org/jira/browse/HDFS-7201
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Konstantin Shvachko


Found the following typos in hdfs-default.xml:
repliaction
directoires
teh
tranfer
spage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161043#comment-14161043
 ] 

Hadoop QA commented on HDFS-7194:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673186/HDFS-7194.002.patch
  against trunk revision 3affad9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
12 warning messages.
See 
https://builds.apache.org/job/PreCommit-HDFS-Build/8333//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-hdfs-project/hadoop-hdfs 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8333//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8333//console

This message is automatically generated.

> Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
> ---
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs 
> warning:
> {code}
> Code  Warning
> Dm  org.apache.hadoop.hdfs.DFSClient.<clinit>() 
> invokes inefficient new String(String) constructor
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7198) Fix or suppress findbugs "unchecked conversion" warning in DFSClient#getPathTraceScope

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161042#comment-14161042
 ] 

Hadoop QA commented on HDFS-7198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673194/HDFS-7198.001.patch
  against trunk revision 3affad9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-hdfs-project/hadoop-hdfs 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8334//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8334//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8334//console

This message is automatically generated.

> Fix or suppress findbugs "unchecked conversion" warning in 
> DFSClient#getPathTraceScope
> --
>
> Key: HDFS-7198
> URL: https://issues.apache.org/jira/browse/HDFS-7198
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Trivial
> Attachments: HDFS-7198.001.patch
>
>
> Fix or suppress the findbugs "unchecked conversion" warning in 
> {{DFSClient#getPathTraceScope}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7189) Add trace spans for DFSClient metadata operations

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161039#comment-14161039
 ] 

Hadoop QA commented on HDFS-7189:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673205/HDFS-7189.003.patch
  against trunk revision 8099de2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8335//console

This message is automatically generated.

> Add trace spans for DFSClient metadata operations
> -
>
> Key: HDFS-7189
> URL: https://issues.apache.org/jira/browse/HDFS-7189
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7189.001.patch, HDFS-7189.003.patch
>
>
> We should add trace spans for DFSClient metadata operations.  For example, 
> {{DFSClient#rename}} should have a trace span, etc. etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7200) Rename libhdfs3 to libndfs++

2014-10-06 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-7200:
--

 Summary: Rename libhdfs3 to libndfs++
 Key: HDFS-7200
 URL: https://issues.apache.org/jira/browse/HDFS-7200
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


Since we generally agree that libhdfs3 is a sub-optimal name, let's call the 
new library "libndfs++."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-06 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161021#comment-14161021
 ] 

dhruba borthakur commented on HDFS-3107:


Thanks for the clarification, Milind. I was just making sure that I understand 
the limitations of a database that uses the HDFS truncate feature. Given 
this fact, it is unlikely that HBase can use it (in the future) to support 
transactions. 

Thanks anyways.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-06 Thread Subbu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161011#comment-14161011
 ] 

Subbu commented on HDFS-7175:
-

I would go back to the pre-HDFS-2538 behavior (i.e., flush every 100 files).

> Client-side SocketTimeoutException during Fsck
> --
>
> Key: HDFS-7175
> URL: https://issues.apache.org/jira/browse/HDFS-7175
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Carl Steinbach
>Assignee: Akira AJISAKA
> Attachments: HDFS-7175.2.patch, HDFS-7175.patch, HDFS-7175.patch
>
>
> HDFS-2538 disabled status reporting for the fsck command (it can optionally 
> be enabled with the -showprogress option). We have observed that without 
> status reporting the client will abort with a read timeout:
> {noformat}
> [hdfs@lva1-hcl0030 ~]$ hdfs fsck / 
> Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
> 14/09/30 06:03:41 WARN security.UserGroupInformation: 
> PriviledgedActionException as:h...@grid.linkedin.com (auth:KERBEROS) 
> cause:java.net.SocketTimeoutException: Read timed out
> Exception in thread "main" java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>   at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
>   at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
>   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
>   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
> {noformat}
> Since there's nothing for the client to read it will abort if the time 
> required to complete the fsck operation is longer than the client's read 
> timeout setting.
> I can think of a couple ways to fix this:
> # Set an infinite read timeout on the client side (not a good idea!).
> # Have the server-side write (and flush) zeros to the wire and instruct the 
> client to ignore these characters instead of echoing them.
> # It's possible that flushing an empty buffer on the server-side will trigger 
> an HTTP response with a zero length payload. This may be enough to keep the 
> client from hanging up.
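
A toy illustration of the keep-alive idea behind these options (and the 
flush-every-100-files behavior mentioned earlier in this thread); it writes to a 
plain OutputStream rather than the real servlet response, so treat it as a sketch 
only:

{code}
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;

// Emit and flush *something* every N files so the HTTP client's read timeout
// never fires during a long fsck. Not the real NamenodeFsck code.
public class FsckProgressWriter {
  private static final int FLUSH_EVERY = 100;

  static void check(Iterable<String> paths, OutputStream out) throws IOException {
    PrintWriter w = new PrintWriter(out);
    int count = 0;
    for (String path : paths) {
      // ... real code would verify the file's blocks here ...
      if (++count % FLUSH_EVERY == 0) {
        w.print('.');     // a single ignorable character keeps the connection busy
        w.flush();        // push it onto the wire before the client times out
      }
    }
    w.println();
    w.printf("Checked %d files%n", count);
    w.flush();
  }

  public static void main(String[] args) throws IOException {
    check(java.util.Collections.nCopies(1000, "/tmp/file"), System.out);
  }
}
{code}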



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3342) SocketTimeoutException in BlockSender.sendChunks could have a better error message

2014-10-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-3342:

Labels: supportability  (was: )

> SocketTimeoutException in BlockSender.sendChunks could have a better error 
> message
> --
>
> Key: HDFS-3342
> URL: https://issues.apache.org/jira/browse/HDFS-3342
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Yongjun Zhang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-3342.001.patch
>
>
> Currently, if a client connects to a DN and begins to read a block, but then 
> stops calling read() for a long period of time, the DN will log a 
> SocketTimeoutException "48 millis timeout while waiting for channel to be 
> ready for write." This is because there is no "keepalive" functionality of 
> any kind. At a minimum, we should improve this error message to be an INFO 
> level log which just says that the client likely stopped reading, so 
> disconnecting it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7189) Add trace spans for DFSClient metadata operations

2014-10-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7189:
---
Attachment: HDFS-7189.003.patch

fix typo

> Add trace spans for DFSClient metadata operations
> -
>
> Key: HDFS-7189
> URL: https://issues.apache.org/jira/browse/HDFS-7189
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7189.001.patch, HDFS-7189.003.patch
>
>
> We should add trace spans for DFSClient metadata operations.  For example, 
> {{DFSClient#rename}} should have a trace span, etc. etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7189) Add trace spans for DFSClient metadata operations

2014-10-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7189:
---
Attachment: (was: HDFS-7189.002.patch)

> Add trace spans for DFSClient metadata operations
> -
>
> Key: HDFS-7189
> URL: https://issues.apache.org/jira/browse/HDFS-7189
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7189.001.patch, HDFS-7189.003.patch
>
>
> We should add trace spans for DFSClient metadata operations.  For example, 
> {{DFSClient#rename}} should have a trace span, etc. etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server

2014-10-06 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161006#comment-14161006
 ] 

Allen Wittenauer commented on HDFS-7146:


bq. See HDFS-4983.

That JIRA is sort of irrelevant to the discussion since HDFS (and therefore 
WebHDFS) has no such restrictions on usernames; they are published as strings. 
Unix does, and we have to play by its rules since that's the space this code 
plays in.  

bq. Seems the requirement on user name varies.

Not really.  Some useradd implementations do not enforce the entire rule set, 
which is why I said "most/all"; some Linux distributions ship such a useradd.  
If you look at the upstream Linux shadow utilities source, however, 
(https://github.com/shadow-maint/shadow/blob/master/libmisc/chkname.c) you'll 
find that all-digit usernames are not legal.  Other OSes follow similar rules 
in their utilities ( e.g., Illumos: 
https://hg.openindiana.org/upstream/illumos/illumos-gate/file/68f95e015346/usr/src/cmd/aset/tasks/pwchk.awk
 ).  Just because some distributions allowed users to do incredibly dumb things 
doesn't mean we need to as well.

FWIW, if you want true portability, you'll need to use the native system calls 
to follow whatever rules are allowed on that machine.  Otherwise, expect to 
make some compatibility decisions.  To me, this is an easy call:  all-numeric 
usernames are super rare since they have unpredictable results (e.g., chown).  
Portability > naive admins who shot themselves in the foot.
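
For reference, a rough Java rendering of the rule as I read shadow's chkname.c (an illustration and an assumption on my part, not the nfs code):

{code}
// Mirrors the upstream shadow rule as described above: names must start with a
// letter or underscore, and a name consisting only of digits is rejected.
static boolean isPortableUnixUserName(String name) {
  if (name.isEmpty() || name.matches("[0-9]+")) {
    return false;   // all-digit names are ambiguous with numeric uids (e.g. chown)
  }
  return name.matches("[a-z_][a-z0-9_-]*\\$?");
}
{code}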

> NFS ID/Group lookup requires SSSD enumeration on the server
> ---
>
> Key: HDFS-7146
> URL: https://issues.apache.org/jira/browse/HDFS-7146
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, 
> HDFS-7146.003.patch
>
>
> The current implementation of the NFS UID and GID lookup works by running 
> 'getent passwd' with an assumption that it will return the entire list of 
> users available on the OS, local and remote (AD/etc.).
> This behaviour of the command is advised to be and is prevented by 
> administrators in most secure setups to avoid excessive load to the ADs 
> involved, as the # of users to be listed may be too large, and the repeated 
> requests of ALL users not present in the cache would be too much for the AD 
> infrastructure to bear.
> The NFS server should likely do lookups based on a specific UID request, via 
> 'getent passwd ', if the UID does not match a cached value. This reduces 
> load on the LDAP backed infrastructure.
> Thanks [~qwertymaniac] for reporting the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7189) Add trace spans for DFSClient metadata operations

2014-10-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7189:
---
Attachment: HDFS-7189.002.patch

> Add trace spans for DFSClient metadata operations
> -
>
> Key: HDFS-7189
> URL: https://issues.apache.org/jira/browse/HDFS-7189
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7189.001.patch, HDFS-7189.002.patch
>
>
> We should add trace spans for DFSClient metadata operations.  For example, 
> {{DFSClient#rename}} should have a trace span, etc. etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7189) Add trace spans for DFSClient metadata operations

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160998#comment-14160998
 ] 

Colin Patrick McCabe commented on HDFS-7189:


bq. \[is adding unwrapRemoteException to getStoragePolicies\] a bug fix?

Hmm.  Good question.  I looked into this a little more, and I think I will skip 
adding new invocations of {{unwrapRemoteException}} in this patch.  The 
unwrapping is only needed when the NameNode actually throws one of those 
exceptions, but I don't think that can happen for {{getStoragePolicies}} or 
many of the other functions here.  Plus, adding that stuff muddies the 
waters... it would be better to do it in a separate patch than to combine it 
with this one.

bq. Removing checkOpen(); in delete is intentional ?

Ah, but the one-argument version of {{delete}} now calls another override of 
the function, which then calls {{checkOpen}}.  So it should be OK.
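
For illustration, the call pattern being described looks roughly like this (simplified, not the exact patch):

{code}
// One-argument delete() simply delegates...
public boolean delete(String src) throws IOException {
  return delete(src, true);
}

// ...and the two-argument override still performs the checkOpen() call
// before entering the traced RPC.
public boolean delete(String src, boolean recursive) throws IOException {
  checkOpen();
  TraceScope scope = getPathTraceScope("delete", src);
  try {
    return namenode.delete(src, recursive);
  } finally {
    scope.close();
  }
}
{code}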

bq. Is this intentional... calling trace getCurrentEditLogTxid though its in 
getInotifyEventStream ... I suppose it is given it actually does do 
getCurrentEditLogTxid

I think we should, since we want to know about this source of activity.  We 
want to know what the performance impact of inotify is.

I also fixed a findbugs warning.  Reposting.

> Add trace spans for DFSClient metadata operations
> -
>
> Key: HDFS-7189
> URL: https://issues.apache.org/jira/browse/HDFS-7189
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7189.001.patch
>
>
> We should add trace spans for DFSClient metadata operations.  For example, 
> {{DFSClient#rename}} should have a trace span, etc. etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160975#comment-14160975
 ] 

Colin Patrick McCabe commented on HDFS-6994:


bq. It looks to me that, at least in the first phase, the code will be sitting 
in contrib, whereas HADOOP-10388 puts the code in hadoop-hdfs-project, so the 
two should be completely isolated. I think it is definitely useful to reuse 
some components down the road, but I think it is a much longer-term goal.

The original code in HADOOP-10388 never put any new files in the 
{{hadoop-hdfs-project}}.  Instead, it put new files in {{hadoop-native-core}}, 
a new top-level project.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provide the libhdfs style C interface and a C++ interface. Support 
> both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos 
> authentication.
> libhdfs3 is currently used by HAWQ of Pivotal
> I'd like to integrate libhdfs3 into HDFS source code to benefit others.
> You can find libhdfs3 code from github
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server

2014-10-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160976#comment-14160976
 ] 

Yongjun Zhang commented on HDFS-7146:
-

Another thought [~aw], 

If you look at the nfs code, only two platforms are currently supported: Linux 
and Mac OS. The commands used for them are crafted differently. For example, 
getent is used for Linux, and dscl is used for Mac.

Given that we already need different commands for different platforms, if a new 
platform is added we would likely have to craft a command for it anyway. Based 
on this, do you think it's OK for us to use the "id" command (for Linux and 
Mac)? It has the advantage of avoiding loading the full user map (when there is 
a numerical user name).
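
A rough sketch of what a per-id lookup could look like on those two platforms (the command shape is an assumption, not the patch):

{code}
// Hypothetical sketch: resolve a single uid instead of enumerating all users.
// "id -nu <uid>" prints the user name on both Linux and Mac, or fails if unknown.
// (imports: java.io.*, java.nio.charset.Charset)
static String getUserNameById(int uid) throws IOException {
  String[] cmd = {"id", "-nu", Integer.toString(uid)};
  BufferedReader br = new BufferedReader(new InputStreamReader(
      Runtime.getRuntime().exec(cmd).getInputStream(),
      Charset.defaultCharset()));
  try {
    return br.readLine();   // null if the uid could not be resolved
  } finally {
    br.close();
  }
}
{code}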

Thanks.





> NFS ID/Group lookup requires SSSD enumeration on the server
> ---
>
> Key: HDFS-7146
> URL: https://issues.apache.org/jira/browse/HDFS-7146
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, 
> HDFS-7146.003.patch
>
>
> The current implementation of the NFS UID and GID lookup works by running 
> 'getent passwd' with an assumption that it will return the entire list of 
> users available on the OS, local and remote (AD/etc.).
> This behaviour of the command is advised to be and is prevented by 
> administrators in most secure setups to avoid excessive load to the ADs 
> involved, as the # of users to be listed may be too large, and the repeated 
> requests of ALL users not present in the cache would be too much for the AD 
> infrastructure to bear.
> The NFS server should likely do lookups based on a specific UID request, via 
> 'getent passwd ', if the UID does not match a cached value. This reduces 
> load on the LDAP backed infrastructure.
> Thanks [~qwertymaniac] for reporting the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception

2014-10-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160974#comment-14160974
 ] 

Jason Lowe commented on HDFS-7199:
--

I believe the problem lies in the way DataStreamer is handling the error:
{code}
} catch (Throwable e) {
  // Log warning if there was a real error.
  if (restartingNodeIndex == -1) {
DFSClient.LOG.warn("DataStreamer Exception", e);
  }
  if (e instanceof IOException) {
setLastException((IOException)e);
  }
  hasError = true;
  if (errorIndex == -1 && restartingNodeIndex == -1) {
// Not a datanode issue
streamerClosed = true;
  }
}
{code}

We should either always call setLastException, wrapping the exception in an I/O 
exception if necessary, or at least set it to something if we're going to set 
streamerClosed=true and exit the datastreamer thread.  That way there will 
always be some kind of exception to be picked up either in checkClosed() or 
close() in the output stream.
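
For illustration, a minimal sketch of the first option (the wrapping here is just one way to do it, not a patch):

{code}
} catch (Throwable e) {
  // Log warning if there was a real error.
  if (restartingNodeIndex == -1) {
    DFSClient.LOG.warn("DataStreamer Exception", e);
  }
  // Always record *something* so checkClosed()/close() can surface the failure.
  if (e instanceof IOException) {
    setLastException((IOException)e);
  } else {
    setLastException(new IOException("DataStreamer failed: " + e, e));
  }
  hasError = true;
  if (errorIndex == -1 && restartingNodeIndex == -1) {
    // Not a datanode issue
    streamerClosed = true;
  }
}
{code}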

> DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O 
> exception
> ---
>
> Key: HDFS-7199
> URL: https://issues.apache.org/jira/browse/HDFS-7199
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Priority: Critical
>
> If the DataStreamer thread encounters a non-I/O exception then it closes the 
> output stream but does not set lastException.  When the client later calls 
> close on the output stream then it will see the stream is already closed with 
> lastException == null, mistakenly think this is a redundant close call, and 
> fail to report any error to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160970#comment-14160970
 ] 

Colin Patrick McCabe commented on HDFS-6994:


The libhdfs3 client clearly is very feature-complete.  It has support for 
Kerberos, Namenode HA, SASL, and so forth.  I am not going to continue 
developing the previous client as a separate project since that would be 
redundant.  Instead, we are going to work together to get libhdfs3 in shape to 
do everything the native client needs to do.  The subtasks here are a pretty 
good description of what that is.  I think "mixing with the earlier effort" is 
exactly what we need to do here, and duplicating effort is exactly what we 
shouldn't do.

The client is going to be C++ but with the existing libhdfs interfaces, so that 
it can be used with existing clients.  libhdfs3 already has these {{hdfs.h}} 
interfaces.  I would have preferred C over C\+\+, but I am not religious about 
programming languages.  I feel that if a consistent coding style can be 
enforced, C\+\+ is usable.

I am evaluating whether it is possible to make this library C\+\+11 only.  As 
[~aw] has commented, the glue code needed to support older compilers might 
become a maintenance burden over time, and Boost has its own difficult set of 
versioning issues which we would like to avoid.

In HDFS-7041, I also wrote a library called {{libhdfs_fwd}} which can perform 
the failover from the native client to the JNI client that the old HADOOP-10388 
code performed.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provide the libhdfs style C interface and a C++ interface. Support 
> both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos 
> authentication.
> libhdfs3 is currently used by HAWQ of Pivotal
> I'd like to integrate libhdfs3 into HDFS source code to benefit others.
> You can find libhdfs3 code from github
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception

2014-10-06 Thread Jason Lowe (JIRA)
Jason Lowe created HDFS-7199:


 Summary: DFSOutputStream can silently drop data if DataStreamer 
crashes with a non-I/O exception
 Key: HDFS-7199
 URL: https://issues.apache.org/jira/browse/HDFS-7199
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.5.0
Reporter: Jason Lowe
Priority: Critical


If the DataStreamer thread encounters a non-I/O exception then it closes the 
output stream but does not set lastException.  When the client later calls 
close on the output stream then it will see the stream is already closed with 
lastException == null, mistakenly think this is a redundant close call, and 
fail to report any error to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server

2014-10-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160952#comment-14160952
 ] 

Yongjun Zhang commented on HDFS-7146:
-

Thanks [~aw].

It seems the requirements on user names vary. For example, I can add a user 
with a numerical username on my system:

[yzhang@localhost hadoop]$ su adduser 23456
su: user adduser does not exist
[yzhang@localhost hadoop]$ sudo adduser 23456
[sudo] password for yzhang: 
[yzhang@localhost hadoop]$ getent passwd | grep 23456
23456:x:504:505::/home/23456:/bin/bash
[yzhang@localhost hadoop]$ 

We had cases where numerical user names are used often. See HDFS-4983.

I wish there were a portable command like "id" to address this issue better. 
Otherwise, we might do the following (a rough sketch follows the list):

1. do an incremental update of the map
2. do a full load of passwd or group when the name is numerical
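
A placeholder sketch of those two steps (the names here are made up, not the patch):

{code}
// uidByName: a Map<String, Integer> maintained by the existing updater (hypothetical).
private int lookupUid(String userName) throws IOException {
  Integer cached = uidByName.get(userName);
  if (cached == null) {
    if (userName.matches("[0-9]+")) {
      updateMapsFully();                   // step 2: full passwd/group load for numeric names
    } else {
      updateMapIncrementally(userName);    // step 1: e.g. 'getent passwd <name>'
    }
    cached = uidByName.get(userName);
  }
  if (cached == null) {
    throw new IOException("No uid found for user " + userName);
  }
  return cached;
}
{code}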

I will do some more study, comments are welcome.

Thanks.



> NFS ID/Group lookup requires SSSD enumeration on the server
> ---
>
> Key: HDFS-7146
> URL: https://issues.apache.org/jira/browse/HDFS-7146
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, 
> HDFS-7146.003.patch
>
>
> The current implementation of the NFS UID and GID lookup works by running 
> 'getent passwd' with an assumption that it will return the entire list of 
> users available on the OS, local and remote (AD/etc.).
> This behaviour of the command is advised to be and is prevented by 
> administrators in most secure setups to avoid excessive load to the ADs 
> involved, as the # of users to be listed may be too large, and the repeated 
> requests of ALL users not present in the cache would be too much for the AD 
> infrastructure to bear.
> The NFS server should likely do lookups based on a specific UID request, via 
> 'getent passwd ', if the UID does not match a cached value. This reduces 
> load on the LDAP backed infrastructure.
> Thanks [~qwertymaniac] for reporting the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-10-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160948#comment-14160948
 ] 

Haohui Mai commented on HDFS-6994:
--

bq. can you please add details on how these two streams of development will be 
brought together?

It looks to me that, at least in the first phase, the code will be sitting in 
contrib, whereas HADOOP-10388 puts the code in hadoop-hdfs-project, so the two 
should be completely isolated. I think it is definitely useful to reuse some 
components down the road, but I think it is a much longer-term goal.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provide the libhdfs style C interface and a C++ interface. Support 
> both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos 
> authentication.
> libhdfs3 is currently used by HAWQ of Pivotal
> I'd like to integrate libhdfs3 into HDFS source code to benefit others.
> You can find libhdfs3 code from github
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7010) boot up libhdfs3 project

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160945#comment-14160945
 ] 

Colin Patrick McCabe commented on HDFS-7010:


bq. I wonder why the code needs to unwind the stack?

It is used to put stack traces in exceptions.

I agree that this might not really be needed and possibly we could get rid of 
it.  In my experience, getting stack unwinding code to work properly on 
multiple architectures and platforms is difficult, and the benefit seems 
uncertain since we could always add more identifying information to each 
exception to know where it came from.  [~wangzw], what do you think?

> boot up libhdfs3 project
> 
>
> Key: HDFS-7010
> URL: https://issues.apache.org/jira/browse/HDFS-7010
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7010-pnative.003.patch, 
> HDFS-7010-pnative.004.patch, HDFS-7010-pnative.004.patch, HDFS-7010.patch
>
>
> boot up libhdfs3 project with CMake, Readme and license file.
> Integrate google mock and google test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH

2014-10-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160942#comment-14160942
 ] 

Yongjun Zhang commented on HDFS-7194:
-

Many thanks [~cmccabe] and [~szetszwo].


> Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
> ---
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs 
> warning:
> {code}
> Code  Warning
> Dmorg.apache.hadoop.hdfs.DFSClient.() 
> invokes inefficient new String(String) constructor
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-10-06 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160926#comment-14160926
 ] 

Suresh Srinivas commented on HDFS-6994:
---

bq. Can the code be committed in a separate branch other than HADOOP-10388 so 
that it doesn't get mixed with the earlier effort?
I am curious about this too. Are the two implementations complementary enough 
to live in HADOOP-10388, which has been in development for a long time? 
[~cmccabe] and [~wheat9], can you please add details on how these two streams 
of development will be brought together?

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provide the libhdfs style C interface and a C++ interface. Support 
> both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos 
> authentication.
> libhdfs3 is currently used by HAWQ of Pivotal
> I'd like to integrate libhdfs3 into HDFS source code to benefit others.
> You can find libhdfs3 code from github
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160928#comment-14160928
 ] 

Colin Patrick McCabe commented on HDFS-7194:


PS: I retitled this JIRA and changed the description a bit to reflect the fact 
it no longer has to deal with the findbugs warning we fixed in HDFS-7169.

> Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
> ---
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs 
> warning:
> {code}
> Code  Warning
> Dmorg.apache.hadoop.hdfs.DFSClient.() 
> invokes inefficient new String(String) constructor
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH

2014-10-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7194:
---
Description: 
In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs 
warning:
{code}
CodeWarning
Dm  org.apache.hadoop.hdfs.DFSClient.() 
invokes inefficient new String(String) constructor
{code}

  was:
In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs 
warnings introduced by earlier fixes.

E.g.
https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html

{quote}
Bad practice Warnings

CodeWarning
Se  Class 
org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy 
defines non-transient non-serializable instance field condition
Performance Warnings

CodeWarning
Dm  org.apache.hadoop.hdfs.DFSClient.() 
invokes inefficient new String(String) constructor
{quote}



> Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
> ---
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs 
> warning:
> {code}
> Code  Warning
> Dmorg.apache.hadoop.hdfs.DFSClient.() 
> invokes inefficient new String(String) constructor
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7198) Fix or suppress findbugs "unchecked conversion" warning in DFSClient#getPathTraceScope

2014-10-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7198:
---
Attachment: HDFS-7198.001.patch

> Fix or suppress findbugs "unchecked conversion" warning in 
> DFSClient#getPathTraceScope
> --
>
> Key: HDFS-7198
> URL: https://issues.apache.org/jira/browse/HDFS-7198
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Trivial
> Attachments: HDFS-7198.001.patch
>
>
> Fix or suppress the findbugs "unchecked conversion" warning in 
> {{DFSClient#getPathTraceScope}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7198) Fix or suppress findbugs "unchecked conversion" warning in DFSClient#getPathTraceScope

2014-10-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7198:
---
Status: Patch Available  (was: Open)

> Fix or suppress findbugs "unchecked conversion" warning in 
> DFSClient#getPathTraceScope
> --
>
> Key: HDFS-7198
> URL: https://issues.apache.org/jira/browse/HDFS-7198
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Trivial
> Attachments: HDFS-7198.001.patch
>
>
> Fix or suppress the findbugs "unchecked conversion" warning in 
> {{DFSClient#getPathTraceScope}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7055) Add tracing to DFSInputStream

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160922#comment-14160922
 ] 

Colin Patrick McCabe commented on HDFS-7055:


Thanks for the report.  I have filed HDFS-7198 to remove or suppress the javac 
warning.  [~yzhangal] is tackling the findbugs warning in HDFS-7194.

> Add tracing to DFSInputStream
> -
>
> Key: HDFS-7055
> URL: https://issues.apache.org/jira/browse/HDFS-7055
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.7.0
>
> Attachments: HDFS-7055.002.patch, HDFS-7055.003.patch, 
> HDFS-7055.004.patch, HDFS-7055.005.patch, screenshot-get-1mb.005.png, 
> screenshot-get-1mb.png
>
>
> Add tracing to DFSInputStream.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7198) Fix or suppress findbugs "unchecked conversion" warning in DFSClient#getPathTraceScope

2014-10-06 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-7198:
--

 Summary: Fix or suppress findbugs "unchecked conversion" warning 
in DFSClient#getPathTraceScope
 Key: HDFS-7198
 URL: https://issues.apache.org/jira/browse/HDFS-7198
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Trivial


Fix or suppress the findbugs "unchecked conversion" warning in 
{{DFSClient#getPathTraceScope}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160918#comment-14160918
 ] 

Colin Patrick McCabe commented on HDFS-7194:


bq. Could you check with Colin? I am not sure if he already has a plan for 
fixing the findbugs and javac warnings.

I think it's fine to fix the findbugs warning in this JIRA.

I am +1 on the patch pending jenkins.

> Two findbugs issues in recent PreCommit-HDFS-Build builds
> -
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs 
> warnings introduced by earlier fixes.
> E.g.
> https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
> {quote}
> Bad practice Warnings
> Code  Warning
> SeClass 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy 
> defines non-transient non-serializable instance field condition
> Performance Warnings
> Code  Warning
> Dmorg.apache.hadoop.hdfs.DFSClient.() 
> invokes inefficient new String(String) constructor
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH

2014-10-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7194:
---
Summary: Fix findbugs "inefficient new String constructor" warning in 
DFSClient#PATH  (was: Two findbugs issues in recent PreCommit-HDFS-Build builds)

> Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
> ---
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs 
> warnings introduced by earlier fixes.
> E.g.
> https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
> {quote}
> Bad practice Warnings
> Code  Warning
> SeClass 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy 
> defines non-transient non-serializable instance field condition
> Performance Warnings
> Code  Warning
> Dmorg.apache.hadoop.hdfs.DFSClient.() 
> invokes inefficient new String(String) constructor
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server

2014-10-06 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160906#comment-14160906
 ] 

Allen Wittenauer commented on HDFS-7146:


bq. If user name 123

That's not a legal Unix user name and most/all compliant useradd's will kick it 
back as invalid. FWIW, all sorts of problems happen with all numeric usernames 
if one tries to use them.  For example, if one runs 'chown 123 file' what 
permissions would be on the file?   It's perfectly reasonable for the system to 
fail in this scenario.  

bq. About "id" command

I'm -1 on using id for this, even if it works on Linux and OS X. It limits any 
future portability to SysV machines, where /usr/bin/id is typically the SysV id 
and not the POSIX id. We've been down this road before with id in the 
pre-security days. It was a problem then and it will be a problem in the future.

(Never mind the fact that I suspect the code actually works on other operating 
systems, but we've artificially limited it for reasons which I'm unclear on.) 

tl;dr: So use getent on everything but OS X.

> NFS ID/Group lookup requires SSSD enumeration on the server
> ---
>
> Key: HDFS-7146
> URL: https://issues.apache.org/jira/browse/HDFS-7146
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, 
> HDFS-7146.003.patch
>
>
> The current implementation of the NFS UID and GID lookup works by running 
> 'getent passwd' with an assumption that it will return the entire list of 
> users available on the OS, local and remote (AD/etc.).
> This behaviour of the command is advised to be and is prevented by 
> administrators in most secure setups to avoid excessive load to the ADs 
> involved, as the # of users to be listed may be too large, and the repeated 
> requests of ALL users not present in the cache would be too much for the AD 
> infrastructure to bear.
> The NFS server should likely do lookups based on a specific UID request, via 
> 'getent passwd ', if the UID does not match a cached value. This reduces 
> load on the LDAP backed infrastructure.
> Thanks [~qwertymaniac] for reporting the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7169) Fix a findbugs warning in ReplaceDatanodeOnFailure

2014-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160907#comment-14160907
 ] 

Hudson commented on HDFS-7169:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6199 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6199/])
HDFS-7169. Add SE_BAD_FIELD to findbugsExcludeFile.xml. (szetszwo: rev 
3affad9ebd7def57eb3dd1cc1a1e806fceee63ad)
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Fix a findbugs warning in ReplaceDatanodeOnFailure
> --
>
> Key: HDFS-7169
> URL: https://issues.apache.org/jira/browse/HDFS-7169
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Fix For: 2.6.0
>
> Attachments: h7169_20140930.patch
>
>
> The following findbugs warning came up recently although there was no recent 
> change of the code.
> - ReplaceDatanodeOnFailure$Policy defines non-transient non-serializable 
> instance field condition
> Bug type SE_BAD_FIELD (click for details)
> In class 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy
> Field 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy.condition
> In ReplaceDatanodeOnFailure.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds

2014-10-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7194:

Attachment: HDFS-7194.002.patch

> Two findbugs issues in recent PreCommit-HDFS-Build builds
> -
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs 
> warnings introduced by earlier fixes.
> E.g.
> https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
> {quote}
> Bad practice Warnings
> Code  Warning
> SeClass 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy 
> defines non-transient non-serializable instance field condition
> Performance Warnings
> Code  Warning
> Dmorg.apache.hadoop.hdfs.DFSClient.() 
> invokes inefficient new String(String) constructor
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7186) Add usage of "hadoop trace" command to doc

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160900#comment-14160900
 ] 

Colin Patrick McCabe commented on HDFS-7186:


Looks good.

{code}
+  $ hadoop trace -add -class org.htrace.impl.LocalFileSpanReceiver 
-Chadoop.htrace.local-file-span-receiver.path=/tmp/htrace.out -host 
192.168.56.2:9000
{code}

Since the namespace of LocalFileSpanReceiver might be changing soon, I'd prefer 
to tell people to use {{hadoop trace -add -class LocalFileSpanReceiver}} (i.e., 
have the system automatically add the namespace).  That way it will work even 
after we move {{LocalFileSpanReceiver}}.
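
i.e., the documented example would then just be (assuming the automatic namespace resolution described above):

{code}
  $ hadoop trace -add -class LocalFileSpanReceiver -Chadoop.htrace.local-file-span-receiver.path=/tmp/htrace.out -host 192.168.56.2:9000
{code}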

> Add usage of "hadoop trace" command to doc
> --
>
> Key: HDFS-7186
> URL: https://issues.apache.org/jira/browse/HDFS-7186
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-7186-0.patch
>
>
> The command for tracing management was added in HDFS-6956.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds

2014-10-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160898#comment-14160898
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7194:
---

Could you check with Colin?  I am not sure if he already has a plan for fixing 
the findbugs and javac warnings.

> Two findbugs issues in recent PreCommit-HDFS-Build builds
> -
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Attachments: HDFS-7194.001.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs 
> warnings introduced by earlier fixes.
> E.g.
> https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
> {quote}
> Bad practice Warnings
> Code  Warning
> SeClass 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy 
> defines non-transient non-serializable instance field condition
> Performance Warnings
> Code  Warning
> Dmorg.apache.hadoop.hdfs.DFSClient.() 
> invokes inefficient new String(String) constructor
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7169) Fix a findbugs warning in ReplaceDatanodeOnFailure

2014-10-06 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-7169:
--
   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks, Arpit for reviewing the patch.

I have committed this.

> Fix a findbugs warning in ReplaceDatanodeOnFailure
> --
>
> Key: HDFS-7169
> URL: https://issues.apache.org/jira/browse/HDFS-7169
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Fix For: 2.6.0
>
> Attachments: h7169_20140930.patch
>
>
> The following findbugs warning came up recently although there was no recent 
> change of the code.
> - ReplaceDatanodeOnFailure$Policy defines non-transient non-serializable 
> instance field condition
> Bug type SE_BAD_FIELD (click for details)
> In class 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy
> Field 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy.condition
> In ReplaceDatanodeOnFailure.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7169) Fix a findbugs warning in ReplaceDatanodeOnFailure

2014-10-06 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-7169:
--
Component/s: build

> Fix a findbugs warning in ReplaceDatanodeOnFailure
> --
>
> Key: HDFS-7169
> URL: https://issues.apache.org/jira/browse/HDFS-7169
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Fix For: 2.6.0
>
> Attachments: h7169_20140930.patch
>
>
> The following findbugs warning came up recently although there was no recent 
> change of the code.
> - ReplaceDatanodeOnFailure$Policy defines non-transient non-serializable 
> instance field condition
> Bug type SE_BAD_FIELD (click for details)
> In class 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy
> Field 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy.condition
> In ReplaceDatanodeOnFailure.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-10-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160887#comment-14160887
 ] 

Haohui Mai commented on HDFS-6994:
--

Can the code be committed in a separate branch other than HADOOP-10388 so that 
it doesn't get mixed with the earlier effort?

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provide the libhdfs style C interface and a C++ interface. Support 
> both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos 
> authentication.
> libhdfs3 is currently used by HAWQ of Pivotal
> I'd like to integrate libhdfs3 into HDFS source code to benefit others.
> You can find libhdfs3 code from github
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds

2014-10-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160880#comment-14160880
 ] 

Yongjun Zhang commented on HDFS-7194:
-

Hi [~szetszwo], 

Thanks for the info. I was not aware of HDFS-7169. We can dedicate HDFS-7194 
to the other issue reported here, then. 

I'm uploading a revision to drop the change for the HDFS-7169 issue. It's going 
to be a really trivial change; would you please help review it?

Thanks a lot.


> Two findbugs issues in recent PreCommit-HDFS-Build builds
> -
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Attachments: HDFS-7194.001.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs 
> warnings introduced by earlier fixes.
> E.g.
> https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
> {quote}
> Bad practice Warnings
> Code  Warning
> SeClass 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy 
> defines non-transient non-serializable instance field condition
> Performance Warnings
> Code  Warning
> Dmorg.apache.hadoop.hdfs.DFSClient.() 
> invokes inefficient new String(String) constructor
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7010) boot up libhdfs3 project

2014-10-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160885#comment-14160885
 ] 

Haohui Mai commented on HDFS-7010:
--

I wonder why the code needs to unwind the stack?

{code}
+static const std::string SymbolizeAndDemangle(void * pc) {
+std::vector<char> buffer(1024);
+std::ostringstream ss;
+uint64_t pc0 = reinterpret_cast<uint64_t>(pc);
+uint64_t start_address = 0;
+int object_fd = OpenObjectFileContainingPcAndGetStartAddress(pc0,
+start_address);
+
+if (object_fd == -1) {
+return DEFAULT_STACK_PREFIX"Unknown";
+}
+
+FileDescriptor wrapped_object_fd(object_fd);
+int elf_type = FileGetElfType(wrapped_object_fd.get());
+
+if (elf_type == -1) {
+return DEFAULT_STACK_PREFIX"Unknown";
+}
+
+if (!GetSymbolFromObjectFile(wrapped_object_fd.get(), pc0,
+ &buffer[0], buffer.size(), start_address)) {
+return DEFAULT_STACK_PREFIX"Unknown";
+}
+
+ss << DEFAULT_STACK_PREFIX << DemangleSymbol(&buffer[0]);
+return ss.str();
+}
+
+#elif defined(OS_MACOSX) && defined(HAVE_DLADDR)
+
+static const std::string SymbolizeAndDemangle(void * pc) {
+Dl_info info;
+std::ostringstream ss;
+
+if (dladdr(pc, &info) && info.dli_sname) {
+ss << DEFAULT_STACK_PREFIX << DemangleSymbol(info.dli_sname);
+} else {
+ss << DEFAULT_STACK_PREFIX << "Unknown";
+}
+
+return ss.str();
+}
+
+#endif
+
+const std::string PrintStack(int skip, int maxDepth) {
+std::ostringstream ss;
+std::vector<void *> stack;
+GetStack(skip + 1, maxDepth, stack);
+
+for (size_t i = 0; i < stack.size(); ++i) {
+ss << SymbolizeAndDemangle(stack[i]) << std::endl;
+}
+
+return ss.str();
+}
{code}

> boot up libhdfs3 project
> 
>
> Key: HDFS-7010
> URL: https://issues.apache.org/jira/browse/HDFS-7010
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7010-pnative.003.patch, 
> HDFS-7010-pnative.004.patch, HDFS-7010-pnative.004.patch, HDFS-7010.patch
>
>
> boot up libhdfs3 project with CMake, Readme and license file.
> Integrate google mock and google test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7187) DomainSocketWatcher thread crashes causing datanode to leak connection threads

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160860#comment-14160860
 ] 

Colin Patrick McCabe commented on HDFS-7187:


That exception occurs while {{DomainSocketWatcher}} is exiting.  It's a bug, 
but only a bug that affects shutdown (HADOOP-10404).  To find out why it's 
shutting down, you need to look backwards in the log.

> DomainSocketWatcher thread crashes causing datanode to leak connection threads
> --
>
> Key: HDFS-7187
> URL: https://issues.apache.org/jira/browse/HDFS-7187
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.3.0
>Reporter: Maxim Ivanov
>
> It seems that DomainSocketWatcher crashes, which makes all those short 
> circuit threads to wait forever:
> {code}
> Exception in thread "Thread-22" java.util.ConcurrentModificationException
> at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
> at java.util.TreeMap$ValueIterator.next(TreeMap.java:1160)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:465)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> In the meantime DataXceiver threads look like this (their number grows up to 
> connection threads limit):
> {code}
> "DataXceiver for client unix:/var/run/hadoop-hdfs/datanode50010.socket 
> [Waiting for operation #1]" daemon prio=10 tid=0x7fb3c14d3800 nid=0x997e 
> waiting on condition [0x7fb2a1d25000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000744d1d600> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
> at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
> at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:386)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7174) Support for more efficient large directories

2014-10-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160847#comment-14160847
 ] 

Colin Patrick McCabe commented on HDFS-7174:


This is a great idea, [~kihwal].  Definitely something we need.

This reminds me of why [~tlipcon] created {{ChunkedArrayList}}. We found that 
block reports were generating too much garbage when they created their giant 
{{ArrayLists}}.  We had the same problem described here where resizing a giant 
{{ArrayList}} required an enormous amount of copying, and made the previous 
giant array a giant piece of garbage (which could trigger a full GC).  I was 
about to suggest using {{ChunkedArrayList}}, but I don't think it supports 
insertion into the middle of the list, unfortunately.  It might not be too hard 
to extend {{ChunkedArrayList}} to support insertion into the middle, though... 
perhaps we should consider this.

As [~hitliuyi] pointed out, the current patch has a problem.  If we go back and 
forth across {{switchingThreshold}} (say, by repeatedly adding and removing a 
single element in a directory), we pay a very high cost.  To prevent this, the 
threshold for converting a {{INodeHashedArrayList}} back to a simple 
{{INodeArrayList}} should be lower than the threshold for doing the opposite 
conversion.
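
To illustrate the hysteresis (the thresholds and helpers here are made up):

{code}
// Illustrative only: upgrade and downgrade at different sizes so a directory
// hovering around a single threshold never flips representation on every op.
static final int UPGRADE_THRESHOLD   = 1000000;  // INodeArrayList -> INodeHashedArrayList
static final int DOWNGRADE_THRESHOLD =  500000;  // INodeHashedArrayList -> INodeArrayList

if (children instanceof INodeArrayList
    && children.size() > UPGRADE_THRESHOLD) {
  children = toHashedArrayList(children);         // hypothetical conversion helper
} else if (children instanceof INodeHashedArrayList
    && children.size() < DOWNGRADE_THRESHOLD) {
  children = toArrayList(children);               // hypothetical conversion helper
}
{code}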

I also agree with [~jingzhao] that scaling could become a problem with the 
proposed scheme, since it only has a single level of partitioning.  I guess the 
counter-argument here is that there won't be that many giant directories and 
this works for your needs.

[~shv] wrote: I am probably late to the party, but for whatever it worth.  Did 
you consider using balanced trees for inode lists, something like B-trees?

B-trees would be an excellent solution here.  Since they generally use arrays 
in the leaf nodes, this also gets you the benefits of tighter packing in 
memory.  I guess the tricky part is writing the code.

> Support for more efficient large directories
> 
>
> Key: HDFS-7174
> URL: https://issues.apache.org/jira/browse/HDFS-7174
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-7174.new.patch, HDFS-7174.patch, HDFS-7174.patch
>
>
> When the number of children under a directory grows very large, insertion 
> becomes very costly.  E.g. creating 1M entries takes 10s of minutes.  This is 
> because the complexity of an insertion is O\(n\). As the size of a list 
> grows, the overhead grows n^2. (integral of linear function).  It also causes 
> allocations and copies of big arrays.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6102) Lower the default maximum items per directory to fix PB fsimage loading

2014-10-06 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160845#comment-14160845
 ] 

Andrew Wang commented on HDFS-6102:
---

Hey Ravi, we could probably up this to ~6.7mil, but it seems like you'd 
probably run into this limit soon enough too. Do you mind filing a new JIRA to 
chunk up large directories? That's the only future-proof fix.

> Lower the default maximum items per directory to fix PB fsimage loading
> ---
>
> Key: HDFS-6102
> URL: https://issues.apache.org/jira/browse/HDFS-6102
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Fix For: 2.4.0
>
> Attachments: hdfs-6102-1.patch, hdfs-6102-2.patch
>
>
> Found by [~schu] during testing. We were creating a bunch of directories in a 
> single directory to blow up the fsimage size, and it ends up we hit this 
> error when trying to load a very large fsimage:
> {noformat}
> 2014-03-13 13:57:03,901 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 24523605 
> INodes.
> 2014-03-13 13:57:59,038 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Failed to load image from 
> FSImageFile(file=/dfs/nn/current/fsimage_00024532742, 
> cpktTxId=00024532742)
> com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
> large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase 
> the size limit.
> at 
> com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
> at 
> com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
> at 
> com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
> at 
> com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462)
> at 
> com.google.protobuf.CodedInputStream.readUInt64(CodedInputStream.java:188)
> at 
> org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.(FsImageProto.java:9839)
> at 
> org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.(FsImageProto.java:9770)
> at 
> org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9901)
> at 
> org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9896)
> at 52)
> ...
> {noformat}
> Some further research reveals there's a 64MB max size per PB message, which 
> seems to be what we're hitting here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-7169) Fix a findbugs warning in ReplaceDatanodeOnFailure

2014-10-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160784#comment-14160784
 ] 

Tsz Wo Nicholas Sze edited comment on HDFS-7169 at 10/6/14 7:28 PM:


> -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) 
> warnings.

The new findbugs warning was introduced by HDFS-7055.


was (Author: szetszwo):
> -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) 
> warnings.

The new findbugs warning were introduced by HDFS-7055.

> Fix a findbugs warning in ReplaceDatanodeOnFailure
> --
>
> Key: HDFS-7169
> URL: https://issues.apache.org/jira/browse/HDFS-7169
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: h7169_20140930.patch
>
>
> The following findbugs warning came up recently although there was no recent 
> change of the code.
> - ReplaceDatanodeOnFailure$Policy defines non-transient non-serializable 
> instance field condition
> Bug type SE_BAD_FIELD (click for details)
> In class 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy
> Field 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy.condition
> In ReplaceDatanodeOnFailure.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds

2014-10-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160786#comment-14160786
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7194:
---

We already have HDFS-7169 for ReplaceDatanodeOnFailure.

The other findbugs warning was introduced by HDFS-7055.  It also introduced 
some javac warnings.

> Two findbugs issues in recent PreCommit-HDFS-Build builds
> -
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Attachments: HDFS-7194.001.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs 
> warnings introduced by earlier fixes.
> E.g.
> https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
> {quote}
> Bad practice Warnings
> Code  Warning
> Se  Class 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy 
> defines non-transient non-serializable instance field condition
> Performance Warnings
> Code  Warning
> Dm  org.apache.hadoop.hdfs.DFSClient.() 
> invokes inefficient new String(String) constructor
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7169) Fix a findbugs warning in ReplaceDatanodeOnFailure

2014-10-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160784#comment-14160784
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7169:
---

> -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) 
> warnings.

The new findbugs warning were introduced by HDFS-7055.

> Fix a findbugs warning in ReplaceDatanodeOnFailure
> --
>
> Key: HDFS-7169
> URL: https://issues.apache.org/jira/browse/HDFS-7169
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: h7169_20140930.patch
>
>
> The following findbugs warning came up recently although there was no recent 
> change of the code.
> - ReplaceDatanodeOnFailure$Policy defines non-transient non-serializable 
> instance field condition
> Bug type SE_BAD_FIELD (click for details)
> In class 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy
> Field 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy.condition
> In ReplaceDatanodeOnFailure.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7055) Add tracing to DFSInputStream

2014-10-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160758#comment-14160758
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7055:
---

> The findbugs warnings are not related (they're for code issues that already 
> exist).

One of the findbugs warnings was from the patch as shown in the [Jenkins 
report|https://builds.apache.org/job/PreCommit-HDFS-Build/8284/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html].
- Dm  org.apache.hadoop.hdfs.DFSClient.() 
invokes inefficient new String(String) constructor
Bug type DM_STRING_CTOR (click for details)
In class org.apache.hadoop.hdfs.DFSClient
In method org.apache.hadoop.hdfs.DFSClient.()
At DFSClient.java:\[line 3174]
{code}
private static final byte[] PATH =
  new String("path").getBytes(Charset.forName("UTF-8"));
{code}
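
A minimal sketch of the kind of change that clears DM_STRING_CTOR (not 
necessarily the fix adopted for this JIRA): drop the redundant String copy, e.g. 
using java.nio.charset.StandardCharsets on Java 7+ (or keep 
Charset.forName("UTF-8") on older JDKs).

{code}
import java.nio.charset.StandardCharsets;

class PathConstant {
  // Same bytes, no extra String allocation, so findbugs no longer
  // reports DM_STRING_CTOR.
  static final byte[] PATH = "path".getBytes(StandardCharsets.UTF_8);
}
{code}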

> Meanwhile diffJavacWarnings.txt is missing, so I can't evaluate where there 
> is an additional warning or not.

Have you checked the later builds?  The 
[diffJavacWarnings.txt|https://builds.apache.org/job/PreCommit-HDFS-Build/8284/artifact/patchprocess/diffJavacWarnings.txt]
 file was available.  The javac warnings were indeed from the patch.

> Add tracing to DFSInputStream
> -
>
> Key: HDFS-7055
> URL: https://issues.apache.org/jira/browse/HDFS-7055
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.7.0
>
> Attachments: HDFS-7055.002.patch, HDFS-7055.003.patch, 
> HDFS-7055.004.patch, HDFS-7055.005.patch, screenshot-get-1mb.005.png, 
> screenshot-get-1mb.png
>
>
> Add tracing to DFSInputStream.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7184) Allow data migration tool to run as a daemon

2014-10-06 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated HDFS-7184:
---
Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-7197

> Allow data migration tool to run as a daemon
> 
>
> Key: HDFS-7184
> URL: https://issues.apache.org/jira/browse/HDFS-7184
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover, scripts
>Reporter: Benoy Antony
>Assignee: Benoy Antony
>Priority: Minor
> Attachments: HDFS-7184.patch
>
>
> Just like the balancer, it is sometimes required to run the data migration tool 
> in daemon mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7197) Enhancements to Data Migration Tool

2014-10-06 Thread Benoy Antony (JIRA)
Benoy Antony created HDFS-7197:
--

 Summary: Enhancements to Data Migration Tool 
 Key: HDFS-7197
 URL: https://issues.apache.org/jira/browse/HDFS-7197
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Benoy Antony
Assignee: Benoy Antony


Data migration tool (mover) was added as part of HDFS-6584.

We have been using the Archival storage tier in our clusters and have implemented 
a similar data migration tool (Mover) to migrate data to and from Archival 
storage. This is an umbrella JIRA to contribute the features and improvements 
identified based on that experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6102) Lower the default maximum items per directory to fix PB fsimage loading

2014-10-06 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160748#comment-14160748
 ] 

Ravi Prakash commented on HDFS-6102:


This is preventing log aggregation for users who run very many jobs, e.g. as 
seen in the NodeManager logs:
{code}The directory item limit of 
//logs is exceeded: limit=1048576 
items=2144288{code}

> Lower the default maximum items per directory to fix PB fsimage loading
> ---
>
> Key: HDFS-6102
> URL: https://issues.apache.org/jira/browse/HDFS-6102
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Fix For: 2.4.0
>
> Attachments: hdfs-6102-1.patch, hdfs-6102-2.patch
>
>
> Found by [~schu] during testing. We were creating a bunch of directories in a 
> single directory to blow up the fsimage size, and we ended up hitting this 
> error when trying to load a very large fsimage:
> {noformat}
> 2014-03-13 13:57:03,901 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 24523605 
> INodes.
> 2014-03-13 13:57:59,038 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Failed to load image from 
> FSImageFile(file=/dfs/nn/current/fsimage_00024532742, 
> cpktTxId=00024532742)
> com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
> large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase 
> the size limit.
> at 
> com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
> at 
> com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
> at 
> com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
> at 
> com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462)
> at 
> com.google.protobuf.CodedInputStream.readUInt64(CodedInputStream.java:188)
> at 
> org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.(FsImageProto.java:9839)
> at 
> org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.(FsImageProto.java:9770)
> at 
> org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9901)
> at 
> org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9896)
> at 52)
> ...
> {noformat}
> Some further research reveals there's a 64MB max size per PB message, which 
> seems to be what we're hitting here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3342) SocketTimeoutException in BlockSender.sendChunks could have a better error message

2014-10-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160699#comment-14160699
 ] 

Yongjun Zhang commented on HDFS-3342:
-

The findbugs issues are not introduced by the change made here. I created a 
separate jira HDFS-7194 for them. 


> SocketTimeoutException in BlockSender.sendChunks could have a better error 
> message
> --
>
> Key: HDFS-3342
> URL: https://issues.apache.org/jira/browse/HDFS-3342
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Yongjun Zhang
>Priority: Minor
> Attachments: HDFS-3342.001.patch
>
>
> Currently, if a client connects to a DN and begins to read a block, but then 
> stops calling read() for a long period of time, the DN will log a 
> SocketTimeoutException "48 millis timeout while waiting for channel to be 
> ready for write." This is because there is no "keepalive" functionality of 
> any kind. At a minimum, we should improve this error message to be an INFO 
> level log which just says that the client likely stopped reading, so 
> disconnecting it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds

2014-10-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7194:

Status: Patch Available  (was: Open)

Submitted patch 001. 

Hi [~szetszwo], the change I made for the first issue reported is on top of the 
HDFS-4257 code. I see that currently we don't do serialization of 
ReplaceDatanodeOnFailure.Policy, so it should be safe to change the "condition" 
field to transient to remove the findbugs warning. Would you please comment in 
case I missed anything? Thanks a lot.
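
For illustration, here is a minimal sketch of the kind of change being discussed, 
with hypothetical names rather than the actual ReplaceDatanodeOnFailure code: 
marking the non-serializable field transient removes the SE_BAD_FIELD report 
because the field is then excluded from Java serialization.

{code}
import java.io.Serializable;

// Illustrative only -- not the HDFS classes.
class Policy implements Serializable {
  private static final long serialVersionUID = 1L;

  interface Condition {          // not Serializable
    boolean satisfied();
  }

  // Without 'transient', findbugs reports SE_BAD_FIELD because a
  // non-serializable field sits inside a Serializable class.
  private transient Condition condition;
}
{code}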



> Two findbugs issues in recent PreCommit-HDFS-Build builds
> -
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Attachments: HDFS-7194.001.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs 
> warnings introduced by earlier fixes.
> E.g.
> https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
> {quote}
> Bad practice Warnings
> Code  Warning
> Se  Class 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy 
> defines non-transient non-serializable instance field condition
> Performance Warnings
> Code  Warning
> Dm  org.apache.hadoop.hdfs.DFSClient.() 
> invokes inefficient new String(String) constructor
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds

2014-10-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7194:

Attachment: HDFS-7194.001.patch

> Two findbugs issues in recent PreCommit-HDFS-Build builds
> -
>
> Key: HDFS-7194
> URL: https://issues.apache.org/jira/browse/HDFS-7194
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
> Attachments: HDFS-7194.001.patch
>
>
> In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs 
> warnings introduced by earlier fixes.
> E.g.
> https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
> {quote}
> Bad practice Warnings
> Code  Warning
> Se  Class 
> org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy 
> defines non-transient non-serializable instance field condition
> Performance Warnings
> Code  Warning
> Dm  org.apache.hadoop.hdfs.DFSClient.() 
> invokes inefficient new String(String) constructor
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server

2014-10-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160678#comment-14160678
 ] 

Yongjun Zhang commented on HDFS-7146:
-

Hi [~aw],

Thanks for the info you provided. Here is what the man page (man getent) says:

{code}
group      When no key is provided, use setgrent(3), getgrent(3), and endgrent(3)
           to enumerate the group database.  When one or more key arguments are
           provided, pass each numeric key to getgrgid(3) and each nonnumeric
           key to getgrnam(3) and display the result.

passwd     When no key is provided, use setpwent(3), getpwent(3), and endpwent(3)
           to enumerate the passwd database.  When one or more key arguments are
           provided, pass each numeric key to getpwuid(3) and each nonnumeric
           key to getpwnam(3) and display the result.
{code}

If user name "123" has uid 456 and we run "getent passwd 123", getent treats 123 
as a uid and searches for the user with uid 123, which may not exist; that is 
when we get back nothing.

About the "id" command, I tested it on CentOS and Mac (thanks to 
[~j...@cloudera.com] for the help). Would you please comment on whether it's good 
enough and what could be missed? The NFS code is said to support Linux and Mac only.

Thanks.




> NFS ID/Group lookup requires SSSD enumeration on the server
> ---
>
> Key: HDFS-7146
> URL: https://issues.apache.org/jira/browse/HDFS-7146
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, 
> HDFS-7146.003.patch
>
>
> The current implementation of the NFS UID and GID lookup works by running 
> 'getent passwd' with an assumption that it will return the entire list of 
> users available on the OS, local and remote (AD/etc.).
> This behaviour of the command is advised against and prevented by 
> administrators in most secure setups to avoid excessive load on the ADs 
> involved, as the # of users to be listed may be too large, and the repeated 
> requests for ALL users not present in the cache would be too much for the AD 
> infrastructure to bear.
> The NFS server should likely do lookups based on a specific UID request, via 
> 'getent passwd ', if the UID does not match a cached value. This reduces 
> load on the LDAP backed infrastructure.
> Thanks [~qwertymaniac] for reporting the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server

2014-10-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160659#comment-14160659
 ] 

Yongjun Zhang commented on HDFS-7146:
-

Hi [~brandonli], thanks for your comments. I just uploaded rev 03. It works 
slightly differently from what you described.

1. At initialization, the map is empty.
2. Both users/groups/ids are added to the map on demand (i.e. when requested).
3. When a groupId is requested for a given groupName, if the groupName is 
numerical, the full group map is loaded (this is the lazy full-list load I 
referred to earlier).
4. Periodically update the cached maps for both user and group. What I do here 
is actually to clear the map. I imagined that some users and groups might be 
removed (for example, a user changed jobs), so instead of loading anything, I 
clear the map during this update, essentially reinitializing it. Steps 2 and 3 
are then repeated.

I did not change the logic of when to update the map.

Would you please take a look again to see if the change makes sense to you? 
thanks a lot.
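
A rough sketch of the on-demand caching described above, with hypothetical names 
(not the actual patch):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only. Entries are resolved lazily, one key at a time, and the
// periodic update simply clears the map so stale users/groups fall out of it.
class IdNameCache {
  private final Map<String, Integer> nameToId =
      new ConcurrentHashMap<String, Integer>();

  Integer getId(String name) {
    Integer id = nameToId.get(name);
    if (id == null) {
      id = lookup(name);        // on demand: only the requested key is resolved
      nameToId.put(name, id);
    }
    return id;
  }

  // Step 4 above: the periodic update just reinitializes the map.
  void periodicUpdate() {
    nameToId.clear();
  }

  private static Integer lookup(String name) {
    // Placeholder for running 'getent passwd <name>' (or 'getent group <name>')
    // and parsing the numeric id from the output.
    return -1;
  }
}
{code}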
 
 

> NFS ID/Group lookup requires SSSD enumeration on the server
> ---
>
> Key: HDFS-7146
> URL: https://issues.apache.org/jira/browse/HDFS-7146
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, 
> HDFS-7146.003.patch
>
>
> The current implementation of the NFS UID and GID lookup works by running 
> 'getent passwd' with an assumption that it will return the entire list of 
> users available on the OS, local and remote (AD/etc.).
> This behaviour of the command is advised against and prevented by 
> administrators in most secure setups to avoid excessive load on the ADs 
> involved, as the # of users to be listed may be too large, and the repeated 
> requests for ALL users not present in the cache would be too much for the AD 
> infrastructure to bear.
> The NFS server should likely do lookups based on a specific UID request, via 
> 'getent passwd ', if the UID does not match a cached value. This reduces 
> load on the LDAP backed infrastructure.
> Thanks [~qwertymaniac] for reporting the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server

2014-10-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7146:

Attachment: HDFS-7146.003.patch

> NFS ID/Group lookup requires SSSD enumeration on the server
> ---
>
> Key: HDFS-7146
> URL: https://issues.apache.org/jira/browse/HDFS-7146
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, 
> HDFS-7146.003.patch
>
>
> The current implementation of the NFS UID and GID lookup works by running 
> 'getent passwd' with an assumption that it will return the entire list of 
> users available on the OS, local and remote (AD/etc.).
> This behaviour of the command is advised against and prevented by 
> administrators in most secure setups to avoid excessive load on the ADs 
> involved, as the # of users to be listed may be too large, and the repeated 
> requests for ALL users not present in the cache would be too much for the AD 
> infrastructure to bear.
> The NFS server should likely do lookups based on a specific UID request, via 
> 'getent passwd ', if the UID does not match a cached value. This reduces 
> load on the LDAP backed infrastructure.
> Thanks [~qwertymaniac] for reporting the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7128) Decommission slows way down when it gets towards the end

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160587#comment-14160587
 ] 

Hadoop QA commented on HDFS-7128:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673098/HDFS-7128-2.patch
  against trunk revision ed841dd.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  
org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8330//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8330//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8330//console

This message is automatically generated.

> Decommission slows way down when it gets towards the end
> 
>
> Key: HDFS-7128
> URL: https://issues.apache.org/jira/browse/HDFS-7128
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7128-2.patch, HDFS-7128.patch
>
>
> When we decommission nodes across different racks, the decommission process 
> becomes really slow at the end, hardly making any progress. The problem is 
> some blocks are on 3 decomm-in-progress DNs and the way how replications are 
> scheduled caused unnecessary delay. Here is the analysis.
> When BlockManager schedules the replication work from neededReplication, it 
> first needs to pick the source node for replication via chooseSourceDatanode. 
> The core policies to pick the source node are:
> 1. Prefer decomm-in-progress node.
> 2. Only pick the nodes whose outstanding replication counts are below 
> thresholds dfs.namenode.replication.max-streams or 
> dfs.namenode.replication.max-streams-hard-limit, based on the replication 
> priority.
> When we decommission nodes,
> 1. All the decommission nodes' blocks will be added to neededReplication.
> 2. BM will pick X number of blocks from neededReplication in each iteration. 
> X is based on cluster size and some configurable multiplier. So if the 
> cluster has 2000 nodes, X will be around 4000.
> 3. Given these 4000 blocks are on the same decomm-in-progress node A, A ends up 
> being chosen as the source node for all these 4000 blocks. The reason the 
> outstanding replication thresholds don't kick in is due to the implementation of 
> BlockManager.computeReplicationWorkForBlocks; 
> node.getNumberOfBlocksToBeReplicated() remains zero given that 
> node.addBlockToBeReplicated is called after the source node iteration.
> {noformat}
> ...
>   synchronized (neededReplications) {
> for (int priority = 0; priority < blocksToReplicate.size(); 
> priority++) {
> ...
> chooseSourceDatanode
> ...
> }
>   for(ReplicationWork rw : work){
> ...
>   rw.srcNode.addBlockToBeReplicated(block, targets);
> ...
>   }
> {noformat}
>  
> 4. So several decomm-in-progress nodes A, B, C end up with 4000 
> node.getNumberOfBlocksToBeReplicated().
> 5. If we assume each node can replicate 5 blocks per minute, it is going to 
> take 800 minutes to finish replication of these blocks.
> 6. Pending replication timeout kicks in after 5 minutes. The items will be 
> removed from the pending replication queue and added back to 
> neededReplication. The replications will then be handled by other source 
> nodes of these blocks. But the blocks still remain in nodes A, B, C's pending 
> replication queue, DatanodeDescriptor.replicateBlocks, so A, B, C continue 
> the replications of these blocks, although these blocks might have been 
> replicated by other DNs after replication timeout.
> 7. Some block's replicas exist on A, B, C and it is at the end of A's pending 
> replication queue. Even though the block's replication times out, no source 
> node can be chosen given A, B, C all have high pending replication counts. So 
> we have to wait until A drains its pending replication queue. Meanwhile, the 
> items in A's pending replication queue have been taken care of by other nodes 
> and are no longer under replicated.

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-06 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160511#comment-14160511
 ] 

Milind Bhandarkar commented on HDFS-3107:
-

Dhruba, indeed. The lack of concurrent writes to a single HDFS file means that 
there will be only a single outstanding transaction against a file (unless the 
concurrency is implemented at a higher level). A database can consist of 
multiple files, though, and one can have multiple outstanding transactions 
against the database (one per file). In either case, rollback is achieved by 
truncating the file to the position prior to the beginning of the transaction.
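
As a concrete illustration of that rollback pattern, here is a hedged sketch 
assuming a FileSystem#truncate(Path, long) call of the kind this JIRA proposes 
(the surrounding class and method names are invented, and the API shape is an 
assumption, not the committed interface):

{code}
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical rollback-by-truncate, assuming truncate(Path, long) exists.
class TxnLog {
  private final FileSystem fs;
  private final Path file;

  TxnLog(FileSystem fs, Path file) {
    this.fs = fs;
    this.file = file;
  }

  void runTransaction(byte[] payload) throws Exception {
    long checkpoint = fs.getFileStatus(file).getLen();  // position before the txn
    try (FSDataOutputStream out = fs.append(file)) {
      out.write(payload);                               // the transaction's writes
      // commit logic would go here
    } catch (Exception e) {
      fs.truncate(file, checkpoint);   // rollback: discard the txn's bytes
      throw e;
    }
  }
}
{code}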

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7182) JMX metrics aren't accessible when NN is busy

2014-10-06 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160479#comment-14160479
 ] 

Ming Ma commented on HDFS-7182:
---

Thanks, Akira.

Findbugs and failed unit tests aren't related.

> JMX metrics aren't accessible when NN is busy
> -
>
> Key: HDFS-7182
> URL: https://issues.apache.org/jira/browse/HDFS-7182
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7182.patch
>
>
> HDFS-5693 has addressed all NN JMX metrics in hadoop 2.0.5. Since then a couple 
> of new metrics have been added. It turns out "RollingUpgradeStatus" requires the 
> FSNamesystem read lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server

2014-10-06 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160410#comment-14160410
 ] 

Allen Wittenauer commented on HDFS-7146:


bq. For getent command, when it sees a numerical key, it thinks you are doing 
reverse lookup (from id to name). That's why it returns nothing.

Something sounds broken if it is returning nothing.  It should be able to map 
forward and reverse if things are working correctly.

The problem with using id is that its output isn't exactly portable.

bq. Unfortunately there is no corresponding one for group. 

getent most definitely handles gid<->group mappings as well:

Linux:
{code}
$ getent group 1
bin:x:1:bin,daemon
$ getent group bin
bin:x:1:bin,daemon
{code}

Solaris:
{code}
$ getent group 1
other::1:root
$ getent group other
other::1:root
{code}



> NFS ID/Group lookup requires SSSD enumeration on the server
> ---
>
> Key: HDFS-7146
> URL: https://issues.apache.org/jira/browse/HDFS-7146
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch
>
>
> The current implementation of the NFS UID and GID lookup works by running 
> 'getent passwd' with an assumption that it will return the entire list of 
> users available on the OS, local and remote (AD/etc.).
> This behaviour of the command is advised against and prevented by 
> administrators in most secure setups to avoid excessive load on the ADs 
> involved, as the # of users to be listed may be too large, and the repeated 
> requests for ALL users not present in the cache would be too much for the AD 
> infrastructure to bear.
> The NFS server should likely do lookups based on a specific UID request, via 
> 'getent passwd ', if the UID does not match a cached value. This reduces 
> load on the LDAP backed infrastructure.
> Thanks [~qwertymaniac] for reporting the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available

2014-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160381#comment-14160381
 ] 

Hudson commented on HDFS-6995:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1918 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1918/])
HDFS-6995. Block should be placed in the client's 'rack-local' node if 
'client-local' node is not available (vinayakumarb) (vinayakumarb: rev 
ed841dd9a96e54cb84d9cae5507e47ff1c8cdf6e)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


> Block should be placed in the client's 'rack-local' node if 'client-local' 
> node is not available
> 
>
> Key: HDFS-6995
> URL: https://issues.apache.org/jira/browse/HDFS-6995
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.5.0
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Fix For: 2.6.0
>
> Attachments: HDFS-6995-001.patch, HDFS-6995-002.patch, 
> HDFS-6995-003.patch, HDFS-6995-004.patch, HDFS-6995-005.patch, 
> HDFS-6995-006.patch, HDFS-6995-007.patch
>
>
> HDFS cluster is rack aware.
> The client is on a different node than any datanode,
> but the same rack contains one or more datanodes.
> In this case, first preference should be given to selecting a 'rack-local' node.
> Currently, since no Node in clusterMap corresponds to the client's location, the 
> block placement policy chooses a *random* node as the local node and proceeds 
> with further placements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7193) value of "dfs.webhdfs.enabled" in user doc is incorrect.

2014-10-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160363#comment-14160363
 ] 

Haohui Mai commented on HDFS-7193:
--

[~hitliuyi], actually, since it is already available at 
hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm, can you please 
update the patch to delete the entry in 
hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm?

> value of "dfs.webhdfs.enabled" in user doc is incorrect.
> 
>
> Key: HDFS-7193
> URL: https://issues.apache.org/jira/browse/HDFS-7193
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation, webhdfs
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Trivial
> Attachments: HDFS-7193.001.patch, HDFS-7193.002.patch
>
>
> The default value for {{dfs.webhdfs.enabled}} should be {{true}}, not 
> _http/_HOST@REALM.TLD_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7128) Decommission slows way down when it gets towards the end

2014-10-06 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-7128:
--
Attachment: HDFS-7128-2.patch

Here is the patch with a unit test. We tested it on a large cluster, 
decommissioning 10 nodes per rack from two racks.

Without the patch, it takes 174 minutes to finish block replication.
With the patch, it takes 82 minutes to finish block replication.

> Decommission slows way down when it gets towards the end
> 
>
> Key: HDFS-7128
> URL: https://issues.apache.org/jira/browse/HDFS-7128
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7128-2.patch, HDFS-7128.patch
>
>
> When we decommission nodes across different racks, the decommission process 
> becomes really slow at the end, hardly making any progress. The problem is 
> some blocks are on 3 decomm-in-progress DNs and the way how replications are 
> scheduled caused unnecessary delay. Here is the analysis.
> When BlockManager schedules the replication work from neededReplication, it 
> first needs to pick the source node for replication via chooseSourceDatanode. 
> The core policies to pick the source node are:
> 1. Prefer decomm-in-progress node.
> 2. Only pick the nodes whose outstanding replication counts are below 
> thresholds dfs.namenode.replication.max-streams or 
> dfs.namenode.replication.max-streams-hard-limit, based on the replication 
> priority.
> When we decommission nodes,
> 1. All the decommission nodes' blocks will be added to neededReplication.
> 2. BM will pick X number of blocks from neededReplication in each iteration. 
> X is based on cluster size and some configurable multiplier. So if the 
> cluster has 2000 nodes, X will be around 4000.
> 3. Given these 4000 blocks are on the same decomm-in-progress node A, A ends up 
> being chosen as the source node for all these 4000 blocks. The reason the 
> outstanding replication thresholds don't kick in is due to the implementation of 
> BlockManager.computeReplicationWorkForBlocks; 
> node.getNumberOfBlocksToBeReplicated() remains zero given that 
> node.addBlockToBeReplicated is called after the source node iteration.
> {noformat}
> ...
>   synchronized (neededReplications) {
> for (int priority = 0; priority < blocksToReplicate.size(); 
> priority++) {
> ...
> chooseSourceDatanode
> ...
> }
>   for(ReplicationWork rw : work){
> ...
>   rw.srcNode.addBlockToBeReplicated(block, targets);
> ...
>   }
> {noformat}
>  
> 4. So several decomm-in-progress nodes A, B, C end up with 4000 
> node.getNumberOfBlocksToBeReplicated().
> 5. If we assume each node can replicate 5 blocks per minute, it is going to 
> take 800 minutes to finish replication of these blocks.
> 6. Pending replication timeout kicks in after 5 minutes. The items will be 
> removed from the pending replication queue and added back to 
> neededReplication. The replications will then be handled by other source 
> nodes of these blocks. But the blocks still remain in nodes A, B, C's pending 
> replication queue, DatanodeDescriptor.replicateBlocks, so A, B, C continue 
> the replications of these blocks, although these blocks might have been 
> replicated by other DNs after replication timeout.
> 7. Some block's replicas exist on A, B, C and it is at the end of A's pending 
> replication queue. Even though the block's replication times out, no source 
> node can be chosen given A, B, C all have high pending replication counts. So 
> we have to wait until A drains its pending replication queue. Meanwhile, the 
> items in A's pending replication queue have been taken care of by other nodes 
> and are no longer under replicated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes

2014-10-06 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160353#comment-14160353
 ] 

Ming Ma commented on HDFS-7009:
---

Findbugs and failed unit tests aren't related.

> Active NN and standby NN have different live nodes
> --
>
> Key: HDFS-7009
> URL: https://issues.apache.org/jira/browse/HDFS-7009
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7009-2.patch, HDFS-7009.patch
>
>
> To follow up on https://issues.apache.org/jira/browse/HDFS-6478, in most 
> cases, given DN sends HB and BR to NN regularly, if a specific RPC call 
> fails, it isn't a big deal.
> However, there are cases where DN fails to register with NN during initial 
> handshake due to exceptions not covered by RPC client's connection retry. 
> When this happens, the DN won't talk to that NN until the DN restarts.
> {noformat}
> BPServiceActor
>   public void run() {
> LOG.info(this + " starting to offer service");
> try {
>   // init stuff
>   try {
> // setup storage
> connectToNNAndHandshake();
>   } catch (IOException ioe) {
> // Initial handshake, storage recovery or registration failed
> // End BPOfferService thread
> LOG.fatal("Initialization failed for block pool " + this, ioe);
> return;
>   }
>   initialized = true; // bp is initialized;
>   
>   while (shouldRun()) {
> try {
>   offerService();
> } catch (Exception ex) {
>   LOG.error("Exception in BPOfferService for " + this, ex);
>   sleepAndLogInterrupts(5000, "offering service");
> }
>   }
> ...
> {noformat}
> Here is an example of the call stack.
> {noformat}
> java.io.IOException: Failed on local exception: java.io.IOException: Response 
> is null.; Host Details : local host is: "xxx"; destination host is: 
> "yyy":8030;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
> at org.apache.hadoop.ipc.Client.call(Client.java:1239)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Response is null.
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
> {noformat}
> This will create discrepancy between active NN and standby NN in terms of 
> live nodes.
>  
> Here is a possible scenario of missing blocks after failover.
> 1. DN A, B set up handshakes with active NN, but not with standby NN.
> 2. A block is replicated to DN A, B and C.
> 3. From standby NN's point of view, given A and B are dead nodes, the block 
> is under replicated.
> 4. DN C is down.
> 5. Before active NN detects DN C is down, it fails over.
> 6. The new active NN considers the block is missing. Even though there are 
> two replicas on DN A and B.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available

2014-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160327#comment-14160327
 ] 

Hudson commented on HDFS-6995:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1893 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1893/])
HDFS-6995. Block should be placed in the client's 'rack-local' node if 
'client-local' node is not available (vinayakumarb) (vinayakumarb: rev 
ed841dd9a96e54cb84d9cae5507e47ff1c8cdf6e)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java


> Block should be placed in the client's 'rack-local' node if 'client-local' 
> node is not available
> 
>
> Key: HDFS-6995
> URL: https://issues.apache.org/jira/browse/HDFS-6995
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.5.0
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Fix For: 2.6.0
>
> Attachments: HDFS-6995-001.patch, HDFS-6995-002.patch, 
> HDFS-6995-003.patch, HDFS-6995-004.patch, HDFS-6995-005.patch, 
> HDFS-6995-006.patch, HDFS-6995-007.patch
>
>
> HDFS cluster is rack aware.
> The client is on a different node than any datanode,
> but the same rack contains one or more datanodes.
> In this case, first preference should be given to selecting a 'rack-local' node.
> Currently, since no Node in clusterMap corresponds to the client's location, the 
> block placement policy chooses a *random* node as the local node and proceeds 
> with further placements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available

2014-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160214#comment-14160214
 ] 

Hudson commented on HDFS-6995:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #703 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/703/])
HDFS-6995. Block should be placed in the client's 'rack-local' node if 
'client-local' node is not available (vinayakumarb) (vinayakumarb: rev 
ed841dd9a96e54cb84d9cae5507e47ff1c8cdf6e)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java


> Block should be placed in the client's 'rack-local' node if 'client-local' 
> node is not available
> 
>
> Key: HDFS-6995
> URL: https://issues.apache.org/jira/browse/HDFS-6995
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.5.0
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Fix For: 2.6.0
>
> Attachments: HDFS-6995-001.patch, HDFS-6995-002.patch, 
> HDFS-6995-003.patch, HDFS-6995-004.patch, HDFS-6995-005.patch, 
> HDFS-6995-006.patch, HDFS-6995-007.patch
>
>
> HDFS cluster is rack aware.
> The client is on a different node than any datanode,
> but the same rack contains one or more datanodes.
> In this case, first preference should be given to selecting a 'rack-local' node.
> Currently, since no Node in clusterMap corresponds to the client's location, the 
> block placement policy chooses a *random* node as the local node and proceeds 
> with further placements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (HDFS-7196) Fix several issues of hadoop security configuration in user doc.

2014-10-06 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu moved HADOOP-11164 to HDFS-7196:
---

 Component/s: (was: security)
  (was: documentation)
  security
  documentation
Target Version/s: 2.7.0  (was: 2.7.0)
 Key: HDFS-7196  (was: HADOOP-11164)
 Project: Hadoop HDFS  (was: Hadoop Common)

> Fix several issues of hadoop security configuration in user doc.
> 
>
> Key: HDFS-7196
> URL: https://issues.apache.org/jira/browse/HDFS-7196
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation, security
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Trivial
>
> There are several issues of secure mode in user doc:
> {{dfs.namenode.secondary.keytab.file}} should be 
> {{dfs.secondary.namenode.keytab.file}}, 
> {{dfs.namenode.secondary.kerberos.principal}} should be 
> {{dfs.secondary.namenode.kerberos.principal}}.
> {{dfs.namenode.kerberos.https.principal}} doesn't exist, it should be 
> {{dfs.namenode.kerberos.internal.spnego.principal}}.
> {{dfs.namenode.secondary.kerberos.https.principal}} doesn't exist, it should 
> be {{dfs.secondary.namenode.kerberos.internal.spnego.principal}}.
> {{dfs.datanode.kerberos.https.principal}} doesn't exist, we can remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7195) Update user doc of secure mode about Datanodes don't require root or jsvc

2014-10-06 Thread Yi Liu (JIRA)
Yi Liu created HDFS-7195:


 Summary: Update user doc of secure mode about Datanodes don't 
require root or jsvc
 Key: HDFS-7195
 URL: https://issues.apache.org/jira/browse/HDFS-7195
 Project: Hadoop HDFS
  Issue Type: Task
  Components: security
Reporter: Yi Liu
Assignee: Yi Liu


HDFS-2856 adds support so that Datanodes don't require root or jsvc. If 
{{dfs.data.transfer.protection}} is configured and {{dfs.http.policy}} is 
_HTTPS_ONLY_, then a secure DataNode doesn't need to use a privileged port.

This has not been updated in the latest user doc of secure mode. This JIRA is 
to fix that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7186) Add usage of "hadoop trace" command to doc

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160153#comment-14160153
 ] 

Hadoop QA commented on HDFS-7186:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673070/HDFS-7186-0.patch
  against trunk revision ed841dd.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common:

  org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8329//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8329//console

This message is automatically generated.

> Add usage of "hadoop trace" command to doc
> --
>
> Key: HDFS-7186
> URL: https://issues.apache.org/jira/browse/HDFS-7186
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-7186-0.patch
>
>
> The command for tracing management was added in HDFS-6956.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available

2014-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160126#comment-14160126
 ] 

Hudson commented on HDFS-6995:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6196 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6196/])
HDFS-6995. Block should be placed in the client's 'rack-local' node if 
'client-local' node is not available (vinayakumarb) (vinayakumarb: rev 
ed841dd9a96e54cb84d9cae5507e47ff1c8cdf6e)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Block should be placed in the client's 'rack-local' node if 'client-local' 
> node is not available
> 
>
> Key: HDFS-6995
> URL: https://issues.apache.org/jira/browse/HDFS-6995
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.5.0
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Fix For: 2.6.0
>
> Attachments: HDFS-6995-001.patch, HDFS-6995-002.patch, 
> HDFS-6995-003.patch, HDFS-6995-004.patch, HDFS-6995-005.patch, 
> HDFS-6995-006.patch, HDFS-6995-007.patch
>
>
> HDFS cluster is rack aware.
> The client is on a different node than any datanode,
> but the same rack contains one or more datanodes.
> In this case, first preference should be given to selecting a 'rack-local' node.
> Currently, since no Node in clusterMap corresponds to the client's location, the 
> block placement policy chooses a *random* node as the local node and proceeds 
> with further placements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7186) Add usage of "hadoop trace" command to doc

2014-10-06 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-7186:
---
Attachment: HDFS-7186-0.patch

This patch includes the trivial fixes to follow the change in HDFS-7055.

> Add usage of "hadoop trace" command to doc
> --
>
> Key: HDFS-7186
> URL: https://issues.apache.org/jira/browse/HDFS-7186
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-7186-0.patch
>
>
> The command for tracing management was added in HDFS-6956.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7186) Add usage of "hadoop trace" command to doc

2014-10-06 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-7186:
---
Status: Patch Available  (was: Open)

> Add usage of "hadoop trace" command to doc
> --
>
> Key: HDFS-7186
> URL: https://issues.apache.org/jira/browse/HDFS-7186
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-7186-0.patch
>
>
> The command for tracing management was added in HDFS-6956.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available

2014-10-06 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-6995:

   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

committed to trunk and branch-2.

Thanks [~umamaheswararao] and [~hitliuyi] for the reviews

> Block should be placed in the client's 'rack-local' node if 'client-local' 
> node is not available
> 
>
> Key: HDFS-6995
> URL: https://issues.apache.org/jira/browse/HDFS-6995
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.5.0
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Fix For: 2.6.0
>
> Attachments: HDFS-6995-001.patch, HDFS-6995-002.patch, 
> HDFS-6995-003.patch, HDFS-6995-004.patch, HDFS-6995-005.patch, 
> HDFS-6995-006.patch, HDFS-6995-007.patch
>
>
> HDFS cluster is rack aware.
> The client is on a different node than any datanode,
> but the same rack contains one or more datanodes.
> In this case, first preference should be given to selecting a 'rack-local' node.
> Currently, since no Node in clusterMap corresponds to the client's location, the 
> block placement policy chooses a *random* node as the local node and proceeds 
> with further placements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

