[jira] [Updated] (HDFS-7112) LazyWriter should use either async IO or one thread per physical disk
[ https://issues.apache.org/jira/browse/HDFS-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7112: - Attachment: HDFS-7112.2.patch I can't reproduce the build failure from Jenkins. Updated the patch. > LazyWriter should use either async IO or one thread per physical disk > - > > Key: HDFS-7112 > URL: https://issues.apache.org/jira/browse/HDFS-7112 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: HDFS-6581 >Reporter: Arpit Agarwal >Assignee: Xiaoyu Yao > Fix For: 2.6.0 > > Attachments: HDFS-7112.0.patch, HDFS-7112.1.patch, HDFS-7112.2.patch > > > The LazyWriter currently uses synchronous IO and a single thread. This limits > the throughput to that of a single disk. Using either async overlapped IO or > one thread per physical disk will improve the write throughput. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161454#comment-14161454 ] Haohui Mai commented on HDFS-6994: -- bq. I'm going to get rid of the code in hadoop-native-core (I guess I should put up a change for that) since this directory isn't needed any longer. I should have clarified this earlier: by "merge" I meant "merging to trunk" :-) Are you suggesting deprecating hadoop-native-core before it gets merged? If so, given that far fewer tasks have been resolved in HDFS-6994 than in HADOOP-10388 (5 vs 22), maybe it is more appropriate to start a new branch with the five resolved tasks we have so far? Having a clean branch allows us to preserve the development history more easily when the branch is merged into trunk, thanks to the recent switch to git. > libhdfs3 - A native C/C++ HDFS client > - > > Key: HDFS-6994 > URL: https://issues.apache.org/jira/browse/HDFS-6994 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch > > > Hi All > I just got the permission to open source libhdfs3, which is a native C/C++ > HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol. > libhdfs3 provide the libhdfs style C interface and a C++ interface. Support > both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos > authentication. > libhdfs3 is currently used by HAWQ of Pivotal > I'd like to integrate libhdfs3 into HDFS source code to benefit others. > You can find libhdfs3 code from github > https://github.com/PivotalRD/libhdfs3 > http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7014) Implement input and output streams to DataNode for native client
[ https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161434#comment-14161434 ] Haohui Mai commented on HDFS-7014: -- I would appreciate it if the patch could be split further into smaller pieces. Is it possible to move FileSystem / BlockReader / LeaseRenewer into separate patches? > Implement input and output streams to DataNode for native client > > > Key: HDFS-7014 > URL: https://issues.apache.org/jira/browse/HDFS-7014 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Colin Patrick McCabe > Attachments: 0001-HDFS-7014-001.patch, HDFS-7014.patch > > > Implement Client - Namenode RPC protocol and support Namenode HA. > Implement Client - Datanode RPC protocol. > Implement some basic server side class such as ExtendedBlock and LocatedBlock -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7193) value of "dfs.webhdfs.enabled" in user doc is incorrect.
[ https://issues.apache.org/jira/browse/HDFS-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161431#comment-14161431 ] Haohui Mai commented on HDFS-7193: -- +1 > value of "dfs.webhdfs.enabled" in user doc is incorrect. > > > Key: HDFS-7193 > URL: https://issues.apache.org/jira/browse/HDFS-7193 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation, webhdfs >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Trivial > Attachments: HDFS-7193.001.patch, HDFS-7193.002.patch, > HDFS-7193.003.patch > > > The default value for {{dfs.webhdfs.enabled}} should be {{true}}, not > _http/_HOST@REALM.TLD_. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7198) Fix or suppress findbugs "unchecked conversion" warning in DFSClient#getPathTraceScope
[ https://issues.apache.org/jira/browse/HDFS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161343#comment-14161343 ] Hadoop QA commented on HDFS-7198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673194/HDFS-7198.001.patch against trunk revision 8dc6abf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8336//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8336//console This message is automatically generated. 
> Fix or suppress findbugs "unchecked conversion" warning in > DFSClient#getPathTraceScope > -- > > Key: HDFS-7198 > URL: https://issues.apache.org/jira/browse/HDFS-7198 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Trivial > Attachments: HDFS-7198.001.patch > > > Fix or suppress the findbugs "unchecked conversion" warning in > {{DFSClient#getPathTraceScope}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161322#comment-14161322 ] Colin Patrick McCabe commented on HDFS-6994: bq. Agree. However, I think when the time of call for merge comes, it requires the reviewers to look at both sides of the code. Separating it into another branch would make things much easier and allow ensuring better code quality. I think there is some misunderstanding here. There isn't going to be a single "time of merge." Zhanwei, Abe and I are fixing up libhdfs3 as we go to have the functionality it needs. In other words merging the functionality now, not later. I'm going to get rid of the code in hadoop-native-core (I guess I should put up a change for that) since this directory isn't needed any longer. > libhdfs3 - A native C/C++ HDFS client > - > > Key: HDFS-6994 > URL: https://issues.apache.org/jira/browse/HDFS-6994 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch > > > Hi All > I just got the permission to open source libhdfs3, which is a native C/C++ > HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol. > libhdfs3 provide the libhdfs style C interface and a C++ interface. Support > both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos > authentication. > libhdfs3 is currently used by HAWQ of Pivotal > I'd like to integrate libhdfs3 into HDFS source code to benefit others. > You can find libhdfs3 code from github > https://github.com/PivotalRD/libhdfs3 > http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7014) Implement input and output streams to DataNode for native client
[ https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161316#comment-14161316 ] Colin Patrick McCabe commented on HDFS-7014: I kept around the base class for NamenodeProxy. I can see now why it's needed in the failover logic. I think that the InputStreamInter, etc. things are not needed, though, so I removed those. Fixed a bunch more super-long lines. Give this a review when you can! It would be nice to get all this code in so we can start tackling things like HDFS-7023. > Implement input and output streams to DataNode for native client > > > Key: HDFS-7014 > URL: https://issues.apache.org/jira/browse/HDFS-7014 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Colin Patrick McCabe > Attachments: 0001-HDFS-7014-001.patch, HDFS-7014.patch > > > Implement Client - Namenode RPC protocol and support Namenode HA. > Implement Client - Datanode RPC protocol. > Implement some basic server side class such as ExtendedBlock and LocatedBlock -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-7014) Implement input and output streams to DataNode for native client
[ https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7014 started by Colin Patrick McCabe. -- > Implement input and output streams to DataNode for native client > > > Key: HDFS-7014 > URL: https://issues.apache.org/jira/browse/HDFS-7014 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Colin Patrick McCabe > Attachments: 0001-HDFS-7014-001.patch, HDFS-7014.patch > > > Implement Client - Namenode RPC protocol and support Namenode HA. > Implement Client - Datanode RPC protocol. > Implement some basic server side class such as ExtendedBlock and LocatedBlock -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7014) Implement input and output streams to DataNode for native client
[ https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7014: --- Attachment: 0001-HDFS-7014-001.patch > Implement input and output streams to DataNode for native client > > > Key: HDFS-7014 > URL: https://issues.apache.org/jira/browse/HDFS-7014 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Colin Patrick McCabe > Attachments: 0001-HDFS-7014-001.patch, HDFS-7014.patch > > > Implement Client - Namenode RPC protocol and support Namenode HA. > Implement Client - Datanode RPC protocol. > Implement some basic server side class such as ExtendedBlock and LocatedBlock -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7174) Support for more efficient large directories
[ https://issues.apache.org/jira/browse/HDFS-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161304#comment-14161304 ] Yi Liu commented on HDFS-7174: -- {quote} Konstantin Shvachko wrote: I am probably late to the party, but for whatever it worth. Did you consider using balanced trees for inode lists, something like B-trees? B-trees would be an excellent solution here. Since they generally use arrays in the leaf nodes, this also gets you the benefits of tighter packing in memory. I guess the tricky part is writing the code. {quote} Good point, agree. We should be careful about the memory usage during implementation. > Support for more efficient large directories > > > Key: HDFS-7174 > URL: https://issues.apache.org/jira/browse/HDFS-7174 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-7174.new.patch, HDFS-7174.patch, HDFS-7174.patch > > > When the number of children under a directory grows very large, insertion > becomes very costly. E.g. creating 1M entries takes 10s of minutes. This is > because the complexity of an insertion is O\(n\). As the size of a list > grows, the overhead grows n^2. (integral of linear function). It also causes > allocations and copies of big arrays. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
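A rough illustration of the quadratic cost described in HDFS-7174, as a minimal Java sketch. It assumes the children of a directory are kept in a sorted, array-backed list and that every new entry sorts before the existing ones (the worst case). The class and method names are hypothetical and are not taken from the HDFS code base. `java.util.TreeMap` (a red-black tree) stands in here for the balanced tree suggested in the comment; it is not a B-tree.

```java
import java.util.TreeMap;

// Hypothetical illustration only; not HDFS code.
class DirectoryInsertCost {
    // Counts the element moves needed to insert n entries into a sorted
    // array-backed list when each new entry lands at index 0.
    static long shiftsForFrontInserts(int n) {
        long shifts = 0;
        for (int size = 0; size < n; size++) {
            shifts += size; // inserting at the front moves all existing entries
        }
        return shifts; // equals n * (n - 1) / 2, i.e. quadratic in n
    }

    // With a balanced tree each insert costs O(log n), so building the
    // directory is O(n log n) regardless of insertion order.
    static TreeMap<String, Long> buildTree(int n) {
        TreeMap<String, Long> children = new TreeMap<>();
        for (int i = n - 1; i >= 0; i--) {
            children.put(String.format("file-%07d", i), (long) i);
        }
        return children;
    }

    public static void main(String[] args) {
        System.out.println(shiftsForFrontInserts(1_000_000)); // 499999500000
        System.out.println(buildTree(1000).firstKey());       // file-0000000
    }
}
```

For one million entries the array-backed list performs roughly 5 * 10^11 element moves in this worst case, which is consistent with the "10s of minutes" observation in the issue description.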
[jira] [Updated] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-3107: --- Attachment: HDFS-3107.patch > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-3107: --- Attachment: editsStored.xml editsStored Refreshing patch. Made a couple changes: # TruncateOp now acts like an AddOp (it opens the file for write). # Added updateSpaceConsumed() call in FSDirectory.unprotectedTruncate(). # Replaced asserts in tests with assertThat() calls. # Also attaching new editsStored and editsStored.xml files. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7193) value of "dfs.webhdfs.enabled" in user doc is incorrect.
[ https://issues.apache.org/jira/browse/HDFS-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161298#comment-14161298 ] Hadoop QA commented on HDFS-7193: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673245/HDFS-7193.003.patch against trunk revision 519e5a7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8339//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8339//console This message is automatically generated. > value of "dfs.webhdfs.enabled" in user doc is incorrect. > > > Key: HDFS-7193 > URL: https://issues.apache.org/jira/browse/HDFS-7193 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation, webhdfs >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Trivial > Attachments: HDFS-7193.001.patch, HDFS-7193.002.patch, > HDFS-7193.003.patch > > > The default value for {{dfs.webhdfs.enabled}} should be {{true}}, not > _http/_HOST@REALM.TLD_. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7185) The active NameNode will not accept an fsimage sent from the standby during rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-7185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161286#comment-14161286 ] Jing Zhao commented on HDFS-7185: - Hi Colin, one question: do we hit this exception only when the SBN has been upgraded to the new version of the software while the ANN is still running the old bits? If so, raising this exception should be the correct behavior. If we allowed a checkpoint to happen at this point, an fsimage written by the new bits would be uploaded to the ANN, and the old software might not understand it. We then could not restart the original ANN normally until it is also upgraded to the new version. > The active NameNode will not accept an fsimage sent from the standby during > rolling upgrade > --- > > Key: HDFS-7185 > URL: https://issues.apache.org/jira/browse/HDFS-7185 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe > > The active NameNode will not accept an fsimage sent from the standby during > rolling upgrade. The active fails with the exception: > {code} > 18:25:07,620 WARN ImageServlet:198 - Received an invalid request file > transfer request from a secondary with storage info > -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 > 18:25:07,620 WARN log:76 - Committed before 410 PutImage failed. 
> java.io.IOException: This namenode has storage info > -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary > expected -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d- > 0a6e431987f6 > at > org.apache.hadoop.hdfs.server.namenode.ImageServlet.validateRequest(ImageServlet.java:200) > at > org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:443) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:730) > {code} > On the standby, the exception is: > {code} > java.io.IOException: Exception during image upload: > org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException: > This namenode has storage info > -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary > expected > -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:218) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:62) > {code} > This seems to be a consequence of the fact that the VERSION file still is at > -55 (the old version) even after the rolling upgrade has started. When the > rolling upgrade is finalized with {{hdfs dfsadmin -rollingUpgrade finalize}}, > both VERSION files get set to the new version, and the problem goes away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7112) LazyWriter should use either async IO or one thread per physical disk
[ https://issues.apache.org/jira/browse/HDFS-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161257#comment-14161257 ] Hadoop QA commented on HDFS-7112: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673243/HDFS-7112.1.patch against trunk revision 519e5a7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8338//console This message is automatically generated. > LazyWriter should use either async IO or one thread per physical disk > - > > Key: HDFS-7112 > URL: https://issues.apache.org/jira/browse/HDFS-7112 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: HDFS-6581 >Reporter: Arpit Agarwal >Assignee: Xiaoyu Yao > Fix For: 2.6.0 > > Attachments: HDFS-7112.0.patch, HDFS-7112.1.patch > > > The LazyWriter currently uses synchronous IO and a single thread. This limits > the throughput to that of a single disk. Using either async overlapped IO or > one thread per physical disk will improve the write throughput. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7193) value of "dfs.webhdfs.enabled" in user doc is incorrect.
[ https://issues.apache.org/jira/browse/HDFS-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-7193: - Attachment: HDFS-7193.003.patch Haohui, I agree that we can remove it since it already exists in the WebHDFS doc, thanks. I just updated the patch. > value of "dfs.webhdfs.enabled" in user doc is incorrect. > > > Key: HDFS-7193 > URL: https://issues.apache.org/jira/browse/HDFS-7193 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation, webhdfs >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Trivial > Attachments: HDFS-7193.001.patch, HDFS-7193.002.patch, > HDFS-7193.003.patch > > > The default value for {{dfs.webhdfs.enabled}} should be {{true}}, not > _http/_HOST@REALM.TLD_. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7112) LazyWriter should use either async IO or one thread per physical disk
[ https://issues.apache.org/jira/browse/HDFS-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7112: - Attachment: HDFS-7112.1.patch > LazyWriter should use either async IO or one thread per physical disk > - > > Key: HDFS-7112 > URL: https://issues.apache.org/jira/browse/HDFS-7112 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: HDFS-6581 >Reporter: Arpit Agarwal >Assignee: Xiaoyu Yao > Fix For: 2.6.0 > > Attachments: HDFS-7112.0.patch, HDFS-7112.1.patch > > > The LazyWriter currently uses synchronous IO and a single thread. This limits > the throughput to that of a single disk. Using either async overlapped IO or > one thread per physical disk will improve the write throughput. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
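The "one thread per physical disk" option from the HDFS-7112 description can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the code in the attached patches: the class `PerVolumeExecutors`, its methods, and the use of a plain string to identify a volume are all hypothetical. It also takes the failure callback as a parameter, in the spirit of the suggestion in the review thread that the service accept callbacks instead of depending on the dataset classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names are illustrative and not from the HDFS patch.
class PerVolumeExecutors {
    // One single-threaded executor per volume, so writes to different
    // physical disks proceed in parallel while each disk sees serial IO.
    private final Map<String, ExecutorService> executors = new ConcurrentHashMap<>();

    void addVolume(String volume) {
        // computeIfAbsent guarantees at most one executor per volume.
        executors.computeIfAbsent(volume, v -> Executors.newSingleThreadExecutor());
    }

    int volumeCount() {
        return executors.size();
    }

    // Runs the task on the volume's thread; onFailure is invoked if the task
    // cannot be submitted or if it throws while executing.
    void submit(String volume, Runnable task, Runnable onFailure) {
        ExecutorService executor = executors.get(volume);
        if (executor == null) {
            onFailure.run();
            return;
        }
        try {
            executor.execute(() -> {
                try {
                    task.run();
                } catch (RuntimeException e) {
                    onFailure.run();
                }
            });
        } catch (RejectedExecutionException e) {
            onFailure.run();
        }
    }

    // Stops accepting work and waits for queued tasks; returns true if every
    // executor terminated within the timeout.
    boolean shutdown() {
        boolean allDone = true;
        for (ExecutorService executor : executors.values()) {
            executor.shutdown();
            try {
                allDone &= executor.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return allDone;
    }

    public static void main(String[] args) {
        PerVolumeExecutors service = new PerVolumeExecutors();
        service.addVolume("/data/disk1");
        service.addVolume("/data/disk2");
        service.submit("/data/disk1", () -> System.out.println("persisted on disk1"),
                () -> System.out.println("failed on disk1"));
        service.shutdown();
    }
}
```

A single-threaded executor per volume keeps the number of threads bounded by the number of disks, which matches the review remark that there is never more than one thread per volume.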
[jira] [Commented] (HDFS-7128) Decommission slows way down when it gets towards the end
[ https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161219#comment-14161219 ] Chris Nauroth commented on HDFS-7128: - That's some great analysis, [~mingma]. +1 for the patch. The findbugs and test failures look unrelated. [~kihwal], I'll hold off committing until tomorrow in case you have further feedback. > Decommission slows way down when it gets towards the end > > > Key: HDFS-7128 > URL: https://issues.apache.org/jira/browse/HDFS-7128 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-7128-2.patch, HDFS-7128.patch > > > When we decommission nodes across different racks, the decommission process > becomes really slow at the end, hardly making any progress. The problem is > that some blocks are on 3 decomm-in-progress DNs and the way replications are > scheduled causes unnecessary delay. Here is the analysis. > When BlockManager schedules the replication work from neededReplication, it > first needs to pick the source node for replication via chooseSourceDatanode. > The core policies to pick the source node are: > 1. Prefer a decomm-in-progress node. > 2. Only pick nodes whose outstanding replication counts are below the > thresholds dfs.namenode.replication.max-streams or > dfs.namenode.replication.max-streams-hard-limit, based on the replication > priority. > When we decommission nodes, > 1. All the decommissioning nodes' blocks will be added to neededReplication. > 2. BM will pick X blocks from neededReplication in each iteration. > X is based on cluster size and a configurable multiplier. So if the > cluster has 2000 nodes, X will be around 4000. > 3. Given that these 4000 blocks are on the same decomm-in-progress node A, A ends up > being chosen as the source node for all 4000 blocks. 
The reason the > outstanding replication thresholds don't kick in is the implementation of > BlockManager.computeReplicationWorkForBlocks; > node.getNumberOfBlocksToBeReplicated() remains zero because > node.addBlockToBeReplicated is called only after the source node iteration. > {noformat} > ... > synchronized (neededReplications) { > for (int priority = 0; priority < blocksToReplicate.size(); > priority++) { > ... > chooseSourceDatanode > ... > } > for(ReplicationWork rw : work){ > ... > rw.srcNode.addBlockToBeReplicated(block, targets); > ... > } > {noformat} > > 4. So several decomm-in-progress nodes A, B, C each end up with > node.getNumberOfBlocksToBeReplicated() around 4000. > 5. If we assume each node can replicate 5 blocks per minute, it is going to > take 800 minutes to finish replicating these blocks. > 6. The pending replication timeout kicks in after 5 minutes. The items will be > removed from the pending replication queue and added back to > neededReplication. The replications will then be handled by other source > nodes of these blocks. But the blocks still remain in nodes A, B, C's pending > replication queue, DatanodeDescriptor.replicateBlocks, so A, B, C continue > replicating these blocks, although they might already have been > replicated by other DNs after the replication timeout. > 7. Suppose a block's replicas exist on A, B, C and the block is at the end of A's pending > replication queue. Even though the block's replication has timed out, no source > node can be chosen because A, B, C all have high pending replication counts. So > we have to wait until A drains its pending replication queue. Meanwhile, the > items in A's pending replication queue have been taken care of by other nodes > and are no longer under-replicated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
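The ordering problem analysed in HDFS-7128 can be reproduced in a few lines. The following is a simplified Java sketch, not the BlockManager code and not necessarily how the attached patch addresses it: `SourceSelectionSketch`, `Node`, `choose`, and the `pending` field (standing in for `getNumberOfBlocksToBeReplicated()`) are hypothetical names. It compares updating the per-node counter after all sources have been chosen, as in the issue, with updating it as each source is chosen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names are illustrative and not from BlockManager.
class SourceSelectionSketch {
    static class Node {
        final String name;
        int pending; // stands in for getNumberOfBlocksToBeReplicated()
        Node(String name) { this.name = name; }
    }

    // Returns the first candidate whose pending count is below the limit,
    // or null when every candidate is saturated.
    static Node choose(List<Node> candidates, int maxStreams) {
        for (Node n : candidates) {
            if (n.pending < maxStreams) {
                return n;
            }
        }
        return null;
    }

    // Ordering described in the issue: pick sources for all blocks first,
    // then bump the counters. The threshold check always sees zero.
    static int[] deferredUpdate(int blocks, int maxStreams) {
        List<Node> nodes = Arrays.asList(new Node("A"), new Node("B"));
        List<Node> chosen = new ArrayList<>();
        for (int i = 0; i < blocks; i++) {
            Node src = choose(nodes, maxStreams);
            if (src != null) {
                chosen.add(src);
            }
        }
        for (Node src : chosen) {
            src.pending++;
        }
        return new int[] {nodes.get(0).pending, nodes.get(1).pending};
    }

    // Alternative ordering: bump the counter as soon as a source is picked,
    // so the threshold takes effect within the same iteration.
    static int[] immediateUpdate(int blocks, int maxStreams) {
        List<Node> nodes = Arrays.asList(new Node("A"), new Node("B"));
        for (int i = 0; i < blocks; i++) {
            Node src = choose(nodes, maxStreams);
            if (src != null) {
                src.pending++;
            }
        }
        return new int[] {nodes.get(0).pending, nodes.get(1).pending};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(deferredUpdate(4000, 2)));  // [4000, 0]
        System.out.println(Arrays.toString(immediateUpdate(4000, 2))); // [2, 2]
    }
}
```

With the deferred update, node A is assigned all 4000 blocks even though the limit is 2; with the immediate update, blocks beyond the limit stay unscheduled in that iteration and can be picked up later or by other sources.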
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161202#comment-14161202 ] Haohui Mai commented on HDFS-6994: -- bq. I think "mixing with the earlier effort" is exactly what we need to do here, and duplicating effort is exactly what we shouldn't do. Agree. However, I think when the time of call for merge comes, it requires the reviewers to look at both sides of the code. Separating it into another branch would make things much easier and allow ensuring better code quality. > libhdfs3 - A native C/C++ HDFS client > - > > Key: HDFS-6994 > URL: https://issues.apache.org/jira/browse/HDFS-6994 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch > > > Hi All > I just got the permission to open source libhdfs3, which is a native C/C++ > HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol. > libhdfs3 provide the libhdfs style C interface and a C++ interface. Support > both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos > authentication. > libhdfs3 is currently used by HAWQ of Pivotal > I'd like to integrate libhdfs3 into HDFS source code to benefit others. > You can find libhdfs3 code from github > https://github.com/PivotalRD/libhdfs3 > http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7112) LazyWriter should use either async IO or one thread per physical disk
[ https://issues.apache.org/jira/browse/HDFS-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161190#comment-14161190 ] Xiaoyu Yao commented on HDFS-7112: -- Thanks [~arpitagarwal] for reviewing the patch. # FsDatasetImpl.java: We should avoid adding volumes to asyncDiskService unless there is a RAM_DISK volume configured. This avoids creating unnecessary thread pools on most deployments which will not have RAM_DISK. Do you mean adding volumes to asyncLazyPersistService? For asyncLazyPersistService, we will only call asyncLazyPersistService#addVolume when the storage type of the volume is RAM_DISK. # We don't need the counter and logging in {{addExecutorForVolume}}. We will never add more than one thread per volume. Fixed. # {{onStartLazyPersist}} should be called by {{saveNextReplica}} before it calls {{submitLazyPersistTask}}. Fixed. # {{onFailLazyPersist}} should never be called unless {{onStartLazyPersist}} has been called. I think it can be called on some failure paths even if submitLazyPersistTask was not called. onFailLazyPersist should be called upon any failure after the block is dequeued with the following call. block = ramDiskReplicaTracker.dequeueNextReplicaToPersist(); The patch calls onFailLazyPersist in two error paths. 1) Failed to submit the request to thread pool when calling submitLazyPersistTask 2) Failed during the thread pool thread execution. # {{BlockPoolSlice#lazyPersistReplica}} is unused, can be removed. Removed. # {{RamDiskAsyncLazyPersistService#countPendingTasks}} is unused. Removed. # It would be good if RamDiskAsyncLazyPersistService did not have a dependency on DataNode/FsDatasetImpl. It can accept the success and failure callbacks as parameters. But it's okay to fix it later in a separate Jira. That’s a good idea. I will file a separate JIRA for that. 
> LazyWriter should use either async IO or one thread per physical disk > - > > Key: HDFS-7112 > URL: https://issues.apache.org/jira/browse/HDFS-7112 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: HDFS-6581 >Reporter: Arpit Agarwal >Assignee: Xiaoyu Yao > Fix For: 2.6.0 > > Attachments: HDFS-7112.0.patch > > > The LazyWriter currently uses synchronous IO and a single thread. This limits > the throughput to that of a single disk. Using either async overlapped IO or > one thread per physical disk will improve the write throughput. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7198) Fix or suppress findbugs "unchecked conversion" warning in DFSClient#getPathTraceScope
[ https://issues.apache.org/jira/browse/HDFS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161156#comment-14161156 ] Colin Patrick McCabe commented on HDFS-7198: The "new" findbugs warning here is just a repeat of HDFS-7194, which is interesting. Since that warning was just fixed, I re-kicked the build. > Fix or suppress findbugs "unchecked conversion" warning in > DFSClient#getPathTraceScope > -- > > Key: HDFS-7198 > URL: https://issues.apache.org/jira/browse/HDFS-7198 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Trivial > Attachments: HDFS-7198.001.patch > > > Fix or suppress the findbugs "unchecked conversion" warning in > {{DFSClient#getPathTraceScope}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7189) Add trace spans for DFSClient metadata operations
[ https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161145#comment-14161145 ] Hadoop QA commented on HDFS-7189: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673205/HDFS-7189.003.patch against trunk revision 8dc6abf. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8337//console This message is automatically generated. > Add trace spans for DFSClient metadata operations > - > > Key: HDFS-7189 > URL: https://issues.apache.org/jira/browse/HDFS-7189 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7189.001.patch, HDFS-7189.003.patch > > > We should add trace spans for DFSClient metadata operations. For example, > {{DFSClient#rename}} should have a trace span, etc. etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7189) Add trace spans for DFSClient metadata operations
[ https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161137#comment-14161137 ] Colin Patrick McCabe commented on HDFS-7189: Build failed due to BUILDS-26, retriggering > Add trace spans for DFSClient metadata operations > - > > Key: HDFS-7189 > URL: https://issues.apache.org/jira/browse/HDFS-7189 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7189.001.patch, HDFS-7189.003.patch > > > We should add trace spans for DFSClient metadata operations. For example, > {{DFSClient#rename}} should have a trace span, etc. etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161136#comment-14161136 ] Hudson commented on HDFS-7194: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6201 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6201/]) HDFS-7194 Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH (yzhang via cmccabe) (cmccabe: rev 8dc6abf2f4218b2d84b2c2dc7d18623d897c362d) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java > Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH > --- > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Fix For: 2.7.0 > > Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs > warning: > {code} > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7194: --- Resolution: Fixed Fix Version/s: 2.7.0 Target Version/s: 2.7.0 Status: Resolved (was: Patch Available) thanks, Yongjun. > Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH > --- > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Fix For: 2.7.0 > > Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs > warning: > {code} > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
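For context, the findbugs warning quoted above flags code of roughly the following shape. The class and field names here are illustrative only and are not taken from the DFSClient source.

```java
// Illustrative only: the pattern behind the "inefficient new String(String)
// constructor" warning, next to the usual replacement.
class StringConstructorExample {
  // Flagged: new String(String) allocates a redundant copy of the literal.
  static final String WASTEFUL = new String("path");

  // Usual fix when a plain constant is enough: use the literal directly.
  static final String DIRECT = "path";

  public static void main(String[] args) {
    // Contents are equal, but the flagged form created a separate object.
    System.out.println(WASTEFUL.equals(DIRECT));
    System.out.println(WASTEFUL == DIRECT);
  }
}
```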
[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server
[ https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161105#comment-14161105 ] Yongjun Zhang commented on HDFS-7146: - Hi [~aw], About the username pattern allowed on different platforms, there were discussions in HDFS-4983 and HDFS-4733: {quote} Alejandro Abdelnur added a comment - 04/Dec/13 17:01 Allowed usernames are the OS allowed user names. Different versions of Unix/Linux have different restrictions by default. This was discussed when this was done for httpfs. Refer to HDFS-4733 for details. {quote} I agree with you that ideally all allowed usernames would comply with the same convention; that would make our lives much easier. However, if users already have numerical usernames, we probably have to support them. Asking them to change their user names is going to be much harder than supporting them :-) That's what HDFS-4983 and HDFS-4733 are about. Would you please also address the questions I asked in the "Another thought Allen Wittenauer," comment above? Thanks a lot. > NFS ID/Group lookup requires SSSD enumeration on the server > --- > > Key: HDFS-7146 > URL: https://issues.apache.org/jira/browse/HDFS-7146 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, > HDFS-7146.003.patch > > > The current implementation of the NFS UID and GID lookup works by running > 'getent passwd' with an assumption that it will return the entire list of > users available on the OS, local and remote (AD/etc.). > This behaviour of the command is advised to be and is prevented by > administrators in most secure setups to avoid excessive load to the ADs > involved, as the # of users to be listed may be too large, and the repeated > requests of ALL users not present in the cache would be too much for the AD > infrastructure to bear. 
> The NFS server should likely do lookups based on a specific UID request, via > 'getent passwd ', if the UID does not match a cached value. This reduces > load on the LDAP backed infrastructure. > Thanks [~qwertymaniac] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
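A per-UID lookup along the lines proposed in the issue description might look like the sketch below. It assumes a Linux host where `getent passwd` followed by a numeric UID prints a single colon-separated passwd(5) entry. The class and method names are hypothetical, and this is not the code in the attached patches.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: resolve one UID at a time and cache the result,
// instead of enumerating every user with a bare "getent passwd".
class UidLookup {
  private final Map<Integer, String> cache = new ConcurrentHashMap<>();

  // Parse a passwd(5) line of the form "name:passwd:uid:gid:gecos:home:shell".
  // Returns the user name if the UID field matches, otherwise null.
  static String parseUserName(String line, int uid) {
    if (line == null) {
      return null;
    }
    String[] fields = line.split(":", -1);
    if (fields.length < 3) {
      return null;
    }
    try {
      return Integer.parseInt(fields[2]) == uid ? fields[0] : null;
    } catch (NumberFormatException e) {
      return null;
    }
  }

  // Look up a single UID, consulting the cache before spawning a process.
  String getUserName(int uid) throws IOException {
    String cached = cache.get(uid);
    if (cached != null) {
      return cached;
    }
    Process p = new ProcessBuilder("getent", "passwd", Integer.toString(uid))
        .redirectErrorStream(true)
        .start();
    String line;
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
      line = r.readLine();  // a keyed lookup yields at most one entry
    }
    String name = parseUserName(line, uid);
    if (name != null) {
      cache.put(uid, name);
    }
    return name;
  }
}
```

Checking the UID field of the returned entry, rather than trusting whatever entry comes back, is a small guard against the numeric-username ambiguity debated in the comments.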
[jira] [Created] (HDFS-7201) Fix typos in hdfs-default.xml
Konstantin Shvachko created HDFS-7201: - Summary: Fix typos in hdfs-default.xml Key: HDFS-7201 URL: https://issues.apache.org/jira/browse/HDFS-7201 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Found the following typos in hdfs-default.xml: repliaction directoires teh tranfer spage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161043#comment-14161043 ] Hadoop QA commented on HDFS-7194: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673186/HDFS-7194.002.patch against trunk revision 3affad9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 12 warning messages. See https://builds.apache.org/job/PreCommit-HDFS-Build/8333//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-hdfs-project/hadoop-hdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8333//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8333//console This message is automatically generated. 
> Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH > --- > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs > warning: > {code} > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7198) Fix or suppress findbugs "unchecked conversion" warning in DFSClient#getPathTraceScope
[ https://issues.apache.org/jira/browse/HDFS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161042#comment-14161042 ] Hadoop QA commented on HDFS-7198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673194/HDFS-7198.001.patch against trunk revision 3affad9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-hdfs-project/hadoop-hdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8334//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8334//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8334//console This message is automatically generated. 
> Fix or suppress findbugs "unchecked conversion" warning in > DFSClient#getPathTraceScope > -- > > Key: HDFS-7198 > URL: https://issues.apache.org/jira/browse/HDFS-7198 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Trivial > Attachments: HDFS-7198.001.patch > > > Fix or suppress the findbugs "unchecked conversion" warning in > {{DFSClient#getPathTraceScope}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7189) Add trace spans for DFSClient metadata operations
[ https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161039#comment-14161039 ] Hadoop QA commented on HDFS-7189: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673205/HDFS-7189.003.patch against trunk revision 8099de2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8335//console This message is automatically generated. > Add trace spans for DFSClient metadata operations > - > > Key: HDFS-7189 > URL: https://issues.apache.org/jira/browse/HDFS-7189 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7189.001.patch, HDFS-7189.003.patch > > > We should add trace spans for DFSClient metadata operations. For example, > {{DFSClient#rename}} should have a trace span, etc. etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7200) Rename libhdfs3 to libndfs++
Colin Patrick McCabe created HDFS-7200: -- Summary: Rename libhdfs3 to libndfs++ Key: HDFS-7200 URL: https://issues.apache.org/jira/browse/HDFS-7200 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Since we generally agree that libhdfs3 is a sub-optimal name, let's call the new library "libndfs++." -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161021#comment-14161021 ] dhruba borthakur commented on HDFS-3107: Thanks for the clarification, Milind. I was just making sure that I understand the limitations of a database that uses the HDFS truncate feature. Given this fact, it is unlikely that HBase can use it (in future) to support transactions. Thanks anyway. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161011#comment-14161011 ] Subbu commented on HDFS-7175: - I would go back to pre- HDFS-2538 behavior (i.e. flush every 100 files). > Client-side SocketTimeoutException during Fsck > -- > > Key: HDFS-7175 > URL: https://issues.apache.org/jira/browse/HDFS-7175 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Carl Steinbach >Assignee: Akira AJISAKA > Attachments: HDFS-7175.2.patch, HDFS-7175.patch, HDFS-7175.patch > > > HDFS-2538 disabled status reporting for the fsck command (it can optionally > be enabled with the -showprogress option). We have observed that without > status reporting the client will abort with read timeout: > {noformat} > [hdfs@lva1-hcl0030 ~]$ hdfs fsck / > Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070 > 14/09/30 06:03:41 WARN security.UserGroupInformation: > PriviledgedActionException as:h...@grid.linkedin.com (auth:KERBEROS) > cause:java.net.SocketTimeoutException: Read timed out > Exception in thread "main" java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:152) > at java.net.SocketInputStream.read(SocketInputStream.java:122) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) > at java.io.BufferedInputStream.read(BufferedInputStream.java:334) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) > at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312) > at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149) > 
at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346) > {noformat} > Since there's nothing for the client to read it will abort if the time > required to complete the fsck operation is longer than the client's read > timeout setting. > I can think of a couple ways to fix this: > # Set an infinite read timeout on the client side (not a good idea!). > # Have the server-side write (and flush) zeros to the wire and instruct the > client to ignore these characters instead of echoing them. > # It's possible that flushing an empty buffer on the server-side will trigger > an HTTP response with a zero length payload. This may be enough to keep the > client from hanging up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
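The pre-HDFS-2538 behavior mentioned in the comment (flush every 100 files) keeps bytes moving on the connection, so the client's read timeout does not expire during a long scan. A minimal sketch of that idea follows. The class name is hypothetical, a plain `PrintWriter` stands in for the response writer, and this is not the actual fsck server code.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch: emit a progress dot and flush after every N files so
// the client always has something to read while the scan is running.
class ProgressReporter {
  private final PrintWriter out;
  private final int interval;
  private long filesChecked = 0;

  ProgressReporter(PrintWriter out, int interval) {
    this.out = out;
    this.interval = interval;
  }

  // Call once per file examined.
  void fileChecked() {
    filesChecked++;
    if (filesChecked % interval == 0) {
      out.print('.');
      out.flush();  // push the byte out instead of leaving it buffered
    }
  }

  public static void main(String[] args) {
    StringWriter buffer = new StringWriter();
    ProgressReporter reporter = new ProgressReporter(new PrintWriter(buffer), 100);
    for (int i = 0; i < 250; i++) {
      reporter.fileChecked();
    }
    // Dots were written after file 100 and file 200.
    System.out.println(buffer);
  }
}
```

The explicit `flush()` is the important part: without it, the dots could sit in a buffer on the server and the client would still see nothing to read.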
[jira] [Updated] (HDFS-3342) SocketTimeoutException in BlockSender.sendChunks could have a better error message
[ https://issues.apache.org/jira/browse/HDFS-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-3342: Labels: supportability (was: ) > SocketTimeoutException in BlockSender.sendChunks could have a better error > message > -- > > Key: HDFS-3342 > URL: https://issues.apache.org/jira/browse/HDFS-3342 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Yongjun Zhang >Priority: Minor > Labels: supportability > Attachments: HDFS-3342.001.patch > > > Currently, if a client connects to a DN and begins to read a block, but then > stops calling read() for a long period of time, the DN will log a > SocketTimeoutException "48 millis timeout while waiting for channel to be > ready for write." This is because there is no "keepalive" functionality of > any kind. At a minimum, we should improve this error message to be an INFO > level log which just says that the client likely stopped reading, so > disconnecting it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7189) Add trace spans for DFSClient metadata operations
[ https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7189: --- Attachment: HDFS-7189.003.patch fix typo > Add trace spans for DFSClient metadata operations > - > > Key: HDFS-7189 > URL: https://issues.apache.org/jira/browse/HDFS-7189 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7189.001.patch, HDFS-7189.003.patch > > > We should add trace spans for DFSClient metadata operations. For example, > {{DFSClient#rename}} should have a trace span, etc. etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7189) Add trace spans for DFSClient metadata operations
[ https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7189: --- Attachment: (was: HDFS-7189.002.patch) > Add trace spans for DFSClient metadata operations > - > > Key: HDFS-7189 > URL: https://issues.apache.org/jira/browse/HDFS-7189 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7189.001.patch, HDFS-7189.003.patch > > > We should add trace spans for DFSClient metadata operations. For example, > {{DFSClient#rename}} should have a trace span, etc. etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server
[ https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161006#comment-14161006 ] Allen Wittenauer commented on HDFS-7146: bq. See HDFS-4983. That JIRA is sort of irrelevant to the discussion since HDFS (and therefore WebHDFS) has no such restrictions on usernames since they are published as strings. Unix does and we have to play by its rules since that's the space this code plays in. bq. Seems the requirement on user name varies. Not really. Some useradd's do not enforce the entire rule set, which is why I said "most/all". Some Linux distributions include a useradd facility that does not. If you look at the upstream Linux shadow utilities source, however, (https://github.com/shadow-maint/shadow/blob/master/libmisc/chkname.c) you'll find that all-digit usernames are not legal. Other OSes follow similar rules in their utilities ( e.g., Illumos: https://hg.openindiana.org/upstream/illumos/illumos-gate/file/68f95e015346/usr/src/cmd/aset/tasks/pwchk.awk ). Just because some distributions allowed users to do incredibly dumb things doesn't mean we need to as well. FWIW, if you want true portability, you'll need to use the native system calls to follow whatever rules are allowed on that machine. Otherwise, expect to make some compatibility decisions. To me, this is an easy call: all-numeric usernames are super rare since they have unpredictable results (e.g., chown). portability > naive admins who shot themselves in the foot. 
> NFS ID/Group lookup requires SSSD enumeration on the server > --- > > Key: HDFS-7146 > URL: https://issues.apache.org/jira/browse/HDFS-7146 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, > HDFS-7146.003.patch > > > The current implementation of the NFS UID and GID lookup works by running > 'getent passwd' with an assumption that it will return the entire list of > users available on the OS, local and remote (AD/etc.). > This behaviour of the command is advised to be and is prevented by > administrators in most secure setups to avoid excessive load to the ADs > involved, as the # of users to be listed may be too large, and the repeated > requests of ALL users not present in the cache would be too much for the AD > infrastructure to bear. > The NFS server should likely do lookups based on a specific UID request, via > 'getent passwd ', if the UID does not match a cached value. This reduces > load on the LDAP backed infrastructure. > Thanks [~qwertymaniac] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7189) Add trace spans for DFSClient metadata operations
[ https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7189: --- Attachment: HDFS-7189.002.patch > Add trace spans for DFSClient metadata operations > - > > Key: HDFS-7189 > URL: https://issues.apache.org/jira/browse/HDFS-7189 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7189.001.patch, HDFS-7189.002.patch > > > We should add trace spans for DFSClient metadata operations. For example, > {{DFSClient#rename}} should have a trace span, etc. etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7189) Add trace spans for DFSClient metadata operations
[ https://issues.apache.org/jira/browse/HDFS-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160998#comment-14160998 ] Colin Patrick McCabe commented on HDFS-7189: bq. \[is adding unwrapRemoteException to getStoragePolicies\] a bug fix? Hmm. Good question. I looked into this a little more, and I think I will skip adding new invocations of {{unwrapRemoteException}} in this patch. The unwrapping is only needed when the NameNode actually throws one of those exceptions, but I don't think that can happen for {{getStoragePolicies}} or many of the other functions here. Plus, adding that stuff muddies the waters... it would be better to do it in a separate patch than to combine it with this one. bq. Removing checkOpen(); in delete is intentional ? Ah, but the one-argument version of {{delete}} now calls another override of the function, which then calls {{checkOpen}}. So it should be OK. bq. Is this intentional... calling trace getCurrentEditLogTxid though its in getInotifyEventStream ... I suppose it is given it actually does do getCurrentEditLogTxid I think we should, since we want to know about this source of activity. We want to know what the performance impact of inotify is. I also fixed a findbugs warning. Reposting > Add trace spans for DFSClient metadata operations > - > > Key: HDFS-7189 > URL: https://issues.apache.org/jira/browse/HDFS-7189 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7189.001.patch > > > We should add trace spans for DFSClient metadata operations. For example, > {{DFSClient#rename}} should have a trace span, etc. etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160975#comment-14160975 ] Colin Patrick McCabe commented on HDFS-6994: bq. It looks to me that at least at the first phase the code will be sitting in contrib, compared to HADOOP-10388 is putting the code in hadoop-hdfs-project, they should be in completely isolation. I think it is definitely useful to reuse some components down the road, but I think it is a much longer term goal. The original code in HADOOP-10388 never put any new files in the {{hadoop-hdfs-project}}. Instead, it put new files in {{hadoop-native-core}}, a new top-level project. > libhdfs3 - A native C/C++ HDFS client > - > > Key: HDFS-6994 > URL: https://issues.apache.org/jira/browse/HDFS-6994 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch > > > Hi All > I just got the permission to open source libhdfs3, which is a native C/C++ > HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol. > libhdfs3 provide the libhdfs style C interface and a C++ interface. Support > both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos > authentication. > libhdfs3 is currently used by HAWQ of Pivotal > I'd like to integrate libhdfs3 into HDFS source code to benefit others. > You can find libhdfs3 code from github > https://github.com/PivotalRD/libhdfs3 > http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server
[ https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160976#comment-14160976 ] Yongjun Zhang commented on HDFS-7146: - Another thought [~aw], If you look at the nfs code, only two platforms are currently supported: linux and macos. The commands used for them are crafted differently. For example, getent is used for linux, and dscl is used for mac. Given that we need to use different commands for different platforms, if a new platform is added, I would assume that we will likely have to craft commands for the new platform. Based on this, do you think it's ok for us to use the "id" command (for linux and mac), which has the advantage of avoiding loading the full user map (when there is a numerical user name)? Thanks. > NFS ID/Group lookup requires SSSD enumeration on the server > --- > > Key: HDFS-7146 > URL: https://issues.apache.org/jira/browse/HDFS-7146 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, > HDFS-7146.003.patch > > > The current implementation of the NFS UID and GID lookup works by running > 'getent passwd' with an assumption that it will return the entire list of > users available on the OS, local and remote (AD/etc.). > This behaviour of the command is advised to be and is prevented by > administrators in most secure setups to avoid excessive load to the ADs > involved, as the # of users to be listed may be too large, and the repeated > requests of ALL users not present in the cache would be too much for the AD > infrastructure to bear. > The NFS server should likely do lookups based on a specific UID request, via > 'getent passwd ', if the UID does not match a cached value. This reduces > load on the LDAP backed infrastructure. > Thanks [~qwertymaniac] for reporting the issue. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160974#comment-14160974 ] Jason Lowe commented on HDFS-7199: -- I believe the problem lies in the way DataStreamer is handling the error:
{code}
} catch (Throwable e) {
  // Log warning if there was a real error.
  if (restartingNodeIndex == -1) {
    DFSClient.LOG.warn("DataStreamer Exception", e);
  }
  if (e instanceof IOException) {
    setLastException((IOException)e);
  }
  hasError = true;
  if (errorIndex == -1 && restartingNodeIndex == -1) {
    // Not a datanode issue
    streamerClosed = true;
  }
}
{code}
We should either always call setLastException, wrapping the exception in an I/O exception if necessary, or at least set it to something if we're going to set streamerClosed=true and exit the datastreamer thread. That way there will always be some kind of exception to be picked up either in checkClosed() or close() in the output stream. > DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O > exception > --- > > Key: HDFS-7199 > URL: https://issues.apache.org/jira/browse/HDFS-7199 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.5.0 >Reporter: Jason Lowe >Priority: Critical > > If the DataStreamer thread encounters a non-I/O exception then it closes the > output stream but does not set lastException. When the client later calls > close on the output stream then it will see the stream is already closed with > lastException == null, mistakenly think this is a redundant close call, and > fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
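The first option suggested in the comment above (always record an exception, wrapping non-I/O failures) might look roughly like the following. This is only a minimal sketch: {{StreamerErrorState}}, {{recordFailure}} and the message text are hypothetical names made up for illustration, and the class is a stand-in for the bookkeeping, not the real DataStreamer or DFSOutputStream code.

```java
import java.io.IOException;

/** Hypothetical stand-in for the streamer's error bookkeeping; not the real DataStreamer. */
class StreamerErrorState {
  private IOException lastException;
  private boolean streamerClosed;

  /** Records any failure, wrapping non-I/O throwables so an IOException is always available. */
  synchronized void recordFailure(Throwable e) {
    if (e instanceof IOException) {
      lastException = (IOException) e;
    } else {
      // Assumption: the wrapper message is illustrative only.
      lastException = new IOException("DataStreamer failed with a non-I/O exception", e);
    }
    streamerClosed = true;
  }

  /** Rethrows a pending failure once; a later close with nothing pending is a no-op. */
  synchronized void close() throws IOException {
    if (streamerClosed && lastException != null) {
      IOException pending = lastException;
      lastException = null;
      throw pending;
    }
  }
}
```

With this shape, a caller that closes the stream after a NullPointerException in the streamer thread would see an IOException whose cause is the original exception, instead of a silent success.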
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160970#comment-14160970 ] Colin Patrick McCabe commented on HDFS-6994: The libhdfs3 client clearly is very feature-complete. It has support for Kerberos, Namenode HA, SASL, and so forth. I am not going to continue developing the previous client as a separate project since that would be redundant. Instead, we are going to work together to get libhdfs3 in shape to do everything the native client needs to do. The subtasks here are a pretty good description of what that is. I think "mixing with the earlier effort" is exactly what we need to do here, and duplicating effort is exactly what we shouldn't do. The client is going to be C++ but with the existing libhdfs interfaces, so that it can be used with existing clients. libhdfs3 already has these {{hdfs.h}} interfaces. I would have preferred C over C++, but I am not religious about programming languages. I feel that if a consistent coding style can be enforced, C++ is usable. I am evaluating whether it is possible to make this library C++11 only. As [~aw] has commented, the glue code needed to support older compilers might become a maintenance burden over time, and Boost has its own difficult set of versioning issues which we would like to avoid. In HDFS-7041, I also wrote a library called {{libhdfs_fwd}} which can perform the failover from the native client to the JNI client that the old HADOOP-10388 code performed. > libhdfs3 - A native C/C++ HDFS client > - > > Key: HDFS-6994 > URL: https://issues.apache.org/jira/browse/HDFS-6994 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch > > > Hi All > I just got the permission to open source libhdfs3, which is a native C/C++ > HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol. 
> libhdfs3 provide the libhdfs style C interface and a C++ interface. Support > both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos > authentication. > libhdfs3 is currently used by HAWQ of Pivotal > I'd like to integrate libhdfs3 into HDFS source code to benefit others. > You can find libhdfs3 code from github > https://github.com/PivotalRD/libhdfs3 > http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
Jason Lowe created HDFS-7199: Summary: DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Priority: Critical If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server
[ https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160952#comment-14160952 ] Yongjun Zhang commented on HDFS-7146: - Thanks [~aw]. It seems the requirements on user names vary. For example, I can add a user with a numerical user name on my system:
[yzhang@localhost hadoop]$ su adduser 23456
su: user adduser does not exist
[yzhang@localhost hadoop]$ sudo adduser 23456
[sudo] password for yzhang:
[yzhang@localhost hadoop]$ getent passwd | grep 23456
23456:x:504:505::/home/23456:/bin/bash
[yzhang@localhost hadoop]$
We have had cases where numerical user names are used often. See HDFS-4983. I wish there were a portable command like "id" to address this issue better. Otherwise, we might do the following:
1. do an incremental update to the map
2. do a full load of passwd or group when the name is numerical
I will do some more study; comments are welcome. Thanks. > NFS ID/Group lookup requires SSSD enumeration on the server > --- > > Key: HDFS-7146 > URL: https://issues.apache.org/jira/browse/HDFS-7146 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, > HDFS-7146.003.patch > > > The current implementation of the NFS UID and GID lookup works by running > 'getent passwd' with an assumption that it will return the entire list of > users available on the OS, local and remote (AD/etc.). > This behaviour of the command is advised to be and is prevented by > administrators in most secure setups to avoid excessive load to the ADs > involved, as the # of users to be listed may be too large, and the repeated > requests of ALL users not present in the cache would be too much for the AD > infrastructure to bear. > The NFS server should likely do lookups based on a specific UID request, via > 'getent passwd ', if the UID does not match a cached value.
This reduces > load on the LDAP backed infrastructure. > Thanks [~qwertymaniac] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
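The incremental approach described in this issue (resolve one ID on a cache miss instead of enumerating every user) can be sketched as below. This is an illustration under stated assumptions, not the attached patch: {{UidNameCache}}, {{nameFor}} and {{parsePasswdName}} are hypothetical names, and the per-ID resolver is injected by the caller (it could, for instance, wrap a single-entry passwd query) so that the sketch does not depend on any particular command.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

/** Hypothetical incremental UID-to-name cache; not the actual NFS gateway class. */
class UidNameCache {
  private final Map<Integer, String> uidToName = new ConcurrentHashMap<>();
  private final IntFunction<Optional<String>> resolver;

  /** The resolver performs one lookup for one UID; how it does so is up to the caller. */
  UidNameCache(IntFunction<Optional<String>> resolver) {
    this.resolver = resolver;
  }

  /** Returns the cached name, or asks the resolver for just this UID on a miss. */
  Optional<String> nameFor(int uid) {
    String cached = uidToName.get(uid);
    if (cached != null) {
      return Optional.of(cached);
    }
    Optional<String> resolved = resolver.apply(uid);
    resolved.ifPresent(name -> uidToName.put(uid, name));
    return resolved;
  }

  /**
   * Parses one passwd-format line (name:password:uid:gid:...) and returns the name
   * only if the third field equals the expected UID. Checking the UID field guards
   * against confusing a numerical user name with a numerical ID.
   */
  static Optional<String> parsePasswdName(String line, int expectedUid) {
    String[] fields = line.split(":");
    if (fields.length < 3) {
      return Optional.empty();
    }
    try {
      return Integer.parseInt(fields[2]) == expectedUid
          ? Optional.of(fields[0])
          : Optional.empty();
    } catch (NumberFormatException e) {
      return Optional.empty();
    }
  }
}
```

Applied to the passwd entry from the session above, {{parsePasswdName}} would map UID 504 to the user name "23456", while a request for UID 23456 against the same line would yield nothing, which is the distinction the numerical-user-name discussion turns on.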
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160948#comment-14160948 ] Haohui Mai commented on HDFS-6994: -- bq. can you please add details on how these two stream of development will be brought together? It looks to me that, at least in the first phase, the code will sit in contrib, whereas HADOOP-10388 puts its code in hadoop-hdfs-project, so the two should be completely isolated. I think it is definitely useful to reuse some components down the road, but that is a much longer-term goal. > libhdfs3 - A native C/C++ HDFS client > - > > Key: HDFS-6994 > URL: https://issues.apache.org/jira/browse/HDFS-6994 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch > > > Hi All > I just got the permission to open source libhdfs3, which is a native C/C++ > HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol. > libhdfs3 provide the libhdfs style C interface and a C++ interface. Support > both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos > authentication. > libhdfs3 is currently used by HAWQ of Pivotal > I'd like to integrate libhdfs3 into HDFS source code to benefit others. > You can find libhdfs3 code from github > https://github.com/PivotalRD/libhdfs3 > http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7010) boot up libhdfs3 project
[ https://issues.apache.org/jira/browse/HDFS-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160945#comment-14160945 ] Colin Patrick McCabe commented on HDFS-7010: bq. I wonder, why the code needs to unwind the stack? It is used to put stack traces in exceptions. I agree that this might not really be needed and possibly we could get rid of it. In my experience, getting stack unwinding code to work properly on multiple architectures and platforms is difficult, and the benefit seems uncertain since we could always add more identifying information to each exception to know where it came from. [~wangzw], what do you think? > boot up libhdfs3 project > > > Key: HDFS-7010 > URL: https://issues.apache.org/jira/browse/HDFS-7010 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Colin Patrick McCabe > Attachments: HDFS-7010-pnative.003.patch, > HDFS-7010-pnative.004.patch, HDFS-7010-pnative.004.patch, HDFS-7010.patch > > > boot up libhdfs3 project with CMake, Readme and license file. > Integrate google mock and google test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160942#comment-14160942 ] Yongjun Zhang commented on HDFS-7194: - Many thanks [~cmccabe] and [~szetszwo]. > Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH > --- > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs > warning: > {code} > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160926#comment-14160926 ] Suresh Srinivas commented on HDFS-6994: --- bq. Can the code be committed in a separate branch other than HADOOP-10388 so that it don't get mixed with the earlier effort? I am curious about this too. Are the two implementations complementary enough to live together in HADOOP-10388, which has been in development for a long time? [~cmccabe] and [~wheat9], can you please add details on how these two streams of development will be brought together? > libhdfs3 - A native C/C++ HDFS client > - > > Key: HDFS-6994 > URL: https://issues.apache.org/jira/browse/HDFS-6994 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch > > > Hi All > I just got the permission to open source libhdfs3, which is a native C/C++ > HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol. > libhdfs3 provide the libhdfs style C interface and a C++ interface. Support > both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos > authentication. > libhdfs3 is currently used by HAWQ of Pivotal > I'd like to integrate libhdfs3 into HDFS source code to benefit others. > You can find libhdfs3 code from github > https://github.com/PivotalRD/libhdfs3 > http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160928#comment-14160928 ] Colin Patrick McCabe commented on HDFS-7194: PS: I retitled this JIRA and changed the description a bit to reflect the fact it no longer has to deal with the findbugs warning we fixed in HDFS-7169. > Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH > --- > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs > warning: > {code} > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7194: --- Description: In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs warning: {code} CodeWarning Dm org.apache.hadoop.hdfs.DFSClient.() invokes inefficient new String(String) constructor {code} was: In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs warnings introduced by earlier fixes. E.g. https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html {quote} Bad practice Warnings CodeWarning Se Class org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy defines non-transient non-serializable instance field condition Performance Warnings CodeWarning Dm org.apache.hadoop.hdfs.DFSClient.() invokes inefficient new String(String) constructor {quote} > Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH > --- > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there is a findbugs > warning: > {code} > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
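For readers unfamiliar with the "inefficient new String(String) constructor" warning quoted above, the pattern it flags can be illustrated as follows. This is a generic sketch: the class name, constant names and the value "path" are made up for the example, and it is not necessarily what the attached patches change in DFSClient.

```java
/** Illustrative only; the names and the value are made up, not taken from DFSClient. */
class StringConstantDemo {
  // Flagged pattern: copies an existing String into a brand-new object.
  static final String COPIED = new String("path");

  // Usual alternative: refer to the literal directly; the JVM interns string literals.
  static final String LITERAL = "path";
}
```

Both constants compare equal with {{equals}}, but the copied one is a distinct object, so the extra allocation buys nothing unless a separate identity is actually wanted.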
[jira] [Updated] (HDFS-7198) Fix or suppress findbugs "unchecked conversion" warning in DFSClient#getPathTraceScope
[ https://issues.apache.org/jira/browse/HDFS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7198: --- Attachment: HDFS-7198.001.patch > Fix or suppress findbugs "unchecked conversion" warning in > DFSClient#getPathTraceScope > -- > > Key: HDFS-7198 > URL: https://issues.apache.org/jira/browse/HDFS-7198 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Trivial > Attachments: HDFS-7198.001.patch > > > Fix or suppress the findbugs "unchecked conversion" warning in > {{DFSClient#getPathTraceScope}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7198) Fix or suppress findbugs "unchecked conversion" warning in DFSClient#getPathTraceScope
[ https://issues.apache.org/jira/browse/HDFS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7198: --- Status: Patch Available (was: Open) > Fix or suppress findbugs "unchecked conversion" warning in > DFSClient#getPathTraceScope > -- > > Key: HDFS-7198 > URL: https://issues.apache.org/jira/browse/HDFS-7198 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Trivial > Attachments: HDFS-7198.001.patch > > > Fix or suppress the findbugs "unchecked conversion" warning in > {{DFSClient#getPathTraceScope}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7055) Add tracing to DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160922#comment-14160922 ] Colin Patrick McCabe commented on HDFS-7055: Thanks for the report. I have filed HDFS-7198 to remove or suppress the javac warning. [~yzhangal] is tackling the findbugs warning in HDFS-7194. > Add tracing to DFSInputStream > - > > Key: HDFS-7055 > URL: https://issues.apache.org/jira/browse/HDFS-7055 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.7.0 > > Attachments: HDFS-7055.002.patch, HDFS-7055.003.patch, > HDFS-7055.004.patch, HDFS-7055.005.patch, screenshot-get-1mb.005.png, > screenshot-get-1mb.png > > > Add tracing to DFSInputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7198) Fix or suppress findbugs "unchecked conversion" warning in DFSClient#getPathTraceScope
Colin Patrick McCabe created HDFS-7198: -- Summary: Fix or suppress findbugs "unchecked conversion" warning in DFSClient#getPathTraceScope Key: HDFS-7198 URL: https://issues.apache.org/jira/browse/HDFS-7198 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Trivial Fix or suppress the findbugs "unchecked conversion" warning in {{DFSClient#getPathTraceScope}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160918#comment-14160918 ] Colin Patrick McCabe commented on HDFS-7194: bq. Could you check with Colin? I am not sure if he already has a plan fixing the findbugs and javac warnings. I think it's fine to fix the findbugs warning in this JIRA. I am +1 on the patch pending jenkins. > Two findbugs issues in recent PreCommit-HDFS-Build builds > - > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs > warnings introduced by earlier fixes. > E.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html > {quote} > Bad practice Warnings > Code Warning > SeClass > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy > defines non-transient non-serializable instance field condition > Performance Warnings > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7194) Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7194: --- Summary: Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH (was: Two findbugs issues in recent PreCommit-HDFS-Build builds) > Fix findbugs "inefficient new String constructor" warning in DFSClient#PATH > --- > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs > warnings introduced by earlier fixes. > E.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html > {quote} > Bad practice Warnings > Code Warning > SeClass > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy > defines non-transient non-serializable instance field condition > Performance Warnings > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server
[ https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160906#comment-14160906 ] Allen Wittenauer commented on HDFS-7146: bq. If user name 123 That's not a legal Unix user name and most/all compliant useradd's will kick it back as invalid. FWIW, all sorts of problems happen with all numeric usernames if one tries to use them. For example, if one runs 'chown 123 file' what permissions would be on the file? It's perfectly reasonable for the system to fail in this scenario. bq. About "id" command I'm -1 on using id for this, even if it works on Linux and OS X. It limits any future portability to systems on SysV machines where /usr/bin/id is typically the SysV id and not POSIX id. We've been down this road before with id in the pre-security days. It was a problem then and it will be a problem in the future. (Never mind the fact that I suspect the code actually works on other operating systems, but we've artificially limited it for reasons which I'm unclear on.) tl;dr: So use getent on everything but OS X. > NFS ID/Group lookup requires SSSD enumeration on the server > --- > > Key: HDFS-7146 > URL: https://issues.apache.org/jira/browse/HDFS-7146 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, > HDFS-7146.003.patch > > > The current implementation of the NFS UID and GID lookup works by running > 'getent passwd' with an assumption that it will return the entire list of > users available on the OS, local and remote (AD/etc.). 
> This behaviour of the command is advised to be and is prevented by > administrators in most secure setups to avoid excessive load to the ADs > involved, as the # of users to be listed may be too large, and the repeated > requests of ALL users not present in the cache would be too much for the AD > infrastructure to bear. > The NFS server should likely do lookups based on a specific UID request, via > 'getent passwd ', if the UID does not match a cached value. This reduces > load on the LDAP backed infrastructure. > Thanks [~qwertymaniac] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7169) Fix a findbugs warning in ReplaceDatanodeOnFailure
[ https://issues.apache.org/jira/browse/HDFS-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160907#comment-14160907 ] Hudson commented on HDFS-7169: -- FAILURE: Integrated in Hadoop-trunk-Commit #6199 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6199/]) HDFS-7169. Add SE_BAD_FIELD to findbugsExcludeFile.xml. (szetszwo: rev 3affad9ebd7def57eb3dd1cc1a1e806fceee63ad) * hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Fix a findbugs warning in ReplaceDatanodeOnFailure > -- > > Key: HDFS-7169 > URL: https://issues.apache.org/jira/browse/HDFS-7169 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Fix For: 2.6.0 > > Attachments: h7169_20140930.patch > > > The following findbugs warning came up recently although there was no recent > change of the code. > - ReplaceDatanodeOnFailure$Policy defines non-transient non-serializable > instance field condition > Bug type SE_BAD_FIELD (click for details) > In class > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy > Field > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy.condition > In ReplaceDatanodeOnFailure.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
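The commit above suppresses the warning through the exclude file rather than changing code. As background, the shape that SE_BAD_FIELD describes (a Serializable class holding a non-transient field whose type is not serializable) can be reproduced in isolation as below. All class names here are hypothetical and unrelated to the real ReplaceDatanodeOnFailure code; the sketch only shows why the pattern can matter for default Java serialization and how marking the field transient changes the outcome.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical helper type that does not implement Serializable. */
class Condition {
  boolean satisfied() {
    return true;
  }
}

/** Serializable holder with a non-transient, non-serializable field: the flagged shape. */
class FlaggedPolicy implements Serializable {
  private static final long serialVersionUID = 1L;
  final Condition condition = new Condition();
}

/** Same holder with the field marked transient, so default serialization skips it. */
class TransientPolicy implements Serializable {
  private static final long serialVersionUID = 1L;
  final transient Condition condition = new Condition();
}

class SerializationCheck {
  /** Returns true if the object can be written with default Java serialization. */
  static boolean canSerialize(Object o) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Whether a given warning reflects a real problem depends on whether instances are ever serialized this way, which is a judgment the exclude-file approach leaves to the maintainers.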
[jira] [Updated] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7194: Attachment: HDFS-7194.002.patch > Two findbugs issues in recent PreCommit-HDFS-Build builds > - > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7194.001.patch, HDFS-7194.002.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs > warnings introduced by earlier fixes. > E.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html > {quote} > Bad practice Warnings > Code Warning > SeClass > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy > defines non-transient non-serializable instance field condition > Performance Warnings > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7186) Add usage of "hadoop trace" command to doc
[ https://issues.apache.org/jira/browse/HDFS-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160900#comment-14160900 ] Colin Patrick McCabe commented on HDFS-7186: Looks good. {code} + $ hadoop trace -add -class org.htrace.impl.LocalFileSpanReceiver -Chadoop.htrace.local-file-span-receiver.path=/tmp/htrace.out -host 192.168.56.2:9000 {code} Since the namespace of LocalFileSpanReceiver might be changing soon, I'd prefer to tell people to use {{hadoop trace -add -class LocalFileSpanReceiver}} (i.e., have the system automatically add the namespace). That way it will work even after we move {{LocalFileSpanReceiver}}. > Add usage of "hadoop trace" command to doc > -- > > Key: HDFS-7186 > URL: https://issues.apache.org/jira/browse/HDFS-7186 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-7186-0.patch > > > The command for tracing management was added in HDFS-6956. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160898#comment-14160898 ] Tsz Wo Nicholas Sze commented on HDFS-7194: --- Could you check with Colin? I am not sure if he already has a plan fixing the findbugs and javac warnings. > Two findbugs issues in recent PreCommit-HDFS-Build builds > - > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7194.001.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs > warnings introduced by earlier fixes. > E.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html > {quote} > Bad practice Warnings > Code Warning > SeClass > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy > defines non-transient non-serializable instance field condition > Performance Warnings > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7169) Fix a findbugs warning in ReplaceDatanodeOnFailure
[ https://issues.apache.org/jira/browse/HDFS-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7169: -- Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Arpit for reviewing the patch. I have committed this. > Fix a findbugs warning in ReplaceDatanodeOnFailure > -- > > Key: HDFS-7169 > URL: https://issues.apache.org/jira/browse/HDFS-7169 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Fix For: 2.6.0 > > Attachments: h7169_20140930.patch > > > The following findbugs warning came up recently although there was no recent > change of the code. > - ReplaceDatanodeOnFailure$Policy defines non-transient non-serializable > instance field condition > Bug type SE_BAD_FIELD (click for details) > In class > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy > Field > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy.condition > In ReplaceDatanodeOnFailure.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7169) Fix a findbugs warning in ReplaceDatanodeOnFailure
[ https://issues.apache.org/jira/browse/HDFS-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7169: -- Component/s: build > Fix a findbugs warning in ReplaceDatanodeOnFailure > -- > > Key: HDFS-7169 > URL: https://issues.apache.org/jira/browse/HDFS-7169 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Fix For: 2.6.0 > > Attachments: h7169_20140930.patch > > > The following findbugs warning came up recently although there was no recent > change of the code. > - ReplaceDatanodeOnFailure$Policy defines non-transient non-serializable > instance field condition > Bug type SE_BAD_FIELD (click for details) > In class > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy > Field > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy.condition > In ReplaceDatanodeOnFailure.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160887#comment-14160887 ] Haohui Mai commented on HDFS-6994: -- Can the code be committed in a separate branch other than HADOOP-10388 so that it doesn't get mixed with the earlier effort? > libhdfs3 - A native C/C++ HDFS client > - > > Key: HDFS-6994 > URL: https://issues.apache.org/jira/browse/HDFS-6994 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch > > > Hi All > I just got the permission to open source libhdfs3, which is a native C/C++ > HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol. > libhdfs3 provide the libhdfs style C interface and a C++ interface. Support > both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos > authentication. > libhdfs3 is currently used by HAWQ of Pivotal > I'd like to integrate libhdfs3 into HDFS source code to benefit others. > You can find libhdfs3 code from github > https://github.com/PivotalRD/libhdfs3 > http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160880#comment-14160880 ] Yongjun Zhang commented on HDFS-7194: - Hi [~szetszwo], Thanks for the info. I was not aware of HDFS-7169. We can dedicate HDFS-7194 to the other issue reported here then. I'm uploading a revision to drop the change for the HDFS-7169 issue. It's going to be a really trivial change, would you please help review it? Thanks a lot. > Two findbugs issues in recent PreCommit-HDFS-Build builds > - > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7194.001.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs > warnings introduced by earlier fixes. > E.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html > {quote} > Bad practice Warnings > Code Warning > SeClass > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy > defines non-transient non-serializable instance field condition > Performance Warnings > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7010) boot up libhdfs3 project
[ https://issues.apache.org/jira/browse/HDFS-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160885#comment-14160885 ] Haohui Mai commented on HDFS-7010: -- I wonder, why does the code need to unwind the stack?
{code}
+static const std::string SymbolizeAndDemangle(void * pc) {
+    std::vector<char> buffer(1024);
+    std::ostringstream ss;
+    uint64_t pc0 = reinterpret_cast<uint64_t>(pc);
+    uint64_t start_address = 0;
+    int object_fd = OpenObjectFileContainingPcAndGetStartAddress(pc0,
+                    start_address);
+
+    if (object_fd == -1) {
+        return DEFAULT_STACK_PREFIX"Unknown";
+    }
+
+    FileDescriptor wrapped_object_fd(object_fd);
+    int elf_type = FileGetElfType(wrapped_object_fd.get());
+
+    if (elf_type == -1) {
+        return DEFAULT_STACK_PREFIX"Unknown";
+    }
+
+    if (!GetSymbolFromObjectFile(wrapped_object_fd.get(), pc0,
+                                 &buffer[0], buffer.size(), start_address)) {
+        return DEFAULT_STACK_PREFIX"Unknown";
+    }
+
+    ss << DEFAULT_STACK_PREFIX << DemangleSymbol(&buffer[0]);
+    return ss.str();
+}
+
+#elif defined(OS_MACOSX) && defined(HAVE_DLADDR)
+
+static const std::string SymbolizeAndDemangle(void * pc) {
+    Dl_info info;
+    std::ostringstream ss;
+
+    if (dladdr(pc, &info) && info.dli_sname) {
+        ss << DEFAULT_STACK_PREFIX << DemangleSymbol(info.dli_sname);
+    } else {
+        ss << DEFAULT_STACK_PREFIX << "Unknown";
+    }
+
+    return ss.str();
+}
+
+#endif
+
+const std::string PrintStack(int skip, int maxDepth) {
+    std::ostringstream ss;
+    std::vector<void *> stack;
+    GetStack(skip + 1, maxDepth, stack);
+
+    for (size_t i = 0; i < stack.size(); ++i) {
+        ss << SymbolizeAndDemangle(stack[i]) << std::endl;
+    }
+
+    return ss.str();
+}
{code}
> boot up libhdfs3 project > > > Key: HDFS-7010 > URL: https://issues.apache.org/jira/browse/HDFS-7010 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Colin Patrick McCabe > Attachments: HDFS-7010-pnative.003.patch, > HDFS-7010-pnative.004.patch, HDFS-7010-pnative.004.patch, HDFS-7010.patch > > > boot up libhdfs3 project with CMake, Readme and license file. > Integrate google mock and google test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7187) DomainSocketWatcher thread crashes causing datanode to leak connection threads
[ https://issues.apache.org/jira/browse/HDFS-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160860#comment-14160860 ] Colin Patrick McCabe commented on HDFS-7187: That exception occurs while {{DomainSocketWatcher}} is exiting. It's a bug, but only a bug that affects shutdown (HADOOP-10404). To find out why it's shutting down, you need to look backwards in the log. > DomainSocketWatcher thread crashes causing datanode to leak connection threads > -- > > Key: HDFS-7187 > URL: https://issues.apache.org/jira/browse/HDFS-7187 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.3.0 >Reporter: Maxim Ivanov > > It seems that DomainSocketWatcher crashes, which makes all those short > circuit threads to wait forever: > {code} > Exception in thread "Thread-22" java.util.ConcurrentModificationException > at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) > at java.util.TreeMap$ValueIterator.next(TreeMap.java:1160) > at > org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:465) > at java.lang.Thread.run(Thread.java:745) > {code} > In the meantime DataXceiver threads look like this (their number grows up to > connection threads limit): > {code} > "DataXceiver for client unix:/var/run/hadoop-hdfs/datanode50010.socket > [Waiting for operation #1]" daemon prio=10 tid=0x7fb3c14d3800 nid=0x997e > waiting on condition [0x7fb2a1d25000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x000744d1d600> (a > java.util.concurrent.locks.ReentrantLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) > at > 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) > at > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214) > at > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290) > at > org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286) > at > org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:386) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
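The {{ConcurrentModificationException}} in the first trace above comes from {{TreeMap$ValueIterator.next}}, which is the fail-fast check that fires when a map is structurally modified while its values are being iterated. The following is a minimal sketch of that general mechanism only. It does not reproduce the {{DomainSocketWatcher}} code, and the class and method names are hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.TreeMap;

// Hypothetical illustration; not taken from Hadoop.
final class TreeMapCmeSketch {
    private static TreeMap<Integer, String> sampleMap() {
        TreeMap<Integer, String> entries = new TreeMap<>();
        entries.put(1, "a");
        entries.put(2, "b");
        entries.put(3, "c");
        return entries;
    }

    /** Returns true if modifying the map mid-iteration trips the fail-fast check. */
    static boolean mutateWhileIterating() {
        TreeMap<Integer, String> entries = sampleMap();
        try {
            for (String ignored : entries.values()) {
                // Structural modification that bypasses the iterator.
                entries.remove(3);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            // Thrown by the iterator's next() call after the removal.
            return true;
        }
    }

    /** Removes every entry through the iterator itself and returns the final size. */
    static int removeViaIterator() {
        TreeMap<Integer, String> entries = sampleMap();
        Iterator<String> it = entries.values().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove(); // Allowed: the iterator keeps its own bookkeeping in sync.
        }
        return entries.size();
    }
}
```

The same exception can also appear when a second thread modifies the map without synchronization, which this single-threaded sketch does not model.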
[jira] [Commented] (HDFS-7174) Support for more efficient large directories
[ https://issues.apache.org/jira/browse/HDFS-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160847#comment-14160847 ] Colin Patrick McCabe commented on HDFS-7174: This is a great idea, [~kihwal]. Definitely something we need. This reminds me of why [~tlipcon] created {{ChunkedArrayList}}. We found that block reports were generating too much garbage when they created their giant {{ArrayLists}}. We had the same problem described here where resizing a giant {{ArrayList}} required an enormous amount of copying, and made the previous giant array a giant piece of garbage (which could trigger a full GC). I was about to suggest using {{ChunkedArrayList}}, but I don't think it supports insertion into the middle of the list, unfortunately. It might not be too hard to extend {{ChunkedArrayList}} to support insertion into the middle, though... perhaps we should consider this. As [~hitliuyi] pointed out, the current patch has a problem. If the directory size goes back and forth across {{switchingThreshold}} (say, by repeatedly adding and removing a single element to a directory), we pay a very high cost. To prevent this, the threshold for converting an {{INodeHashedArrayList}} back to a simple {{INodeArrayList}} should be lower than the threshold for doing the opposite conversion. I also agree with [~jingzhao] that scaling could become a problem with the proposed scheme, since it only has a single level of partitioning. I guess the counter-argument here is that there won't be that many giant directories and this works for your needs. [~shv] wrote: "I am probably late to the party, but for whatever it worth. Did you consider using balanced trees for inode lists, something like B-trees?" B-trees would be an excellent solution here. Since they generally use arrays in the leaf nodes, this also gets you the benefits of tighter packing in memory. I guess the tricky part is writing the code. 
> Support for more efficient large directories > > > Key: HDFS-7174 > URL: https://issues.apache.org/jira/browse/HDFS-7174 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-7174.new.patch, HDFS-7174.patch, HDFS-7174.patch > > > When the number of children under a directory grows very large, insertion > becomes very costly. E.g. creating 1M entries takes 10s of minutes. This is > because the complexity of an insertion is O\(n\). As the size of a list > grows, the overhead grows n^2. (integral of linear function). It also causes > allocations and copies of big arrays. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
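The suggestion above, a lower threshold for converting back than for converting forward, is a standard hysteresis scheme. Here is a minimal sketch of the idea, assuming a counter in place of the real child list. The class {{HysteresisSketch}}, its fields, and the threshold values are hypothetical and are not part of the HDFS-7174 patch.

```java
// Hypothetical illustration of two-threshold switching; not the HDFS-7174 patch.
final class HysteresisSketch {
    enum Mode { SIMPLE, HASHED }

    private final int upper; // convert SIMPLE -> HASHED when size > upper
    private final int lower; // convert HASHED -> SIMPLE when size <= lower
    private Mode mode = Mode.SIMPLE;
    private int size;
    private int conversions; // each conversion stands for an expensive full copy

    HysteresisSketch(int upper, int lower) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower must not exceed upper");
        }
        this.upper = upper;
        this.lower = lower;
    }

    void add() {
        size++;
        if (mode == Mode.SIMPLE && size > upper) {
            mode = Mode.HASHED;
            conversions++;
        }
    }

    void remove() {
        if (size == 0) {
            throw new IllegalStateException("empty");
        }
        size--;
        if (mode == Mode.HASHED && size <= lower) {
            mode = Mode.SIMPLE;
            conversions++;
        }
    }

    /** Fills to the upper threshold, then adds and removes one element per round. */
    static int conversionsWhenOscillating(int upper, int lower, int rounds) {
        HysteresisSketch list = new HysteresisSketch(upper, lower);
        for (int i = 0; i < upper; i++) {
            list.add();
        }
        for (int r = 0; r < rounds; r++) {
            list.add();
            list.remove();
        }
        return list.conversions;
    }
}
```

With a single threshold ({{lower == upper}}) every add/remove pair at the boundary costs two conversions. With {{lower < upper}} the structure converts once and then stays put.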
[jira] [Commented] (HDFS-6102) Lower the default maximum items per directory to fix PB fsimage loading
[ https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160845#comment-14160845 ] Andrew Wang commented on HDFS-6102: --- Hey Ravi, we could probably up this to ~6.7mil, but it seems like you'd probably run into this limit soon enough too. Do you mind filing a new JIRA to chunk up large directories? That's the only future-proof fix. > Lower the default maximum items per directory to fix PB fsimage loading > --- > > Key: HDFS-6102 > URL: https://issues.apache.org/jira/browse/HDFS-6102 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Blocker > Fix For: 2.4.0 > > Attachments: hdfs-6102-1.patch, hdfs-6102-2.patch > > > Found by [~schu] during testing. We were creating a bunch of directories in a > single directory to blow up the fsimage size, and it ends up we hit this > error when trying to load a very large fsimage: > {noformat} > 2014-03-13 13:57:03,901 INFO > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 24523605 > INodes. > 2014-03-13 13:57:59,038 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: > Failed to load image from > FSImageFile(file=/dfs/nn/current/fsimage_00024532742, > cpktTxId=00024532742) > com.google.protobuf.InvalidProtocolBufferException: Protocol message was too > large. May be malicious. Use CodedInputStream.setSizeLimit() to increase > the size limit. 
> at > com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110) > at > com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755) > at > com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769) > at > com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462) > at > com.google.protobuf.CodedInputStream.readUInt64(CodedInputStream.java:188) > at > org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.(FsImageProto.java:9839) > at > org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.(FsImageProto.java:9770) > at > org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9901) > at > org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9896) > at 52) > ... > {noformat} > Some further research reveals there's a 64MB max size per PB message, which > seems to be what we're hitting here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
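The comment does not say where the "~6.7mil" figure comes from. One plausible reading, offered here only as an assumption, is the 64MB message limit divided by the worst-case size of one child reference: the stack trace shows children being read with {{readUInt64}}, and a protobuf varint64 occupies at most 10 bytes. A small sketch of that arithmetic, with hypothetical names:

```java
// Back-of-the-envelope arithmetic; the derivation is an assumption, not stated in the JIRA.
final class DirEntryLimitSketch {
    /** 64MB per-message limit mentioned in the issue description. */
    static final long PB_MESSAGE_LIMIT_BYTES = 64L * 1024 * 1024;

    /** A protobuf varint-encoded 64-bit value takes at most 10 bytes. */
    static final int MAX_VARINT64_BYTES = 10;

    /** Number of children that fit if every child costs bytesPerChild bytes. */
    static long childrenThatFit(long limitBytes, int bytesPerChild) {
        return limitBytes / bytesPerChild;
    }

    public static void main(String[] args) {
        System.out.println(childrenThatFit(PB_MESSAGE_LIMIT_BYTES, MAX_VARINT64_BYTES));
    }
}
```

Real inode ids are usually encoded in fewer than 10 bytes, so the worst-case figure is conservative, and per-message overhead is ignored here.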
[jira] [Comment Edited] (HDFS-7169) Fix a findbugs warning in ReplaceDatanodeOnFailure
[ https://issues.apache.org/jira/browse/HDFS-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160784#comment-14160784 ] Tsz Wo Nicholas Sze edited comment on HDFS-7169 at 10/6/14 7:28 PM: > -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) > warnings. The new findbugs warning was introduced by HDFS-7055. was (Author: szetszwo): > -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) > warnings. The new findbugs warning were introduced by HDFS-7055. > Fix a findbugs warning in ReplaceDatanodeOnFailure > -- > > Key: HDFS-7169 > URL: https://issues.apache.org/jira/browse/HDFS-7169 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Attachments: h7169_20140930.patch > > > The following findbugs warning came up recently although there was no recent > change of the code. > - ReplaceDatanodeOnFailure$Policy defines non-transient non-serializable > instance field condition > Bug type SE_BAD_FIELD (click for details) > In class > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy > Field > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy.condition > In ReplaceDatanodeOnFailure.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160786#comment-14160786 ] Tsz Wo Nicholas Sze commented on HDFS-7194: --- We already have HDFS-7169 for ReplaceDatanodeOnFailure. The other findbugs warning was introduced by HDFS-7055. It also introduced some javac warnings. > Two findbugs issues in recent PreCommit-HDFS-Build builds > - > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7194.001.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs > warnings introduced by earlier fixes. > E.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html > {quote} > Bad practice Warnings > Code Warning > SeClass > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy > defines non-transient non-serializable instance field condition > Performance Warnings > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7169) Fix a findbugs warning in ReplaceDatanodeOnFailure
[ https://issues.apache.org/jira/browse/HDFS-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160784#comment-14160784 ] Tsz Wo Nicholas Sze commented on HDFS-7169: --- > -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) > warnings. The new findbugs warning were introduced by HDFS-7055. > Fix a findbugs warning in ReplaceDatanodeOnFailure > -- > > Key: HDFS-7169 > URL: https://issues.apache.org/jira/browse/HDFS-7169 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Attachments: h7169_20140930.patch > > > The following findbugs warning came up recently although there was no recent > change of the code. > - ReplaceDatanodeOnFailure$Policy defines non-transient non-serializable > instance field condition > Bug type SE_BAD_FIELD (click for details) > In class > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy > Field > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy.condition > In ReplaceDatanodeOnFailure.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7055) Add tracing to DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160758#comment-14160758 ] Tsz Wo Nicholas Sze commented on HDFS-7055: --- > The findbugs warnings are not related (they're for code issues that already > exist). One of the findbugs warnings was from the patch as shown in the [Jenkins report|https://builds.apache.org/job/PreCommit-HDFS-Build/8284/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html]. - Dmorg.apache.hadoop.hdfs.DFSClient.() invokes inefficient new String(String) constructor Bug type DM_STRING_CTOR (click for details) In class org.apache.hadoop.hdfs.DFSClient In method org.apache.hadoop.hdfs.DFSClient.() At DFSClient.java:\[line 3174] {code} private static final byte[] PATH = new String("path").getBytes(Charset.forName("UTF-8")); {code} > Meanwhile diffJavacWarnings.txt is missing, so I can't evaluate where there > is an additional warning or not. Have you checked the later builds? The [diffJavacWarnings.txt|https://builds.apache.org/job/PreCommit-HDFS-Build/8284/artifact/patchprocess/diffJavacWarnings.txt] file was available. The javac warnings were indeed from the patch. > Add tracing to DFSInputStream > - > > Key: HDFS-7055 > URL: https://issues.apache.org/jira/browse/HDFS-7055 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.7.0 > > Attachments: HDFS-7055.002.patch, HDFS-7055.003.patch, > HDFS-7055.004.patch, HDFS-7055.005.patch, screenshot-get-1mb.005.png, > screenshot-get-1mb.png > > > Add tracing to DFSInputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
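For readers unfamiliar with DM_STRING_CTOR: the warning is about wrapping a string literal in {{new String(...)}}, which allocates a redundant copy. The sketch below shows the flagged pattern next to the direct form and checks that both yield the same bytes. It is an illustration with hypothetical names and is not claimed to be the change that was committed.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical illustration of the DM_STRING_CTOR pattern; not the committed fix.
final class StringCtorSketch {
    // Pattern reported by findbugs: an extra String object is created from the literal.
    static final byte[] FLAGGED =
        new String("path").getBytes(Charset.forName("UTF-8"));

    // Direct form: call getBytes on the literal itself.
    static final byte[] DIRECT =
        "path".getBytes(Charset.forName("UTF-8"));

    /** Both forms are expected to encode to identical bytes. */
    static boolean sameBytes() {
        return Arrays.equals(FLAGGED, DIRECT);
    }
}
```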
[jira] [Updated] (HDFS-7184) Allow data migration tool to run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-7184: --- Issue Type: Sub-task (was: Improvement) Parent: HDFS-7197 > Allow data migration tool to run as a daemon > > > Key: HDFS-7184 > URL: https://issues.apache.org/jira/browse/HDFS-7184 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover, scripts >Reporter: Benoy Antony >Assignee: Benoy Antony >Priority: Minor > Attachments: HDFS-7184.patch > > > Just like balancer, it is sometimes required to run data migration tool in a > daemon mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7197) Enhancements to Data Migration Tool
Benoy Antony created HDFS-7197: -- Summary: Enhancements to Data Migration Tool Key: HDFS-7197 URL: https://issues.apache.org/jira/browse/HDFS-7197 Project: Hadoop HDFS Issue Type: Improvement Reporter: Benoy Antony Assignee: Benoy Antony Data migration tool (mover) was added as part of HDFS-6584. We have been using Archival storage tier in our clusters. We have implemented a similar data migration tool (Mover) to migrate data to and from Archival storage. This is an umbrella jira to contribute the features and improvements identified based on our experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6102) Lower the default maximum items per directory to fix PB fsimage loading
[ https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160748#comment-14160748 ] Ravi Prakash commented on HDFS-6102: This is preventing log aggregation for jobs of users who run very many jobs. e.g. in the NodeManager logs: {code}The directory item limit of //logs is exceeded: limit=1048576 items=2144288{code} > Lower the default maximum items per directory to fix PB fsimage loading > --- > > Key: HDFS-6102 > URL: https://issues.apache.org/jira/browse/HDFS-6102 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Blocker > Fix For: 2.4.0 > > Attachments: hdfs-6102-1.patch, hdfs-6102-2.patch > > > Found by [~schu] during testing. We were creating a bunch of directories in a > single directory to blow up the fsimage size, and it ends up we hit this > error when trying to load a very large fsimage: > {noformat} > 2014-03-13 13:57:03,901 INFO > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 24523605 > INodes. > 2014-03-13 13:57:59,038 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: > Failed to load image from > FSImageFile(file=/dfs/nn/current/fsimage_00024532742, > cpktTxId=00024532742) > com.google.protobuf.InvalidProtocolBufferException: Protocol message was too > large. May be malicious. Use CodedInputStream.setSizeLimit() to increase > the size limit. 
> at > com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110) > at > com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755) > at > com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769) > at > com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462) > at > com.google.protobuf.CodedInputStream.readUInt64(CodedInputStream.java:188) > at > org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.(FsImageProto.java:9839) > at > org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.(FsImageProto.java:9770) > at > org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9901) > at > org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9896) > at 52) > ... > {noformat} > Some further research reveals there's a 64MB max size per PB message, which > seems to be what we're hitting here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3342) SocketTimeoutException in BlockSender.sendChunks could have a better error message
[ https://issues.apache.org/jira/browse/HDFS-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160699#comment-14160699 ] Yongjun Zhang commented on HDFS-3342: - The findbugs issues are not introduced by the change made here. I created a separate jira HDFS-7194 for them. > SocketTimeoutException in BlockSender.sendChunks could have a better error > message > -- > > Key: HDFS-3342 > URL: https://issues.apache.org/jira/browse/HDFS-3342 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Yongjun Zhang >Priority: Minor > Attachments: HDFS-3342.001.patch > > > Currently, if a client connects to a DN and begins to read a block, but then > stops calling read() for a long period of time, the DN will log a > SocketTimeoutException "48 millis timeout while waiting for channel to be > ready for write." This is because there is no "keepalive" functionality of > any kind. At a minimum, we should improve this error message to be an INFO > level log which just says that the client likely stopped reading, so > disconnecting it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7194: Status: Patch Available (was: Open) Submitted patch 001. Hi [~szetszwo], the change I made for the first issue reported is on top of HDFS-4257 code. I see that currently we don't do serialization of ReplaceDatanodeOnFailure.Policy, so it should be safe to change the "condition" field to transient to remove the findbugs warning. Would you please comment in case I missed anything? Thanks a lot. > Two findbugs issues in recent PreCommit-HDFS-Build builds > - > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7194.001.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs > warnings introduced by earlier fixes. > E.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html > {quote} > Bad practice Warnings > Code Warning > SeClass > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy > defines non-transient non-serializable instance field condition > Performance Warnings > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
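To make the trade-off in the comment above concrete: marking a field {{transient}} silences SE_BAD_FIELD because the field is skipped during serialization, which also means it comes back as null after a round trip. That is why the change is only safe if the object is never serialized. The sketch below uses plain hypothetical classes ({{FlaggedPolicy}}, {{TransientPolicy}}, {{Condition}}) and is not the Hadoop code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes for illustration; not ReplaceDatanodeOnFailure itself.
final class TransientFieldSketch {
    /** Stand-in for a helper type that is not Serializable. */
    interface Condition {
        boolean satisfy(int replication, int existing);
    }

    /** Shape that triggers SE_BAD_FIELD: Serializable owner, non-serializable field. */
    static final class FlaggedPolicy implements Serializable {
        private static final long serialVersionUID = 1L;
        final Condition condition;
        FlaggedPolicy(Condition condition) { this.condition = condition; }
    }

    /** Same shape with the field marked transient. */
    static final class TransientPolicy implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final transient Condition condition;
        TransientPolicy(String name, Condition condition) {
            this.name = name;
            this.condition = condition;
        }
    }

    private static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    /** Returns true if serializing the flagged shape fails at runtime. */
    static boolean flaggedShapeFailsToSerialize() {
        try {
            serialize(new FlaggedPolicy((r, e) -> e < r));
            return false;
        } catch (NotSerializableException expected) {
            return true;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes and deserializes; the transient field is not restored. */
    static TransientPolicy roundTrip(TransientPolicy p) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(serialize(p)))) {
            return (TransientPolicy) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```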
[jira] [Updated] (HDFS-7194) Two findbugs issues in recent PreCommit-HDFS-Build builds
[ https://issues.apache.org/jira/browse/HDFS-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7194: Attachment: HDFS-7194.001.patch > Two findbugs issues in recent PreCommit-HDFS-Build builds > - > > Key: HDFS-7194 > URL: https://issues.apache.org/jira/browse/HDFS-7194 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7194.001.patch > > > In recent PreCommit-HDFS-Build, 8325, 8324, 8323 etc, there are findbugs > warnings introduced by earlier fixes. > E.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/8324//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html > {quote} > Bad practice Warnings > Code Warning > SeClass > org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy > defines non-transient non-serializable instance field condition > Performance Warnings > Code Warning > Dmorg.apache.hadoop.hdfs.DFSClient.() > invokes inefficient new String(String) constructor > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server
[ https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160678#comment-14160678 ] Yongjun Zhang commented on HDFS-7146: - Hi [~aw], Thanks for the info you provided. Here is what the comment says (man getent): {code} group When no key is provided, use setgrent(3), getgrent(3), and endgrent(3) to enumerate the group database. When one or more key arguments are provided, pass each numeric key to getgrgid(3) and each nonnumeric key to getgrnam(3) and display the result. passwd When no key is provided, use setpwent(3), getpwent(3), and endpwent(3) to enumerate the passwd database. When one or more key arguments are provided, pass each numeric key to getpwuid(3) and each nonnumeric key to getpwnam(3) and display the result. {code} If user name 123 has uid 456 and we do "getent passwd 123", getent will treat 123 as a uid and search for the user with uid 123, which may not exist; that is when we get back nothing. About the "id" command, I tested it on centos and mac (thanks for [~j...@cloudera.com]'s help). Would you please comment on whether it's good enough and what could be missed? The nfs code is said to support linux and mac only. Thanks. > NFS ID/Group lookup requires SSSD enumeration on the server > --- > > Key: HDFS-7146 > URL: https://issues.apache.org/jira/browse/HDFS-7146 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, > HDFS-7146.003.patch > > > The current implementation of the NFS UID and GID lookup works by running > 'getent passwd' with an assumption that it will return the entire list of > users available on the OS, local and remote (AD/etc.). 
> This behaviour of the command is advised to be and is prevented by > administrators in most secure setups to avoid excessive load to the ADs > involved, as the # of users to be listed may be too large, and the repeated > requests of ALL users not present in the cache would be too much for the AD > infrastructure to bear. > The NFS server should likely do lookups based on a specific UID request, via > 'getent passwd ', if the UID does not match a cached value. This reduces > load on the LDAP backed infrastructure. > Thanks [~qwertymaniac] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
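The numeric-name pitfall described in the comment above follows directly from the dispatch rule in the man page excerpt: a numeric key goes to the by-id lookup, anything else to the by-name lookup. The sketch below models that rule with two in-memory maps. It is a simulation under stated assumptions, with hypothetical names, and does not call getent or any NFS gateway code.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory simulation of the dispatch rule; hypothetical, does not invoke getent.
final class GetentKeySketch {
    private final Map<Long, String> nameByUid = new HashMap<>();
    private final Map<String, Long> uidByName = new HashMap<>();

    void addUser(String name, long uid) {
        nameByUid.put(uid, name);
        uidByName.put(name, uid);
    }

    static boolean isNumeric(String key) {
        if (key.isEmpty()) {
            return false;
        }
        for (int i = 0; i < key.length(); i++) {
            if (!Character.isDigit(key.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Returns "name:uid" or null, routing numeric keys to the by-uid table. */
    String lookup(String key) {
        if (isNumeric(key)) {
            long uid;
            try {
                uid = Long.parseLong(key);
            } catch (NumberFormatException e) {
                return null; // digit string too large to be an id in this model
            }
            String name = nameByUid.get(uid);
            return name == null ? null : name + ":" + uid;
        }
        Long uid = uidByName.get(key);
        return uid == null ? null : key + ":" + uid;
    }
}
```

A user literally named "123" with uid 456 is therefore unreachable by name in this model: the key "123" is looked up as a uid and misses, while the key "456" finds the user.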
[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server
[ https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160659#comment-14160659 ] Yongjun Zhang commented on HDFS-7146: - Hi [~brandonli], thanks for your comments. I just uploaded rev 03. It works slightly differently from what you described. 1. At initialization, the map is empty. 2. Users, groups and ids are added to the map on demand (e.g. when requested). 3. When a groupId is requested for a given groupName and the groupName is numerical, the full group map is loaded (this is the lazy full list load I referred to earlier). 4. Periodically update the cached maps for both user and group. What I do here is actually to clear the map. I imagined that some users and groups might be removed (for example, a user changed job), so instead of loading anything, I clear the map during this update, essentially reinitializing it. Steps 2 and 3 will then be repeated. I did not change the logic for when to update the map. Would you please take a look again to see if the change makes sense to you? Thanks a lot. > NFS ID/Group lookup requires SSSD enumeration on the server > --- > > Key: HDFS-7146 > URL: https://issues.apache.org/jira/browse/HDFS-7146 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, > HDFS-7146.003.patch > > > The current implementation of the NFS UID and GID lookup works by running > 'getent passwd' with an assumption that it will return the entire list of > users available on the OS, local and remote (AD/etc.). > This behaviour of the command is advised to be and is prevented by > administrators in most secure setups to avoid excessive load to the ADs > involved, as the # of users to be listed may be too large, and the repeated > requests of ALL users not present in the cache would be too much for the AD > infrastructure to bear. 
> The NFS server should likely do lookups based on a specific UID request, via > 'getent passwd ', if the UID does not match a cached value. This reduces > load on the LDAP backed infrastructure. > Thanks [~qwertymaniac] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
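The caching behaviour described in the rev 03 comment above (start empty, fill on demand, clear periodically instead of reloading) can be sketched as follows. This is a simplified model under assumptions: {{OnDemandIdCache}}, the {{resolver}} function, and the injected clock are hypothetical, and the special case of a numeric group name triggering a full load (step 3) is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical model of an on-demand cache with periodic clearing; not the HDFS-7146 patch.
final class OnDemandIdCache {
    private final Map<String, Integer> idByName = new HashMap<>();
    private final Function<String, Integer> resolver; // single-name lookup, null if unknown
    private final long refreshIntervalMs;
    private final LongSupplier clock;
    private long lastClearMs;
    private int resolverCalls;

    OnDemandIdCache(Function<String, Integer> resolver,
                    long refreshIntervalMs, LongSupplier clock) {
        this.resolver = resolver;
        this.refreshIntervalMs = refreshIntervalMs;
        this.clock = clock;
        this.lastClearMs = clock.getAsLong();
    }

    /** Returns the id for name, consulting the resolver only on a cache miss. */
    synchronized Integer getId(String name) {
        long now = clock.getAsLong();
        if (now - lastClearMs >= refreshIntervalMs) {
            // Periodic update: drop everything so removed users do not linger.
            idByName.clear();
            lastClearMs = now;
        }
        Integer cached = idByName.get(name);
        if (cached != null) {
            return cached;
        }
        resolverCalls++;
        Integer id = resolver.apply(name);
        if (id != null) {
            idByName.put(name, id);
        }
        return id;
    }

    synchronized int resolverCalls() {
        return resolverCalls;
    }
}
```

Clearing rather than reloading keeps the periodic update cheap and avoids enumerating the whole directory service. The cost is one extra resolver call per name after each clear.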
[jira] [Updated] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server
[ https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7146: Attachment: HDFS-7146.003.patch > NFS ID/Group lookup requires SSSD enumeration on the server > --- > > Key: HDFS-7146 > URL: https://issues.apache.org/jira/browse/HDFS-7146 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch, > HDFS-7146.003.patch > > > The current implementation of the NFS UID and GID lookup works by running > 'getent passwd' with an assumption that it will return the entire list of > users available on the OS, local and remote (AD/etc.). > This behaviour of the command is advised to be and is prevented by > administrators in most secure setups to avoid excessive load to the ADs > involved, as the # of users to be listed may be too large, and the repeated > requests of ALL users not present in the cache would be too much for the AD > infrastructure to bear. > The NFS server should likely do lookups based on a specific UID request, via > 'getent passwd ', if the UID does not match a cached value. This reduces > load on the LDAP backed infrastructure. > Thanks [~qwertymaniac] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7128) Decommission slows way down when it gets towards the end
[ https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160587#comment-14160587 ] Hadoop QA commented on HDFS-7128: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673098/HDFS-7128-2.patch against trunk revision ed841dd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8330//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8330//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8330//console This message is automatically generated. 
> Decommission slows way down when it gets towards the end > > > Key: HDFS-7128 > URL: https://issues.apache.org/jira/browse/HDFS-7128 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-7128-2.patch, HDFS-7128.patch > > > When we decommission nodes across different racks, the decommission process > becomes really slow at the end, hardly making any progress. The problem is > that some blocks are on 3 decomm-in-progress DNs and the way replications are > scheduled causes unnecessary delay. Here is the analysis. > When BlockManager schedules the replication work from neededReplication, it > first needs to pick the source node for replication via chooseSourceDatanode. > The core policies to pick the source node are: > 1. Prefer decomm-in-progress node. > 2. Only pick the nodes whose outstanding replication counts are below > thresholds dfs.namenode.replication.max-streams or > dfs.namenode.replication.max-streams-hard-limit, based on the replication > priority. > When we decommission nodes, > 1. All the decommission nodes' blocks will be added to neededReplication. > 2. BM will pick X number of blocks from neededReplication in each iteration. > X is based on cluster size and some configurable multiplier. So if the > cluster has 2000 nodes, X will be around 4000. > 3. Given these 4000 blocks are on the same decomm-in-progress node A, A ends up > being chosen as the source node for all 4000 blocks. The > outstanding replication thresholds don't kick in because of the implementation of > BlockManager.computeReplicationWorkForBlocks; > node.getNumberOfBlocksToBeReplicated() remains zero given > node.addBlockToBeReplicated is called after source node iteration. > {noformat} > ... > synchronized (neededReplications) { > for (int priority = 0; priority < blocksToReplicate.size(); > priority++) { > ... > chooseSourceDatanode > ... > } > for(ReplicationWork rw : work){ > ... 
> rw.srcNode.addBlockToBeReplicated(block, targets); > ... > } > {noformat} > > 4. So several decomm-in-progress nodes A, B, C end up with 4000 > node.getNumberOfBlocksToBeReplicated(). > 5. If we assume each node can replicate 5 blocks per minute, it is going to > take 800 minutes to finish replication of these blocks. > 6. The pending replication timeout kicks in after 5 minutes. The items will be > removed from the pending replication queue and added back to > neededReplication. The replications will then be handled by other source > nodes of these blocks. But the blocks still remain in nodes A, B, C's pending > replication queue, DatanodeDescriptor.replicateBlocks, so A, B, C continue > the replications of these blocks, although these blocks might have been > replicated by other DNs after replication timeout. > 7. Some block's replicas exist on A, B, C and it is at the end of A's pend
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160511#comment-14160511 ] Milind Bhandarkar commented on HDFS-3107: - Dhruba, Indeed. Lack of concurrent writes to a single HDFS file means that there will be only a single outstanding transaction against a file (unless the concurrency is implemented at a higher level.) A database can consist of multiple files, though, and one can have multiple outstanding transactions against the database (one per file.) In either case, rollback is achieved by truncating the file to position prior to beginning of transaction. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
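To make the rollback-by-truncate idea in the comment above concrete, here is a minimal sketch against a local file using java.nio. It is not an HDFS API example (HDFS truncate is exactly what this issue proposes to add); the class and method names are hypothetical, and a single writer per file is assumed, as in the comment.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical single-writer transaction: append, then commit or truncate back. */
final class TruncateRollback {
    /**
     * Appends data to the file. When commit is false, the file is truncated
     * to the length it had before the transaction began.
     * Returns the file length after the transaction ends.
     */
    static long appendInTransaction(Path file, byte[] data, boolean commit) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long start = ch.size();          // position prior to the transaction
            ch.position(start);
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (!commit) {
                ch.truncate(start);          // rollback: discard everything appended
            }
            return ch.size();
        }
    }
}
```

With one file per outstanding transaction, several transactions against the same database can be rolled back independently this way.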
[jira] [Commented] (HDFS-7182) JMX metrics aren't accessible when NN is busy
[ https://issues.apache.org/jira/browse/HDFS-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160479#comment-14160479 ] Ming Ma commented on HDFS-7182: --- Thanks, Akira. The Findbugs warnings and failed unit tests aren't related to the patch. > JMX metrics aren't accessible when NN is busy > - > > Key: HDFS-7182 > URL: https://issues.apache.org/jira/browse/HDFS-7182 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-7182.patch > > > HDFS-5693 has addressed all NN JMX metrics in hadoop 2.0.5. Since then a couple of > new metrics have been added. It turns out "RollingUpgradeStatus" requires > the FSNamesystem read lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7146) NFS ID/Group lookup requires SSSD enumeration on the server
[ https://issues.apache.org/jira/browse/HDFS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160410#comment-14160410 ] Allen Wittenauer commented on HDFS-7146: bq. For getent command, when it sees a numerical key, it thinks you are doing reverse lookup (from id to name). That's why it returns nothing. Something sounds broken if it is returning nothing. It should be able to map forward and reverse if things are working correctly. The problem with using id is that output isn't exactly portable. bq. Unfortunately there is no corresponding one for group. getent most definitely routes gid<->group mappings as well: Linux: {code} $ getent group 1 bin:x:1:bin,daemon $ getent group bin bin:x:1:bin,daemon {code} Solaris: {code} $ getent group 1 other::1:root $ getent group other other::1:root {code} > NFS ID/Group lookup requires SSSD enumeration on the server > --- > > Key: HDFS-7146 > URL: https://issues.apache.org/jira/browse/HDFS-7146 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-7146.001.patch, HDFS-7146.002.allIncremental.patch > > > The current implementation of the NFS UID and GID lookup works by running > 'getent passwd' with an assumption that it will return the entire list of > users available on the OS, local and remote (AD/etc.). > This behaviour of the command is advised to be and is prevented by > administrators in most secure setups to avoid excessive load to the ADs > involved, as the # of users to be listed may be too large, and the repeated > requests of ALL users not present in the cache would be too much for the AD > infrastructure to bear. > The NFS server should likely do lookups based on a specific UID request, via > 'getent passwd ', if the UID does not match a cached value. This reduces > load on the LDAP backed infrastructure. > Thanks [~qwertymaniac] for reporting the issue. 
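As a rough illustration of a per-key lookup, the 'getent group' output lines quoted in the comment above (name:password:gid:members) can be parsed as below. This is a sketch only; the class GetentGroupLine is hypothetical and is not taken from any of the HDFS-7146 patches. Running 'getent group' itself is left out so the example does not depend on the host.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical parser for one line of 'getent group <key>' output. */
final class GetentGroupLine {
    final String name;
    final int gid;
    final List<String> members;

    private GetentGroupLine(String name, int gid, List<String> members) {
        this.name = name;
        this.gid = gid;
        this.members = members;
    }

    /** Parses lines such as "bin:x:1:bin,daemon" or "other::1:root". */
    static GetentGroupLine parse(String line) {
        // The -1 limit keeps empty fields, e.g. the empty password on Solaris.
        String[] fields = line.trim().split(":", -1);
        if (fields.length < 3) {
            throw new IllegalArgumentException("Not a group entry: " + line);
        }
        int gid = Integer.parseInt(fields[2]);
        List<String> members = Collections.<String>emptyList();
        if (fields.length > 3 && !fields[3].isEmpty()) {
            members = Arrays.asList(fields[3].split(","));
        }
        return new GetentGroupLine(fields[0], gid, members);
    }
}
```

The same entry comes back whether the key is the name or the gid, so one parser can fill both the forward and the reverse map.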
[jira] [Commented] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available
[ https://issues.apache.org/jira/browse/HDFS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160381#comment-14160381 ] Hudson commented on HDFS-6995: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1918 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1918/]) HDFS-6995. Block should be placed in the client's 'rack-local' node if 'client-local' node is not available (vinayakumarb) (vinayakumarb: rev ed841dd9a96e54cb84d9cae5507e47ff1c8cdf6e) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java > Block should be placed in the client's 'rack-local' node if 'client-local' > node is not available > > > Key: HDFS-6995 > URL: https://issues.apache.org/jira/browse/HDFS-6995 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Fix For: 2.6.0 > > Attachments: HDFS-6995-001.patch, HDFS-6995-002.patch, > HDFS-6995-003.patch, HDFS-6995-004.patch, HDFS-6995-005.patch, > HDFS-6995-006.patch, HDFS-6995-007.patch > > > HDFS cluster is rack aware. > Client is in different node than of datanode, > but Same rack contains one or more datanodes. > In this case first preference should be given to select 'rack-local' node. 
> Currently, since no Node in clusterMap corresponds to the client's location, > the block placement policy chooses a *random* node as the local node and proceeds > with further placements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7193) value of "dfs.webhdfs.enabled" in user doc is incorrect.
[ https://issues.apache.org/jira/browse/HDFS-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160363#comment-14160363 ] Haohui Mai commented on HDFS-7193: -- [~hitliuyi], actually, since it is already available at hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm, can you please update the patch to delete the entry in hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm? > value of "dfs.webhdfs.enabled" in user doc is incorrect. > > > Key: HDFS-7193 > URL: https://issues.apache.org/jira/browse/HDFS-7193 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation, webhdfs >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Trivial > Attachments: HDFS-7193.001.patch, HDFS-7193.002.patch > > > The default value for {{dfs.webhdfs.enabled}} should be {{true}}, not > _http/_HOST@REALM.TLD_. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7128) Decommission slows way down when it gets towards the end
[ https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7128: -- Attachment: HDFS-7128-2.patch Here is the patch with a unit test. We tested it on a large cluster. We decommed 10 nodes per rack from two racks. Without the patch, it took 174 minutes to finish block replication. With the patch, it took 82 minutes to finish block replication. > Decommission slows way down when it gets towards the end > > > Key: HDFS-7128 > URL: https://issues.apache.org/jira/browse/HDFS-7128 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-7128-2.patch, HDFS-7128.patch > > > When we decommission nodes across different racks, the decommission process > becomes really slow at the end, hardly making any progress. The problem is > that some blocks are on 3 decomm-in-progress DNs and the way replications are > scheduled causes unnecessary delay. Here is the analysis. > When BlockManager schedules the replication work from neededReplication, it > first needs to pick the source node for replication via chooseSourceDatanode. > The core policies to pick the source node are: > 1. Prefer decomm-in-progress node. > 2. Only pick the nodes whose outstanding replication counts are below > thresholds dfs.namenode.replication.max-streams or > dfs.namenode.replication.max-streams-hard-limit, based on the replication > priority. > When we decommission nodes, > 1. All the decommission nodes' blocks will be added to neededReplication. > 2. BM will pick X number of blocks from neededReplication in each iteration. > X is based on cluster size and some configurable multiplier. So if the > cluster has 2000 nodes, X will be around 4000. > 3. Given these 4000 blocks are on the same decomm-in-progress node A, A ends up > being chosen as the source node for all 4000 blocks. 
The > outstanding replication thresholds don't kick in because of the implementation of > BlockManager.computeReplicationWorkForBlocks; > node.getNumberOfBlocksToBeReplicated() remains zero given > node.addBlockToBeReplicated is called after source node iteration. > {noformat} > ... > synchronized (neededReplications) { > for (int priority = 0; priority < blocksToReplicate.size(); > priority++) { > ... > chooseSourceDatanode > ... > } > for(ReplicationWork rw : work){ > ... > rw.srcNode.addBlockToBeReplicated(block, targets); > ... > } > {noformat} > > 4. So several decomm-in-progress nodes A, B, C end up with 4000 > node.getNumberOfBlocksToBeReplicated(). > 5. If we assume each node can replicate 5 blocks per minute, it is going to > take 800 minutes to finish replication of these blocks. > 6. The pending replication timeout kicks in after 5 minutes. The items will be > removed from the pending replication queue and added back to > neededReplication. The replications will then be handled by other source > nodes of these blocks. But the blocks still remain in nodes A, B, C's pending > replication queue, DatanodeDescriptor.replicateBlocks, so A, B, C continue > the replications of these blocks, although these blocks might have been > replicated by other DNs after replication timeout. > 7. Some block's replicas exist on A, B, C and it is at the end of A's pending > replication queue. Even though the block's replication times out, no source > node can be chosen given A, B, C all have high pending replication counts. So > we have to wait until A drains its pending replication queue. Meanwhile, the > items in A's pending replication queue have been taken care of by other nodes > and are no longer under-replicated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
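The scheduling effect analysed above can be reproduced with a small toy model. It is a deliberately simplified sketch, not BlockManager code: every block is assumed to have a replica on every node, nodes are tried in preference order, and the names SourceSelectionSketch, schedule and minutesToDrain are hypothetical. It contrasts counting a block against its source node only after the whole selection loop (as in the description) with counting it immediately.

```java
/** Toy model of source-node selection under a per-node outstanding-replication limit. */
final class SourceSelectionSketch {
    /**
     * Picks a source for each block: the first node whose pending count is
     * below maxStreams. Returns the pending count per node afterwards.
     * When countImmediately is false, counts are only applied after the
     * selection loop, so the limit never takes effect during selection.
     */
    static int[] schedule(int nodeCount, int blocks, int maxStreams, boolean countImmediately) {
        int[] pending = new int[nodeCount];
        int[] chosen = new int[blocks];
        for (int b = 0; b < blocks; b++) {
            int src = -1; // -1: no node under the limit, block stays unscheduled
            for (int n = 0; n < nodeCount; n++) {
                if (pending[n] < maxStreams) {
                    src = n;
                    break;
                }
            }
            chosen[b] = src;
            if (countImmediately && src >= 0) {
                pending[src]++;
            }
        }
        if (!countImmediately) {
            for (int src : chosen) {
                if (src >= 0) {
                    pending[src]++;
                }
            }
        }
        return pending;
    }

    /** Minutes needed to drain a queue at a fixed per-minute rate, rounded up. */
    static int minutesToDrain(int queuedBlocks, int blocksPerMinute) {
        return (queuedBlocks + blocksPerMinute - 1) / blocksPerMinute;
    }
}
```

In this model, deferred counting piles all 4000 blocks onto the first node, which at the assumed 5 blocks per minute takes 4000 / 5 = 800 minutes to drain; immediate counting stops at the limit on each node and leaves the remaining blocks for a later iteration.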
[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes
[ https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160353#comment-14160353 ] Ming Ma commented on HDFS-7009: --- Findbugs and failed unit tests aren't related. > Active NN and standby NN have different live nodes > -- > > Key: HDFS-7009 > URL: https://issues.apache.org/jira/browse/HDFS-7009 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-7009-2.patch, HDFS-7009.patch > > > To follow up on https://issues.apache.org/jira/browse/HDFS-6478, in most > cases, given DN sends HB and BR to NN regularly, if a specific RPC call > fails, it isn't a big deal. > However, there are cases where DN fails to register with NN during initial > handshake due to exceptions not covered by RPC client's connection retry. > When this happens, the DN won't talk to that NN until the DN restarts. > {noformat} > BPServiceActor > public void run() { > LOG.info(this + " starting to offer service"); > try { > // init stuff > try { > // setup storage > connectToNNAndHandshake(); > } catch (IOException ioe) { > // Initial handshake, storage recovery or registration failed > // End BPOfferService thread > LOG.fatal("Initialization failed for block pool " + this, ioe); > return; > } > initialized = true; // bp is initialized; > > while (shouldRun()) { > try { > offerService(); > } catch (Exception ex) { > LOG.error("Exception in BPOfferService for " + this, ex); > sleepAndLogInterrupts(5000, "offering service"); > } > } > ... > {noformat} > Here is an example of the call stack. 
> {noformat} > java.io.IOException: Failed on local exception: java.io.IOException: Response > is null.; Host Details : local host is: "xxx"; destination host is: > "yyy":8030; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761) > at org.apache.hadoop.ipc.Client.call(Client.java:1239) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) > at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) > at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Response is null. > at > org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844) > {noformat} > This will create discrepancy between active NN and standby NN in terms of > live nodes. > > Here is a possible scenario of missing blocks after failover. > 1. DN A, B set up handshakes with active NN, but not with standby NN. > 2. A block is replicated to DN A, B and C. > 3. 
From standby NN's point of view, given A and B are dead nodes, the block > is under replicated. > 4. DN C is down. > 5. Before active NN detects DN C is down, it fails over. > 6. The new active NN considers the block is missing. Even though there are > two replicas on DN A and B. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available
[ https://issues.apache.org/jira/browse/HDFS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160327#comment-14160327 ] Hudson commented on HDFS-6995: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1893 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1893/]) HDFS-6995. Block should be placed in the client's 'rack-local' node if 'client-local' node is not available (vinayakumarb) (vinayakumarb: rev ed841dd9a96e54cb84d9cae5507e47ff1c8cdf6e) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java > Block should be placed in the client's 'rack-local' node if 'client-local' > node is not available > > > Key: HDFS-6995 > URL: https://issues.apache.org/jira/browse/HDFS-6995 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Fix For: 2.6.0 > > Attachments: HDFS-6995-001.patch, HDFS-6995-002.patch, > HDFS-6995-003.patch, HDFS-6995-004.patch, HDFS-6995-005.patch, > HDFS-6995-006.patch, HDFS-6995-007.patch > > > HDFS cluster is rack aware. > Client is in different node than of datanode, > but Same rack contains one or more datanodes. > In this case first preference should be given to select 'rack-local' node. 
> Currently, since no Node in clusterMap corresponds to the client's location, > the block placement policy chooses a *random* node as the local node and proceeds > with further placements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available
[ https://issues.apache.org/jira/browse/HDFS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160214#comment-14160214 ] Hudson commented on HDFS-6995: -- FAILURE: Integrated in Hadoop-Yarn-trunk #703 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/703/]) HDFS-6995. Block should be placed in the client's 'rack-local' node if 'client-local' node is not available (vinayakumarb) (vinayakumarb: rev ed841dd9a96e54cb84d9cae5507e47ff1c8cdf6e) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java > Block should be placed in the client's 'rack-local' node if 'client-local' > node is not available > > > Key: HDFS-6995 > URL: https://issues.apache.org/jira/browse/HDFS-6995 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Fix For: 2.6.0 > > Attachments: HDFS-6995-001.patch, HDFS-6995-002.patch, > HDFS-6995-003.patch, HDFS-6995-004.patch, HDFS-6995-005.patch, > HDFS-6995-006.patch, HDFS-6995-007.patch > > > HDFS cluster is rack aware. > Client is in different node than of datanode, > but Same rack contains one or more datanodes. > In this case first preference should be given to select 'rack-local' node. 
> Currently, since no Node in clusterMap corresponds to the client's location, > the block placement policy chooses a *random* node as the local node and proceeds > with further placements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (HDFS-7196) Fix several issues of hadoop security configuration in user doc.
[ https://issues.apache.org/jira/browse/HDFS-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu moved HADOOP-11164 to HDFS-7196: --- Component/s: (was: security) (was: documentation) security documentation Target Version/s: 2.7.0 (was: 2.7.0) Key: HDFS-7196 (was: HADOOP-11164) Project: Hadoop HDFS (was: Hadoop Common) > Fix several issues of hadoop security configuration in user doc. > > > Key: HDFS-7196 > URL: https://issues.apache.org/jira/browse/HDFS-7196 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation, security >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Trivial > > There are several issues of secure mode in user doc: > {{dfs.namenode.secondary.keytab.file}} should be > {{dfs.secondary.namenode.keytab.file}}, > {{dfs.namenode.secondary.kerberos.principal}} should be > {{dfs.secondary.namenode.kerberos.principal}}. > {{dfs.namenode.kerberos.https.principal}} doesn't exist, it should be > {{dfs.namenode.kerberos.internal.spnego.principal}}. > {{dfs.namenode.secondary.kerberos.https.principal}} doesn't exist, it should > be {{dfs.secondary.namenode.kerberos.internal.spnego.principal}}. > {{dfs.datanode.kerberos.https.principal}} doesn't exist, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7195) Update user doc of secure mode about Datanodes don't require root or jsvc
Yi Liu created HDFS-7195: Summary: Update user doc of secure mode about Datanodes don't require root or jsvc Key: HDFS-7195 URL: https://issues.apache.org/jira/browse/HDFS-7195 Project: Hadoop HDFS Issue Type: Task Components: security Reporter: Yi Liu Assignee: Yi Liu HDFS-2856 adds support for running secure Datanodes without root or jsvc. If {{dfs.data.transfer.protection}} is configured and {{dfs.http.policy}} is _HTTPS_ONLY_, then a secure DataNode doesn't need to use a privileged port. The latest user doc of secure mode has not been updated to reflect this. This JIRA is to fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
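The condition stated above can be written as a small check. This is a sketch over a plain key/value map, not Hadoop's Configuration class or the DataNode startup code; the class and method names are hypothetical, and only the two property names and the HTTPS_ONLY value come from the issue text.

```java
import java.util.Map;

/** Hypothetical check for whether a secure DataNode still needs a privileged port. */
final class SecureDataNodePorts {
    static boolean needsPrivilegedPort(Map<String, String> conf) {
        // Any non-empty value counts as "configured" in this sketch.
        String protection = conf.get("dfs.data.transfer.protection");
        boolean protectionConfigured = protection != null && !protection.trim().isEmpty();
        boolean httpsOnly = "HTTPS_ONLY".equals(conf.get("dfs.http.policy"));
        // Both settings are required before the privileged port can be dropped.
        return !(protectionConfigured && httpsOnly);
    }
}
```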
[jira] [Commented] (HDFS-7186) Add usage of "hadoop trace" command to doc
[ https://issues.apache.org/jira/browse/HDFS-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160153#comment-14160153 ] Hadoop QA commented on HDFS-7186: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673070/HDFS-7186-0.patch against trunk revision ed841dd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8329//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8329//console This message is automatically generated. > Add usage of "hadoop trace" command to doc > -- > > Key: HDFS-7186 > URL: https://issues.apache.org/jira/browse/HDFS-7186 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-7186-0.patch > > > The command for tracing management was added in HDFS-6956. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available
[ https://issues.apache.org/jira/browse/HDFS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160126#comment-14160126 ] Hudson commented on HDFS-6995: -- FAILURE: Integrated in Hadoop-trunk-Commit #6196 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6196/]) HDFS-6995. Block should be placed in the client's 'rack-local' node if 'client-local' node is not available (vinayakumarb) (vinayakumarb: rev ed841dd9a96e54cb84d9cae5507e47ff1c8cdf6e) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Block should be placed in the client's 'rack-local' node if 'client-local' > node is not available > > > Key: HDFS-6995 > URL: https://issues.apache.org/jira/browse/HDFS-6995 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Fix For: 2.6.0 > > Attachments: HDFS-6995-001.patch, HDFS-6995-002.patch, > HDFS-6995-003.patch, HDFS-6995-004.patch, HDFS-6995-005.patch, > HDFS-6995-006.patch, HDFS-6995-007.patch > > > HDFS cluster is rack aware. > Client is in different node than of datanode, > but Same rack contains one or more datanodes. > In this case first preference should be given to select 'rack-local' node. 
> Currently, since no Node in clusterMap corresponds to the client's location, > the block placement policy chooses a *random* node as the local node and proceeds > with further placements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7186) Add usage of "hadoop trace" command to doc
[ https://issues.apache.org/jira/browse/HDFS-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-7186: --- Attachment: HDFS-7186-0.patch This patch includes the trivial fixes to follow the change in HDFS-7055. > Add usage of "hadoop trace" command to doc > -- > > Key: HDFS-7186 > URL: https://issues.apache.org/jira/browse/HDFS-7186 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: HDFS-7186-0.patch > > > The command for tracing management was added in HDFS-6956. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7186) Add usage of "hadoop trace" command to doc
[ https://issues.apache.org/jira/browse/HDFS-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated HDFS-7186:
-----------------------------------
    Status: Patch Available  (was: Open)

> Add usage of "hadoop trace" command to doc
> ------------------------------------------
>
> Key: HDFS-7186
> URL: https://issues.apache.org/jira/browse/HDFS-7186
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: documentation
> Reporter: Masatake Iwasaki
> Assignee: Masatake Iwasaki
> Priority: Minor
> Attachments: HDFS-7186-0.patch
>
>
> The command for tracing management was added in HDFS-6956.
[jira] [Updated] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available
[ https://issues.apache.org/jira/browse/HDFS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinayakumar B updated HDFS-6995:
--------------------------------
    Resolution: Fixed
    Fix Version/s: 2.6.0
    Hadoop Flags: Reviewed
    Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2.
Thanks [~umamaheswararao] and [~hitliuyi] for the reviews.

> Block should be placed in the client's 'rack-local' node if 'client-local' node is not available
> ------------------------------------------------------------------------------------------------
>
> Key: HDFS-6995
> URL: https://issues.apache.org/jira/browse/HDFS-6995
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.5.0
> Reporter: Vinayakumar B
> Assignee: Vinayakumar B
> Fix For: 2.6.0
>
> Attachments: HDFS-6995-001.patch, HDFS-6995-002.patch, HDFS-6995-003.patch, HDFS-6995-004.patch, HDFS-6995-005.patch, HDFS-6995-006.patch, HDFS-6995-007.patch
>
>
> The HDFS cluster is rack aware.
> The client is on a different node than any datanode, but the same rack contains one or more datanodes.
> In this case, first preference should be given to selecting a 'rack-local' node.
> Currently, since no node in clusterMap corresponds to the client's location, the block placement policy chooses a *random* node as the local node and proceeds with further placements.
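The placement preference requested in HDFS-6995 can be illustrated with a small sketch. This is not the code from the patch (the committed change is in Java, in the files listed in the Hudson comment); it is a minimal Python illustration of the preference order only. The function name, its parameters, and the dictionary used to model the rack topology are all assumptions made for this example.

```python
import random
from typing import Dict, List, Optional


def choose_first_replica_node(
    client_host: str,
    client_rack: str,
    datanodes_by_rack: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the node for the first replica (hypothetical helper).

    Preference order, as described in the issue:
      1. the client's own node, if it is a datanode ('client-local');
      2. any datanode on the client's rack ('rack-local');
      3. a random datanode anywhere in the cluster.
    """
    rng = rng or random.Random()
    all_nodes = [n for nodes in datanodes_by_rack.values() for n in nodes]
    if not all_nodes:
        raise ValueError("no datanodes available")

    # 1. client-local: the client runs on a datanode.
    if client_host in all_nodes:
        return client_host

    # 2. rack-local: the client is not a datanode, but its rack has some.
    rack_nodes = datanodes_by_rack.get(client_rack, [])
    if rack_nodes:
        return rng.choice(rack_nodes)

    # 3. fallback: no datanode shares the client's rack.
    return rng.choice(all_nodes)
```

With a topology such as `{"/rack1": ["dn1", "dn2"], "/rack2": ["dn3"]}`, a client on `/rack1` that is not itself a datanode would get `dn1` or `dn2` rather than an arbitrary node, which is the behaviour the issue asks for.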