[jira] [Commented] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another
[ https://issues.apache.org/jira/browse/HDFS-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542817#comment-14542817 ] Colin Patrick McCabe commented on HDFS-8380: committed, thanks Always call addStoredBlock on blocks which have been shifted from one storage to another Key: HDFS-8380 URL: https://issues.apache.org/jira/browse/HDFS-8380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.8.0 Attachments: HDFS-8380.001.patch We should always call addStoredBlock on blocks which have been shifted from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another
[ https://issues.apache.org/jira/browse/HDFS-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8380: --- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Always call addStoredBlock on blocks which have been shifted from one storage to another Key: HDFS-8380 URL: https://issues.apache.org/jira/browse/HDFS-8380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.8.0 Attachments: HDFS-8380.001.patch We should always call addStoredBlock on blocks which have been shifted from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540492#comment-14540492 ] Colin Patrick McCabe commented on HDFS-7240: This looks like a really interesting way to achieve a scalable blob store using some of the infrastructure we already have in HDFS. It could be a good direction for the project to go in. We should have a meeting to review the design and talk about how it fits in with the rest of what's going on in HDFS-land. Perhaps we could have a webex on the week of May 25th or June 1? (I am going to be out of town next week, so I can't do next week.) Object store in HDFS Key: HDFS-7240 URL: https://issues.apache.org/jira/browse/HDFS-7240 Project: Hadoop HDFS Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: Ozone-architecture-v1.pdf This jira proposes to add object store capabilities into HDFS. As part of the federation work (HDFS-1052) we separated block storage as a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer, i.e. datanodes. In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata. I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7900) ShortCircuitCache.replicaInfoMap keeps too many deleted file descriptor
[ https://issues.apache.org/jira/browse/HDFS-7900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe resolved HDFS-7900. Resolution: Duplicate ShortCircuitCache.replicaInfoMap keeps too many deleted file descriptor - Key: HDFS-7900 URL: https://issues.apache.org/jira/browse/HDFS-7900 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Environment: hadoop cdh5 2.3.0 hbase 0.98 Reporter: zhangshilong Priority: Critical I deleted some of HBase's files manually, or used rm -rf blk_ to delete the block file directly, but HBase keeps the file descriptors for a very long time. I found these file descriptors may be kept in the ShortCircuitCache replicaMap, but could not find when the file descriptors would be removed. replicaMap has no size limit for putting. Run: lsof -p pid | grep deleted. Part of the result:
lk_1102309377_28571078.meta (deleted)
java 8430 hbase 8537r REG 8,145 536870912 806553760 /search/hadoop08/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir61/blk_1102541663 (deleted)
java 8430 hbase 8540r REG 8,113 4194311 812434001 /search/hadoop06/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir62/subdir21/blk_1102524193_28785917.meta (deleted)
java 8430 hbase 8541r REG 8,65 536870912 813718517 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir31/subdir14/blk_1102523618 (deleted)
java 8430 hbase 8542r REG 8,65 4194311 813718518 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir31/subdir14/blk_1102523618_28785342.meta (deleted)
java 8430 hbase 8543r REG 8,193 536870912 1886733815 /search/hadoop12/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir20/subdir22/blk_1102533549 (deleted)
java 8430 hbase 8544r REG 8,65 4194311 814828988 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir49/blk_1102676585_28938309.meta (deleted)
java 8430 hbase 8545r REG 8,17 4194311 812962137 /search/hadoop10/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir53/blk_1102597493_28859217.meta (deleted)
java 8430 hbase 8546r REG 8,97 4194311 810468992 /search/hadoop05/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir4/subdir46/blk_1102524567_28786291.meta (deleted)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7900) ShortCircuitCache.replicaInfoMap keeps too many deleted file descriptor
[ https://issues.apache.org/jira/browse/HDFS-7900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540540#comment-14540540 ] Colin Patrick McCabe commented on HDFS-7900: Since Hadoop 2.3 was released, there have been some improvements. The short-circuit code now uses its shared memory segment IPC to tell the client when a file descriptor has been invalidated. One reason for invalidation is that the block file has been deleted. I am going to close this as a duplicate. Please reopen if you have any issues with the HDFS-6750 code. ShortCircuitCache.replicaInfoMap keeps too many deleted file descriptor - Key: HDFS-7900 URL: https://issues.apache.org/jira/browse/HDFS-7900 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Environment: hadoop cdh5 2.3.0 hbase 0.98 Reporter: zhangshilong Priority: Critical I deleted some of HBase's files manually, or used rm -rf blk_ to delete the block file directly, but HBase keeps the file descriptors for a very long time. I found these file descriptors may be kept in the ShortCircuitCache replicaMap, but could not find when the file descriptors would be removed. replicaMap has no size limit for putting. Run: lsof -p pid | grep deleted. Part of the result:
lk_1102309377_28571078.meta (deleted)
java 8430 hbase 8537r REG 8,145 536870912 806553760 /search/hadoop08/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir61/blk_1102541663 (deleted)
java 8430 hbase 8540r REG 8,113 4194311 812434001 /search/hadoop06/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir62/subdir21/blk_1102524193_28785917.meta (deleted)
java 8430 hbase 8541r REG 8,65 536870912 813718517 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir31/subdir14/blk_1102523618 (deleted)
java 8430 hbase 8542r REG 8,65 4194311 813718518 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir31/subdir14/blk_1102523618_28785342.meta (deleted)
java 8430 hbase 8543r REG 8,193 536870912 1886733815 /search/hadoop12/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir20/subdir22/blk_1102533549 (deleted)
java 8430 hbase 8544r REG 8,65 4194311 814828988 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir49/blk_1102676585_28938309.meta (deleted)
java 8430 hbase 8545r REG 8,17 4194311 812962137 /search/hadoop10/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir53/blk_1102597493_28859217.meta (deleted)
java 8430 hbase 8546r REG 8,97 4194311 810468992 /search/hadoop05/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir4/subdir46/blk_1102524567_28786291.meta (deleted)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8358) TestTraceAdmin fails
[ https://issues.apache.org/jira/browse/HDFS-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540743#comment-14540743 ] Colin Patrick McCabe commented on HDFS-8358: bq. I would like to make TraceAdmin able to handle other config prefixes such as yarn.htrace. I think SpanReceiverHost#addSpanReceiver is the right place to deal with the configs without a prefix. Anyway, we should file a new JIRA for that. Yeah. The user should be able to set -Clocal-file-span-receiver.path=/tmp/foo on the YARN daemons and have it apply the relevant YARN htrace properties. There isn't any need to require the user to send '-Cyarn.htrace.local-file-span-receiver.path=...'. We know that YARN tracing is what we want to configure, by virtue of the fact that the -host argument was for a YARN host. bq. The failure of TestHdfsConfigFields will be fixed in HDFS-8371. It should be addressed as a separate JIRA and I will update the patch once HDFS-8371 is committed. OK. +1 for v3. Can you file a follow-on JIRA to talk about the prefix issue? TestTraceAdmin fails Key: HDFS-8358 URL: https://issues.apache.org/jira/browse/HDFS-8358 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Masatake Iwasaki Attachments: HADOOP-11940.001.patch, HDFS-8358.002.patch, HDFS-8358.003.patch After HADOOP-11912, {{TestTraceAdmin#testCreateAndDestroySpanReceiver}} in hdfs started failing. It was probably unnoticed because the jira changed and triggered unit testing in common only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another
Colin Patrick McCabe created HDFS-8380: -- Summary: Always call addStoredBlock on blocks which have been shifted from one storage to another Key: HDFS-8380 URL: https://issues.apache.org/jira/browse/HDFS-8380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe We should always call addStoredBlock on blocks which have been shifted from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another
[ https://issues.apache.org/jira/browse/HDFS-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8380: --- Status: Patch Available (was: Open) Always call addStoredBlock on blocks which have been shifted from one storage to another Key: HDFS-8380 URL: https://issues.apache.org/jira/browse/HDFS-8380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8380.001.patch We should always call addStoredBlock on blocks which have been shifted from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another
[ https://issues.apache.org/jira/browse/HDFS-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8380: --- Attachment: HDFS-8380.001.patch Always call addStoredBlock on blocks which have been shifted from one storage to another Key: HDFS-8380 URL: https://issues.apache.org/jira/browse/HDFS-8380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8380.001.patch We should always call addStoredBlock on blocks which have been shifted from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another
[ https://issues.apache.org/jira/browse/HDFS-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541061#comment-14541061 ] Colin Patrick McCabe commented on HDFS-8380: Background: HDFS-6830 attempted to implement block shifting logic, whereby when the NameNode received a report about some replica saying it was in some DataNode storage, it would update the NN's internal data structures to reflect the fact that this replica was not in any other storages on that DataNode. The assumption was (and still is) that each replica is present in at most one storage on each DN (an assumption we might want to revisit at some point, but that's outside the scope of this JIRA...). HDFS-6830 was flawed, however. Although it changed {{BlockManager#addBlock}} to update the storage which a particular block was in, it would not actually call {{BlockManager#addBlock}} on blocks it received in the full block report, if it had already seen their IDs. So in the case where blocks were moved between storages, HDFS-6830 would not actually update the internal data structures on the NameNode... they would remain in the old storages. HDFS-6991, although it would appear to be unrelated based on the title, actually has a partial fix for the bug in HDFS-6830, in the form of this code:
{code}
-        (!storedBlock.findDatanode(dn)
-         || corruptReplicas.isReplicaCorrupt(storedBlock, dn))) {
+        (storedBlock.findStorageInfo(storageInfo) == -1 ||
+         corruptReplicas.isReplicaCorrupt(storedBlock, dn))) {
           addBlock(...)
{code}
However, HDFS-6991 doesn't fix the issue for RBW blocks. Admittedly, it is much less likely for RBW blocks to be shifted between storages, because when restarting a datanode, the RBW replicas become RWR. However, for the sake of robustness, we should implement the shifting behavior there too. This patch does that. It also adds logging for the first time we receive a storage report for a given storage. This should happen only once per storage, so it won't generate too many logs. It will be useful for tracing what is going on. It also adds debug logs to the initial storage report, similar to the debug logs available for the non-initial storage report. Finally, it adds a unit test for the shifting behavior. The unit test tests shifting of finalized blocks rather than RBW ones, so it doesn't require the rest of the patch to pass, but it's still very useful for preventing regressions. Always call addStoredBlock on blocks which have been shifted from one storage to another Key: HDFS-8380 URL: https://issues.apache.org/jira/browse/HDFS-8380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8380.001.patch We should always call addStoredBlock on blocks which have been shifted from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
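To make the shifting check concrete, here is a minimal sketch of the pattern described in the comment above. It is illustrative only, not the committed patch; the variable and helper names are assumed to match BlockManager's reported-block processing.
{code}
// Illustrative sketch, not the HDFS-8380 patch. Assumes storedBlock,
// storageInfo, dn, corruptReplicas, and delHintNode as in BlockManager's
// reported-block processing.
if (storedBlock.findStorageInfo(storageInfo) == -1 ||
    corruptReplicas.isReplicaCorrupt(storedBlock, dn)) {
  // The replica is not recorded under this storage -- it may have been
  // shifted from another storage on the same DataNode -- so call
  // addStoredBlock to move it to the new storage in the NN's structures.
  addStoredBlock(storedBlock, storageInfo, delHintNode, true);
}
{code}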
[jira] [Commented] (HDFS-8311) DataStreamer.transfer() should timeout the socket InputStream.
[ https://issues.apache.org/jira/browse/HDFS-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535272#comment-14535272 ] Colin Patrick McCabe commented on HDFS-8311: Good catch, [~yzhangal]. We should fix those other cases as well. I think we should do those in separate JIRAs, if that's more convenient for you. Also, it would be nice to have unit tests for these timeouts at some point, to ensure that they don't get removed. +1 again for the patch. Thanks, guys. DataStreamer.transfer() should timeout the socket InputStream. -- Key: HDFS-8311 URL: https://issues.apache.org/jira/browse/HDFS-8311 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Labels: BB2015-05-TBR Attachments: 0001-HDFS-8311-DataStreamer.transfer-should-timeout-the-s.patch, HDFS-8311.001.patch While validating some HA failure modes we found that HDFS clients can take a long time to recover, or sometimes don't recover at all, since we don't set up the socket timeout in the InputStream:
{code}
private void transfer() {
  ...
  OutputStream unbufOut = NetUtils.getOutputStream(sock, writeTimeout);
  InputStream unbufIn = NetUtils.getInputStream(sock);
  ...
}
{code}
The InputStream should have its own timeout in the same way as the OutputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
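A minimal sketch of the fix being discussed (not the committed patch): NetUtils also has a getInputStream overload that takes a timeout, mirroring getOutputStream. The readTimeout variable here is hypothetical, mirroring writeTimeout.
{code}
// Sketch only: give the InputStream a timeout just like the OutputStream.
// readTimeout is a hypothetical variable mirroring writeTimeout.
OutputStream unbufOut = NetUtils.getOutputStream(sock, writeTimeout);
InputStream unbufIn = NetUtils.getInputStream(sock, readTimeout);
{code}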
[jira] [Updated] (HDFS-8284) Update documentation about how to use HTrace with HDFS
[ https://issues.apache.org/jira/browse/HDFS-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8284: --- Resolution: Fixed Fix Version/s: 2.8.0 Target Version/s: 2.8.0 Status: Resolved (was: Patch Available) Update documentation about how to use HTrace with HDFS -- Key: HDFS-8284 URL: https://issues.apache.org/jira/browse/HDFS-8284 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: HDFS-8284.001.patch, HDFS-8284.002.patch, HDFS-8284.003.patch Tracing originated in DFSClient uses configuration keys prefixed with dfs.client.htrace after HDFS-8213. Server side tracing uses conf keys prefixed with dfs.htrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8284) Add usage of tracing originated in DFSClient to doc
[ https://issues.apache.org/jira/browse/HDFS-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535334#comment-14535334 ] Colin Patrick McCabe commented on HDFS-8284: +1. Thanks, [~iwasakims]. Add usage of tracing originated in DFSClient to doc --- Key: HDFS-8284 URL: https://issues.apache.org/jira/browse/HDFS-8284 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Labels: BB2015-05-TBR Attachments: HDFS-8284.001.patch, HDFS-8284.002.patch, HDFS-8284.003.patch Tracing originated in DFSClient uses configuration keys prefixed with dfs.client.htrace after HDFS-8213. Server side tracing uses conf keys prefixed with dfs.htrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8284) Update documentation about how to use HTrace with HDFS
[ https://issues.apache.org/jira/browse/HDFS-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8284: --- Summary: Update documentation about how to use HTrace with HDFS (was: Add usage of tracing originated in DFSClient to doc) Update documentation about how to use HTrace with HDFS -- Key: HDFS-8284 URL: https://issues.apache.org/jira/browse/HDFS-8284 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Labels: BB2015-05-TBR Attachments: HDFS-8284.001.patch, HDFS-8284.002.patch, HDFS-8284.003.patch Tracing originated in DFSClient uses configuration keys prefixed with dfs.client.htrace after HDFS-8213. Server side tracing uses conf keys prefixed with dfs.htrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535487#comment-14535487 ] Colin Patrick McCabe commented on HDFS-8113: bq. Colin Patrick McCabe Would you mind committing this? Sure. Will commit now. It is a good robustness improvement. If we find more information about why the {{BlockInfoContiguous}} was added to the {{BlocksMap}} without a {{BlockCollection}}, we can file a separate JIRA for that. Thanks, guys. NullPointerException in BlockInfoContiguous causes block report failure --- Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0, 2.7.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Labels: BB2015-05-TBR Attachments: HDFS-8113.02.patch, HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null.
{code}
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, from.bc.getBlockReplication());
  this.bc = from.bc;
}
{code}
We have observed that some DataNodes keep failing to do block reports with the NameNode. The stacktrace is as follows. Though we are not using the latest version, the problem still exists.
{quote}
2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
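One way to add the check that the new summary describes, as an illustrative sketch (not the committed patch; it uses Guava's Preconditions, which Hadoop already depends on):
{code}
// Sketch only: fail fast with a descriptive message instead of an opaque
// NullPointerException when the copied block has no BlockCollection.
// (Uses com.google.common.base.Preconditions.)
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, Preconditions.checkNotNull(from.bc,
      "Block %s has a null BlockCollection", from).getBlockReplication());
  this.bc = from.bc;
}
{code}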
[jira] [Updated] (HDFS-8113) Add check for null BlockCollection pointers in BlockInfoContiguous structures
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8113: --- Resolution: Fixed Fix Version/s: 2.8.0 Target Version/s: 2.8.0 Status: Resolved (was: Patch Available) Add check for null BlockCollection pointers in BlockInfoContiguous structures - Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0, 2.7.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: HDFS-8113.02.patch, HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null.
{code}
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, from.bc.getBlockReplication());
  this.bc = from.bc;
}
{code}
We have observed that some DataNodes keep failing to do block reports with the NameNode. The stacktrace is as follows. Though we are not using the latest version, the problem still exists.
{quote}
2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8113) Add check for null BlockCollection pointers in BlockInfoContiguous structures
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8113: --- Summary: Add check for null BlockCollection pointers in BlockInfoContiguous structures (was: NullPointerException in BlockInfoContiguous causes block report failure) Add check for null BlockCollection pointers in BlockInfoContiguous structures - Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0, 2.7.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Labels: BB2015-05-TBR Attachments: HDFS-8113.02.patch, HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null.
{code}
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, from.bc.getBlockReplication());
  this.bc = from.bc;
}
{code}
We have observed that some DataNodes keep failing to do block reports with the NameNode. The stacktrace is as follows. Though we are not using the latest version, the problem still exists.
{quote}
2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8246) Get HDFS file name based on block pool id and block id
[ https://issues.apache.org/jira/browse/HDFS-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535502#comment-14535502 ] Colin Patrick McCabe commented on HDFS-8246: From the C client, if you wanted to know what files block ID 123 was in, you could do {{hdfsListDirectory(fs, path=/.reserved/.blockIdToFiles/123, ...)}}. I think one of the advantages of having a path in .reserved instead of a new API is everything just works for the C client, C++ client, webhdfs, etc. Get HDFS file name based on block pool id and block id -- Key: HDFS-8246 URL: https://issues.apache.org/jira/browse/HDFS-8246 Project: Hadoop HDFS Issue Type: New Feature Components: HDFS, hdfs-client, namenode Reporter: feng xu Assignee: feng xu Labels: BB2015-05-TBR Attachments: HDFS-8246.0.patch This feature provides an HDFS shell command and C/Java APIs to retrieve an HDFS file name based on block pool id and block id.
1. The Java API in class DistributedFileSystem: public String getFileName(String poolId, long blockId) throws IOException
2. The C API in hdfs.c: char* hdfsGetFileName(hdfsFS fs, const char* poolId, int64_t blockId)
3. The HDFS shell command: hdfs dfs [generic options] -fn poolId blockId
This feature is useful if you have an HDFS block file name in the local file system and want to find out the related HDFS file name in the HDFS name space (http://stackoverflow.com/questions/10881449/how-to-find-file-from-blockname-in-hdfs-hadoop). Each HDFS block file name in the local file system contains both the block pool id and the block id; for example, in the HDFS block file name /hdfs/1/hadoop/hdfs/data/current/BP-97622798-10.3.11.84-1428081035160/current/finalized/subdir0/subdir0/blk_1073741825, the block pool id is BP-97622798-10.3.11.84-1428081035160 and the block id is 1073741825. The block pool id is uniquely related to an HDFS name node/name space, and the block id is uniquely related to an HDFS file within an HDFS name node/name space, so the combination of block pool id and block id is uniquely related to an HDFS file name. The shell command and C/Java APIs do not map the block pool id to a name node, so it's the user's responsibility to talk to the correct name node in a federation environment that has multiple name nodes. The block pool id is used by the name node to check if the user is talking with the correct name node. The implementation is straightforward. The client request to get the HDFS file name reaches the new method String getFileName(String poolId, long blockId) in FSNamesystem in the name node through RPC, and the new method does the following:
(1) Validate the block pool id.
(2) Create a Block based on the block id.
(3) Get BlockInfoContiguous from the Block.
(4) Get BlockCollection from BlockInfoContiguous.
(5) Get the file name from the BlockCollection.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
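If the reserved-path approach were adopted, client-side usage could look like this hypothetical sketch. The /.reserved/.blockIdToFiles path is the proposal in the comment above, not an existing HDFS feature.
{code}
// Hypothetical sketch of the proposed reserved path; this path does not
// exist in HDFS today. Lists the file(s) containing block ID 123.
// Assumes imports from org.apache.hadoop.conf and org.apache.hadoop.fs.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
for (FileStatus st : fs.listStatus(new Path("/.reserved/.blockIdToFiles/123"))) {
  System.out.println(st.getPath());
}
{code}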
[jira] [Commented] (HDFS-8358) TestTraceAdmin fails
[ https://issues.apache.org/jira/browse/HDFS-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535625#comment-14535625 ] Colin Patrick McCabe commented on HDFS-8358: Thanks for finding this and filing the JIRA, [~kihwal]. Intuitively it seems like I should be able to set -Clocal-file-span-receiver.path=/tmp/foo, not -Cdfs.htrace.local-file-span-receiver.path=/tmp/foo. We always want to be modifying the {{dfs.htrace}} config keys with the {{-C}} options we pass, right? So maybe let's just prefix anything we get via {{-C}} with {{dfs.htrace}} to avoid the extra typing. TestTraceAdmin fails Key: HDFS-8358 URL: https://issues.apache.org/jira/browse/HDFS-8358 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Masatake Iwasaki Attachments: HADOOP-11940.001.patch After HADOOP-11912, {{TestTraceAdmin#testCreateAndDestroySpanReceiver}} in hdfs started failing. It was probably unnoticed because the jira changed and triggered unit testing in common only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
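A sketch of the prefixing idea suggested above, for illustration only; the actual TraceAdmin change is in the patches attached to this JIRA.
{code}
// Sketch only: prepend the daemon-appropriate htrace prefix to bare -C keys.
Configuration conf = new Configuration();
String prefix = "dfs.htrace.";  // would be e.g. "yarn.htrace." for a YARN host
String key = "local-file-span-receiver.path";
String value = "/tmp/foo";
if (!key.startsWith(prefix)) {
  key = prefix + key;
}
conf.set(key, value);
{code}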
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7847: --- Resolution: Fixed Fix Version/s: (was: HDFS-7836) 2.8.0 Status: Resolved (was: Patch Available) Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: 2.8.0 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, HDFS-7847.005.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8311) DataStreamer.transfer() should timeout the socket InputStream.
[ https://issues.apache.org/jira/browse/HDFS-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529027#comment-14529027 ] Colin Patrick McCabe commented on HDFS-8311: Thanks, [~esteban]. +1 pending jenkins DataStreamer.transfer() should timeout the socket InputStream. -- Key: HDFS-8311 URL: https://issues.apache.org/jira/browse/HDFS-8311 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Attachments: 0001-HDFS-8311-DataStreamer.transfer-should-timeout-the-s.patch, HDFS-8311.001.patch While validating some HA failure modes we found that HDFS clients can take a long time to recover, or sometimes don't recover at all, since we don't set up the socket timeout in the InputStream:
{code}
private void transfer() {
  ...
  OutputStream unbufOut = NetUtils.getOutputStream(sock, writeTimeout);
  InputStream unbufIn = NetUtils.getInputStream(sock);
  ...
}
{code}
The InputStream should have its own timeout in the same way as the OutputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Resolution: Fixed Fix Version/s: 2.7.1 Target Version/s: 2.7.1 (was: 2.8.0) Status: Resolved (was: Patch Available) HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.1 Attachments: HDFS-8305.001.patch, HDFS-8305.002.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. Previously, in some cases when using the old rename, this was not the case. The format of OP_EDIT_LOG_RENAME_OLD allows moving /f to /d/f to be represented as RENAME(src=/f, dst=/d) or RENAME(src=/f, dst=/d/f). This change makes HDFS always use the latter form. This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name. This is a compatible change since we aren't removing the ability to handle the first form during edit log replay... we just no longer generate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8311) DataStreamer.transfer() should timeout the socket InputStream.
[ https://issues.apache.org/jira/browse/HDFS-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8311: --- Target Version/s: 2.8.0 Status: Patch Available (was: Open) DataStreamer.transfer() should timeout the socket InputStream. -- Key: HDFS-8311 URL: https://issues.apache.org/jira/browse/HDFS-8311 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Attachments: 0001-HDFS-8311-DataStreamer.transfer-should-timeout-the-s.patch, HDFS-8311.001.patch While validating some HA failure modes we found that HDFS clients can take a long time to recover, or sometimes don't recover at all, since we don't set up the socket timeout in the InputStream:
{code}
private void transfer() {
  ...
  OutputStream unbufOut = NetUtils.getOutputStream(sock, writeTimeout);
  InputStream unbufIn = NetUtils.getInputStream(sock);
  ...
}
{code}
The InputStream should have its own timeout in the same way as the OutputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8284) Add usage of tracing originated in DFSClient to doc
[ https://issues.apache.org/jira/browse/HDFS-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529094#comment-14529094 ] Colin Patrick McCabe commented on HDFS-8284: Thanks, [~iwasakims]. This looks really good. The only thing I would suggest is that we should remove the section or two on zipkin stuff. Instead, we should link to the upstream HTrace documentation about setting up span receivers. Having it in there makes people think that the only way to use htrace is through zipkin, which is certainly not true. Add usage of tracing originated in DFSClient to doc --- Key: HDFS-8284 URL: https://issues.apache.org/jira/browse/HDFS-8284 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-8284.001.patch, HDFS-8284.002.patch Tracing originated in DFSClient uses configuration keys prefixed with dfs.client.htrace after HDFS-8213. Server side tracing uses conf keys prefixed with dfs.htrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8246) Get HDFS file name based on block pool id and block id
[ https://issues.apache.org/jira/browse/HDFS-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529256#comment-14529256 ] Colin Patrick McCabe commented on HDFS-8246: How about having {{/.reserved/.blockIdToFiles/$ID}} map to a directory containing the hdfs files which have the given block ID? I think this would be a lot better than having a whole other set of APIs. Remember also that multiple snapshotted files can contain the same block ID. Get HDFS file name based on block pool id and block id -- Key: HDFS-8246 URL: https://issues.apache.org/jira/browse/HDFS-8246 Project: Hadoop HDFS Issue Type: New Feature Components: HDFS, hdfs-client, namenode Reporter: feng xu Assignee: feng xu Attachments: HDFS-8246.0.patch This feature provides an HDFS shell command and C/Java APIs to retrieve an HDFS file name based on block pool id and block id.
1. The Java API in class DistributedFileSystem: public String getFileName(String poolId, long blockId) throws IOException
2. The C API in hdfs.c: char* hdfsGetFileName(hdfsFS fs, const char* poolId, int64_t blockId)
3. The HDFS shell command: hdfs dfs [generic options] -fn poolId blockId
This feature is useful if you have an HDFS block file name in the local file system and want to find out the related HDFS file name in the HDFS name space (http://stackoverflow.com/questions/10881449/how-to-find-file-from-blockname-in-hdfs-hadoop). Each HDFS block file name in the local file system contains both the block pool id and the block id; for example, in the HDFS block file name /hdfs/1/hadoop/hdfs/data/current/BP-97622798-10.3.11.84-1428081035160/current/finalized/subdir0/subdir0/blk_1073741825, the block pool id is BP-97622798-10.3.11.84-1428081035160 and the block id is 1073741825. The block pool id is uniquely related to an HDFS name node/name space, and the block id is uniquely related to an HDFS file within an HDFS name node/name space, so the combination of block pool id and block id is uniquely related to an HDFS file name. The shell command and C/Java APIs do not map the block pool id to a name node, so it's the user's responsibility to talk to the correct name node in a federation environment that has multiple name nodes. The block pool id is used by the name node to check if the user is talking with the correct name node. The implementation is straightforward. The client request to get the HDFS file name reaches the new method String getFileName(String poolId, long blockId) in FSNamesystem in the name node through RPC, and the new method does the following:
(1) Validate the block pool id.
(2) Create a Block based on the block id.
(3) Get BlockInfoContiguous from the Block.
(4) Get BlockCollection from BlockInfoContiguous.
(5) Get the file name from the BlockCollection.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8271) NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled
[ https://issues.apache.org/jira/browse/HDFS-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529197#comment-14529197 ] Colin Patrick McCabe commented on HDFS-8271: An uber jira is a good idea. Also, let's not change the default behavior by binding on IPv6 by default. It will create problems for sure. NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled Key: HDFS-8271 URL: https://issues.apache.org/jira/browse/HDFS-8271 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 NameNode works properly on IPv4 or IPv6 single stack (assuming in the latter case that scripts have been changed to disable preferIPv4Stack, and dependent on the client/data node fix in HDFS-8078). On dual-stack machines, NameNode listens only on IPv4 (even ignoring preferIPv6Addresses being set.) Our initial use case for IPv6 is IPv6-only clusters, but ideally we'd support binding to both the IPv4 and IPv6 machine addresses so that we can support heterogenous clusters (some dual-stack and some IPv6-only machines.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528998#comment-14528998 ] Colin Patrick McCabe commented on HDFS-7847: +1. Thanks, [~clamb]. Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, HDFS-7847.005.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528910#comment-14528910 ] Colin Patrick McCabe commented on HDFS-7758: +1. Thanks, Eddy. Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch, HDFS-7758.007.patch, HDFS-7758.008.patch, HDFS-7758.010.patch HDFS-7496 introduced reference counting for the volume instances being used, to prevent race conditions when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak a volume instance without increasing its reference count. In this JIRA, we retire {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and similar methods to access {{FsVolume}}. This makes sure that consumers of {{FsVolume}} always hold a correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
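The resulting usage pattern, sketched for illustration (method names follow the FsVolumeSpi reference-counting described in the issue; the snippet itself is a sketch, not code from the patch):
{code}
// Sketch of reference-counted volume access: the reference is held for
// the duration of the try block, so the volume cannot be hot-swapped out
// from under the consumer while in use.
try (FsVolumeReference ref = volume.obtainReference()) {
  FsVolumeSpi vol = ref.getVolume();
  // ... use vol ...
}  // closing the reference decrements the volume's reference count
{code}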
[jira] [Updated] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7758: --- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.8.0 Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch, HDFS-7758.007.patch, HDFS-7758.008.patch, HDFS-7758.010.patch HDFS-7496 introduced reference counting for the volume instances being used, to prevent race conditions when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak a volume instance without increasing its reference count. In this JIRA, we retire {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and similar methods to access {{FsVolume}}. This makes sure that consumers of {{FsVolume}} always hold a correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8157) Writes to RAM DISK reserve locked memory for block files
[ https://issues.apache.org/jira/browse/HDFS-8157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529275#comment-14529275 ] Colin Patrick McCabe commented on HDFS-8157: Thanks for this, [~arpitagarwal]. I don't think we should add {{DataNode#skipNativeIoCheckForTesting}}. To simulate locking memory without adding a dependency on NativeIO, just create a custom cache manipulator. This custom manipulator can always return true for {{verifyCanMlock}}. There are some other unit tests doing this.
{code}
public void releaseReservedSpace(long bytesToRelease, boolean releaseLockedMemory);
{code}
I would rather have a separate function for releasing the memory than overload the meaning of this one. Maybe I am missing something, but I don't understand the purpose behind {{releaseRoundDown}}. Why would we round down to a page size when allocating or releasing memory? Writes to RAM DISK reserve locked memory for block files Key: HDFS-8157 URL: https://issues.apache.org/jira/browse/HDFS-8157 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-8157.01.patch Per discussion on HDFS-6919, the first step is that writes to RAM disk will reserve locked memory via the FsDatasetCache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
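The custom cache manipulator suggestion, sketched for clarity. This is illustrative only; existing Hadoop tests use the same hook in a similar way.
{code}
// Sketch only: a test cache manipulator that pretends mlock always
// works, avoiding any test-only flag on DataNode.
// (NativeIO.POSIX.CacheManipulator is the existing test hook.)
NativeIO.POSIX.setCacheManipulator(new NativeIO.POSIX.CacheManipulator() {
  @Override
  public boolean verifyCanMlock() {
    return true;  // simulate a system where locking memory is permitted
  }
});
{code}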
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Description: HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. Previously, in some cases when using the old rename, this was not the case. The format of OP_EDIT_LOG_RENAME_OLD allows moving /f to /d/f to be represented as RENAME(src=/f, dst=/d) or RENAME(src=/f, dst=/d/f). This change makes HDFS always use the latter form. This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name. This is a compatible change since we aren't removing the ability to handle the first form during edit log replay... we just no longer generate it. (was: HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name.) HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. Previously, in some cases when using the old rename, this was not the case. The format of OP_EDIT_LOG_RENAME_OLD allows moving /f to /d/f to be represented as RENAME(src=/f, dst=/d) or RENAME(src=/f, dst=/d/f). This change makes HDFS always use the latter form. This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name. This is a compatible change since we aren't removing the ability to handle the first form during edit log replay... we just no longer generate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
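To make the two edit-log forms concrete, a small illustrative sketch (not code from the patch):
{code}
// Old-style rename of /f into directory /d could be logged either way:
//   RENAME(src=/f, dst=/d)     -- old form, dst is a directory
//   RENAME(src=/f, dst=/d/f)   -- new form, dst ends with the file name
// Computing the fully-qualified destination, illustratively:
String src = "/f";
String dst = "/d";
String fullDst = new Path(dst, new Path(src).getName()).toString();
// fullDst is "/d/f", the form HDFS now always writes to the edit log
{code}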
[jira] [Commented] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527023#comment-14527023 ] Colin Patrick McCabe commented on HDFS-8305: bq. can we add a description to this jira explaining why (e.g., "This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name.")? added bq. can we add java doc to the void logRename(...) methods to say something like "if the rename source is a file, the target should be a file too; this will ensure that inotify will always be able to consider the dst field as the full destination file name"? ok HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. Previously, in some cases when using the old rename, this was not the case. The format of OP_EDIT_LOG_RENAME_OLD allows moving /f to /d/f to be represented as RENAME(src=/f, dst=/d) or RENAME(src=/f, dst=/d/f). This change makes HDFS always use the latter form. This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name. This is a compatible change since we aren't removing the ability to handle the first form during edit log replay... we just no longer generate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Attachment: HDFS-8305.002.patch HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch, HDFS-8305.002.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. Previously, in some cases when using the old rename, this was not the case. The format of OP_EDIT_LOG_RENAME_OLD allows moving /f to /d/f to be represented as RENAME(src=/f, dst=/d) or RENAME(src=/f, dst=/d/f). This change makes HDFS always use the latter form. This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name. This is a compatible change since we aren't removing the ability to handle the first form during edit log replay... we just no longer generate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7847: --- Assignee: Charles Lamb (was: Colin Patrick McCabe) Status: Patch Available (was: Open) Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7847: --- Issue Type: Bug (was: Sub-task) Parent: (was: HDFS-7836) Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527525#comment-14527525 ] Colin Patrick McCabe commented on HDFS-7847: [~clamb], can you rebase this on trunk? Looks like it's gotten stale. Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8284) Add usage of tracing originated in DFSClient to doc
[ https://issues.apache.org/jira/browse/HDFS-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527214#comment-14527214 ] Colin Patrick McCabe commented on HDFS-8284: Thanks, [~iwasakims].
{code}
<property>
  <name>dfs.htrace.spanreceiver.classes</name>
  <value></value>
  <description>
    A comma separated list of the fully-qualified class name of classes
    implementing SpanReceiver. The tracing system works by collecting
    information in structs called 'Spans'. It is up to you to choose
    how you want to receive this information by implementing the
    SpanReceiver interface.
  </description>
</property>
{code}
I think this description should be something more like "The HTrace SpanReceiver to use for the NameNode, DataNode, and JournalNode." We shouldn't try to explain what spans are... let's just link to the HTrace documentation rather than repeating it here.
{code}
<property>
  <name>dfs.client.htrace.spanreceiver.classes</name>
  <value></value>
  <description>
    A comma separated list of the fully-qualified class name of classes
    implementing SpanReceiver. This property is used by DFSClient
    for tracing started internally.
  </description>
</property>
{code}
I think this description should be something more like "The HTrace SpanReceiver for the HDFS client. You do not need to enable this if your client has been modified to use HTrace." Again, just provide a reference to the HTrace docs.
{code}
### Starting tracing spans by configuration for HDFS client

You can start tracing spans by setting configuration for HDFS client.
This is useful for tracing programs where you don't have access to the source code.
{code}
How about, "The DFSClient can enable tracing internally. This allows you to use HTrace with your client without modifying the client source code." Add usage of tracing originated in DFSClient to doc --- Key: HDFS-8284 URL: https://issues.apache.org/jira/browse/HDFS-8284 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-8284.001.patch Tracing originated in DFSClient uses configuration keys prefixed with dfs.client.htrace after HDFS-8213. Server side tracing uses conf keys prefixed with dfs.htrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7397) Add more detail to the documentation for the conf key dfs.client.read.shortcircuit.streams.cache.size
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7397: --- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to 2.8. Thanks, Brahma. Add more detail to the documentation for the conf key dfs.client.read.shortcircuit.streams.cache.size --- Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: HDFS-7397-002.patch, HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3643) hdfsJniHelper.c unchecked string pointers
[ https://issues.apache.org/jira/browse/HDFS-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527200#comment-14527200 ] Colin Patrick McCabe commented on HDFS-3643: We put braces around all if statements. The else should be on the same line as the close bracket.
{code}
if (returnType == '\0')
    return newRuntimeError(env,
        "invokeMethod: return type missing after ')'");
{code}
This if statement isn't needed since {{strchr}} will either return NULL, or a pointer to the first occurrence of a right paren in the string. It can't return a pointer to a 0 byte. Looks good aside from that. hdfsJniHelper.c unchecked string pointers - Key: HDFS-3643 URL: https://issues.apache.org/jira/browse/HDFS-3643 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.0.0-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Attachments: HDFS-3643.02.patch, hdfs-3643-1.txt, hdfs3643-2.txt, hdfs3643.txt
{code}
str = methSignature;
while (*str != ')')
    str++;
str++;
returnType = *str;
{code}
This loop needs to check for {{'\0'}}. Also the following {{if/else if/else if}} cascade doesn't handle unexpected values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527180#comment-14527180 ] Colin Patrick McCabe commented on HDFS-7758: bq. FsDatasetImpl#volumes is a FsVolumeList object, which does not leak FsVolumeImpl by itself. Moreover, TestWriteToReplica is using it. So it still needs to be a package-level field. I removed the private FsDatasetImpl#getVolumes() function, which exposed FsVolumeList#getVolumes(). Let's fix TestWriteToReplica so it doesn't do this, and make it private. Thanks. +1 once that's resolved. Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch, HDFS-7758.007.patch HDFS-7496 introduced reference-counting the volume instances being used to prevent race condition when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak the volume instance without increasing its reference count. In this JIRA, we retire the {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and etc. method to access {{FsVolume}}. Thus it makes sure that the consumer of {{FsVolume}} always has correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7397) Add more detail to the documentation for the conf key dfs.client.read.shortcircuit.streams.cache.size
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7397: --- Summary: Add more detail to the documentation for the conf key dfs.client.read.shortcircuit.streams.cache.size (was: The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading) Add more detail to the documentation for the conf key dfs.client.read.shortcircuit.streams.cache.size --- Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7397-002.patch, HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7847: --- Assignee: Colin Patrick McCabe (was: Charles Lamb) Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7847 started by Colin Patrick McCabe. -- Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work stopped] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7847 stopped by Colin Patrick McCabe. -- Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523779#comment-14523779 ] Colin Patrick McCabe commented on HDFS-8305: I'm re-kicking jenkins since we got a bunch of weird timeouts, just like with some of the patches yesterday. Seems unrelated to the patch HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523782#comment-14523782 ] Colin Patrick McCabe commented on HDFS-8305: Summary of the approach here: the format of OP_EDIT_LOG_RENAME_OLD allows moving /f to /d/f to be represented as RENAME(src=/f, dst=/d) or RENAME(src=/f, dst=/d/f). This change makes HDFS always use the latter form. This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name. This is a compatible change since we aren't removing the ability to handle the first form during edit log replay... we just no longer generate it. HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
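To make the invariant in the comment above concrete, here is a minimal, self-contained sketch of the normalization idea; the helper name and the dstIsDir flag are illustrative assumptions, not the committed HDFS-8305 patch.
{code}
// Illustrative sketch only -- not the committed HDFS-8305 code.
public class RenameDstNormalizer {
  /** If dst names a directory, append src's final path component. */
  static String normalizedRenameDst(String src, String dst, boolean dstIsDir) {
    if (!dstIsDir) {
      return dst;                        // already the full destination name
    }
    String name = src.substring(src.lastIndexOf('/') + 1);
    return dst.endsWith("/") ? dst + name : dst + "/" + name;
  }

  public static void main(String[] args) {
    // Moving /f into directory /d is logged as dst=/d/f, never dst=/d.
    System.out.println(normalizedRenameDst("/f", "/d", true));    // /d/f
    System.out.println(normalizedRenameDst("/f", "/d/f", false)); // /d/f
  }
}
{code}
With this normalization applied at edit-log generation time, a reader of the dst field can always treat it as the full destination file name.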
[jira] [Updated] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8213: --- Resolution: Fixed Fix Version/s: 2.7.1 Status: Resolved (was: Patch Available) committed to 2.7.1. thanks, guys. DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Fix For: 2.7.1 Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523571#comment-14523571 ] Colin Patrick McCabe commented on HDFS-8213: TestFileTruncate warning is unrelated. checkstyle continues to be busted. committing... DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523713#comment-14523713 ] Colin Patrick McCabe commented on HDFS-7758: Thanks, [~eddyxu]. Looks good overall. Given that we have a class named {{FsVolumeReference}}, we should consistently refer to "volume references" rather than "referred volumes". So let's change {{ReferredFsVolumes}} -> {{FsVolumeReferences}}. It would be nice to avoid all the typecasts. I think we can, if we change FsVolumeReference -> FsVolumeReference<? extends FsVolumeSpi> and FsVolumeReferences -> FsVolumeReferences<? extends FsVolumeSpi>. But let's do that in a follow-on change-- this change is big enough already. {{FsDatasetImpl#volumes}} is still package-private rather than truly private. Can you make it private? Otherwise other code in this package can reach in and use this field directly. I also think we should just get rid of {{FsDatasetImpl#getVolumes}}... objects don't need to use accessors for private internal fields. As long as that function exists there will be a temptation to make it more accessible, like has happened with many other accessors in the past. Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch HDFS-7496 introduced reference-counting the volume instances being used to prevent race condition when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak the volume instance without increasing its reference count. In this JIRA, we retire the {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and etc. method to access {{FsVolume}}. Thus it makes sure that the consumer of {{FsVolume}} always has correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
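For readers following the review, a minimal self-contained sketch of the reference-counting pattern under discussion; the class and method names are stand-ins, not the actual HDFS-7758 API.
{code}
// Illustrative sketch of Closeable, counted volume references.
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

class Volume {
  private final AtomicInteger refCount = new AtomicInteger(0);

  /** The only way callers get at a volume: a counted, Closeable reference. */
  Ref obtainReference() {
    refCount.incrementAndGet();
    return new Ref();
  }

  int references() { return refCount.get(); }

  class Ref implements Closeable {
    Volume volume() { return Volume.this; }
    @Override public void close() { refCount.decrementAndGet(); }
  }
}

public class RefCountSketch {
  public static void main(String[] args) {
    Volume v = new Volume();
    try (Volume.Ref ref = v.obtainReference()) {
      // Safe to use ref.volume() here: the count is held, so a concurrent
      // hot-swap cannot tear the volume down underneath us.
      System.out.println("in use, refs=" + v.references());   // refs=1
    }
    System.out.println("released, refs=" + v.references());   // refs=0
  }
}
{code}
The try-with-resources shape is the point of the API change: every consumer of a volume holds a count for exactly the scope of its use, and the count is released even on an exception path.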
[jira] [Comment Edited] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523713#comment-14523713 ] Colin Patrick McCabe edited comment on HDFS-7758 at 5/1/15 7:05 PM: Thanks, [~eddyxu]. Looks good overall. Given that we have a class named {{FsVolumeReference}}, we should consistently refer to "volume references" rather than "referred volumes". So let's change {{ReferredFsVolumes}} -> {{FsVolumeReferences}}. It would be nice to avoid all the typecasts. I think we can, if we change {{FsVolumeReference -> FsVolumeReference<? extends FsVolumeSpi>}} and {{FsVolumeReferences -> FsVolumeReferences<? extends FsVolumeSpi>}}. But let's do that in a follow-on change-- this change is big enough already. {{FsDatasetImpl#volumes}} is still package-private rather than truly private. Can you make it private? Otherwise other code in this package can reach in and use this field directly. I also think we should just get rid of {{FsDatasetImpl#getVolumes}}... objects don't need to use accessors for private internal fields. As long as that function exists there will be a temptation to make it more accessible, like has happened with many other accessors in the past. was (Author: cmccabe): Thanks, [~eddyxu]. Looks good overall. Given that we have a class named {{FsVolumeReference}}, we should consistently refer to "volume references" rather than "referred volumes". So let's change {{ReferredFsVolumes}} -> {{FsVolumeReferences}}. It would be nice to avoid all the typecasts. I think we can, if we change FsVolumeReference -> FsVolumeReference<? extends FsVolumeSpi> and FsVolumeReferences -> FsVolumeReferences<? extends FsVolumeSpi>. But let's do that in a follow-on change-- this change is big enough already. {{FsDatasetImpl#volumes}} is still package-private rather than truly private. Can you make it private? Otherwise other code in this package can reach in and use this field directly. I also think we should just get rid of {{FsDatasetImpl#getVolumes}}... objects don't need to use accessors for private internal fields. As long as that function exists there will be a temptation to make it more accessible, like has happened with many other accessors in the past. Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch HDFS-7496 introduced reference-counting the volume instances being used to prevent race condition when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak the volume instance without increasing its reference count. In this JIRA, we retire the {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and etc. method to access {{FsVolume}}. Thus it makes sure that the consumer of {{FsVolume}} always has correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Attachment: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Status: Patch Available (was: Open) HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Summary: HDFS INotify: the destination field of RenameOp should always end with the file name (was: HDFS INotify: the destination argument to RenameOp should always end with the file name) HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe HDFS INotify: the destination argument to RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8305) HDFS INotify: the destination argument to RenameOp should always end with the file name
Colin Patrick McCabe created HDFS-8305: -- Summary: HDFS INotify: the destination argument to RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe HDFS INotify: the destination argument to RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Description: HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. (was: HDFS INotify: the destination argument to RenameOp should always end with the file name rather than sometimes being a directory name.) HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522663#comment-14522663 ] Colin Patrick McCabe edited comment on HDFS-8213 at 5/1/15 2:12 AM: findbugs warning is bogus. patch doesn't modify org.apache.hadoop.hdfs.DataStreamer$LastException. the rest of the stuff looks bogus as well (a lot of test timeouts on random things that aren't enabling / touching tracing), guess it's time to re-run again was (Author: cmccabe): findbugs warning is bogus. patch doesn't modify org.apache.hadoop.hdfs.DataStreamer$LastException. the rest of the stuff looks bogus as well, guess it's time to re-run again DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522663#comment-14522663 ] Colin Patrick McCabe commented on HDFS-8213: findbugs warning is bogus. patch doesn't modify org.apache.hadoop.hdfs.DataStreamer$LastException. the rest of the stuff looks bogus as well, guess it's time to re-run again DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements
[ https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522480#comment-14522480 ] Colin Patrick McCabe commented on HDFS-7836: Hi [~xinwei], The discussion on March 11th was focused on our proposal for off-heaping and parallelizing the block manager from February 24th. We spent a lot of time going through the proposal and responding to questions on the proposal. There was widespread agreement that we needed to reduce the garbage collection impact of the millions of BlockInfoContiguous structures. There was some disagreement about how to do that. Daryn argued that using large primitive arrays was the best way to go. Charles and I argued that using off-heap storage was better. The main advantage of large primitive arrays is that it makes the existing Java -Xmx memory settings work as expected. The main advantage of off-heap is that it allows the use of things like {{Unsafe#compareAndSwap}}, which can often lead to more efficient concurrent data structures. Also, when using off-heap memory, we get to re-use malloc rather than essentially writing our own malloc for every subsystem. There was some hand-wringing about off-heap memory being slower, but I do not believe that this is valid. Apache Spark has found that their off-heap hash table was actually faster than the on-heap one, due to the ability to better control the memory layout. https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html The key is to avoid using {{DirectByteBuffer}}, which is rather slow, and use {{Unsafe}} instead. However, Daryn has posted some patches using the large arrays approach. Since they are a nice incremental improvement, we are probably going to pick them up if there are no blockers. We are also looking at incremental improvements such as implementing backpressure for full block reports, and speeding up edit log replay (if possible). I would also like to look at parallelizing the full block report... if we can do that, we can get a dramatic improvement in FBR times by using more than 1 core. BlockManager Scalability Improvements - Key: HDFS-7836 URL: https://issues.apache.org/jira/browse/HDFS-7836 Project: Hadoop HDFS Issue Type: Improvement Reporter: Charles Lamb Assignee: Charles Lamb Attachments: BlockManagerScalabilityImprovementsDesign.pdf Improvements to BlockManager scalability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
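As a toy illustration of the Unsafe-vs-DirectByteBuffer point made above (this is not code from the block manager proposal): {{compareAndSwapLong}} can target a raw off-heap address directly, which is what enables lock-free concurrent structures on off-heap memory.
{code}
// Minimal Java 8-era sketch: CAS against an off-heap address via Unsafe.
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapCasSketch {
  public static void main(String[] args) throws Exception {
    // There is no public accessor; grab the singleton via reflection.
    Field f = Unsafe.class.getDeclaredField("theUnsafe");
    f.setAccessible(true);
    Unsafe unsafe = (Unsafe) f.get(null);

    long addr = unsafe.allocateMemory(8);   // one off-heap 64-bit slot
    unsafe.putLong(addr, 0L);

    // A null base object means "addr" is interpreted as an absolute address.
    boolean swapped = unsafe.compareAndSwapLong(null, addr, 0L, 42L);
    System.out.println("swapped=" + swapped + " value=" + unsafe.getLong(addr));

    unsafe.freeMemory(addr);                // manual lifetime, like malloc/free
  }
}
{code}
DirectByteBuffer exposes no CAS at all, so building a lock-free table on top of it means falling back to Java-level locking; that is the performance gap being argued here.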
[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520002#comment-14520002 ] Colin Patrick McCabe commented on HDFS-8113: +1 for HDFS-8113.02.patch. I think it's a good robustness improvement to the code. It would be nice to continue the investigation about why you hit this issue in another jira, as [~chengbing.liu] suggested. NullPointerException in BlockInfoContiguous causes block report failure --- Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: HDFS-8113.02.patch, HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null.
{code}
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, from.bc.getBlockReplication());
  this.bc = from.bc;
}
{code}
We have observed that some DataNodes keep failing to do block reports with the NameNode. The stacktrace is as follows. Though we are not using the latest version, the problem still exists.
{quote}
2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
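For illustration, a standalone sketch of the null-guard idea behind this kind of robustness fix; it is not the actual HDFS-8113.02.patch, and DEFAULT_REPLICATION plus the stub class are assumptions made purely for the example.
{code}
// Standalone sketch of guarding the from.bc dereference in a copy ctor.
class BlockCollectionStub {
  int getBlockReplication() { return 3; }
}

public class CopyCtorGuardSketch {
  static final int DEFAULT_REPLICATION = 3;   // assumed fallback value
  final BlockCollectionStub bc;
  final int replication;

  CopyCtorGuardSketch(CopyCtorGuardSketch from) {
    // Guarded dereference: from.bc may legitimately be null mid-report.
    this.replication = (from.bc == null)
        ? DEFAULT_REPLICATION : from.bc.getBlockReplication();
    this.bc = from.bc;
  }

  CopyCtorGuardSketch(BlockCollectionStub bc, int replication) {
    this.bc = bc;
    this.replication = replication;
  }

  public static void main(String[] args) {
    CopyCtorGuardSketch orig = new CopyCtorGuardSketch(null, 3);
    CopyCtorGuardSketch copy = new CopyCtorGuardSketch(orig); // no NPE now
    System.out.println("replication=" + copy.replication);
  }
}
{code}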
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7847: --- Affects Version/s: (was: HDFS-7836) 2.8.0 Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518394#comment-14518394 ] Colin Patrick McCabe commented on HDFS-7758: can you rebase the patch on trunk? thanks Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch HDFS-7496 introduced reference-counting the volume instances being used to prevent race condition when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak the volume instance without increasing its reference count. In this JIRA, we retire the {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and etc. method to access {{FsVolume}}. Thus it makes sure that the consumer of {{FsVolume}} always has correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8213: --- Attachment: HDFS-8213.002.patch DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518389#comment-14518389 ] Colin Patrick McCabe commented on HDFS-8213: Thanks for the review, [~iwasakims]. I attached a patch. Let's do the hdfs-default.xml and other docs stuff later since it's not directly related to this DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7397) The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518093#comment-14518093 ] Colin Patrick McCabe commented on HDFS-7397: +1 for v2 The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7397-002.patch, HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518196#comment-14518196 ] Colin Patrick McCabe commented on HDFS-8213: bq. In SpanReceiverHost#getInstance, loadSpanReceivers is called even if there is already an initialized SRH instance. Is it intentional? Hmm. Good point... we don't want to be calling this more than once. Let's have a {{SpanReceiverHost}} for each config prefix. That's the easiest thing to do. Long-term, I think we should have a new API that avoids the need for all this boilerplate code in the client... bq. We need to fix TraceUtils#wrapHadoopConf which always assumes that the prefix is "hadoop.htrace." fixed bq. Should we add an entry for hdfs.client.htrace.spanreceiver.classes to hdfs-default.xml? yeah DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
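A minimal sketch of the "one SpanReceiverHost per config prefix" idea discussed above; the class name, structure, and key suffix are illustrative assumptions, not the committed HDFS-8213 change.
{code}
// Cache host instances by prefix so repeated getInstance() calls for the
// same prefix never re-register receivers.
import java.util.HashMap;
import java.util.Map;

public class PerPrefixHostSketch {
  private static final Map<String, PerPrefixHostSketch> hosts = new HashMap<>();

  /** Same prefix => same instance; receivers are loaded exactly once. */
  public static synchronized PerPrefixHostSketch getInstance(String confPrefix) {
    PerPrefixHostSketch host = hosts.get(confPrefix);
    if (host == null) {
      host = new PerPrefixHostSketch(confPrefix);
      host.loadSpanReceivers();            // once per prefix, not per call
      hosts.put(confPrefix, host);
    }
    return host;
  }

  private final String confPrefix;

  private PerPrefixHostSketch(String confPrefix) {
    this.confPrefix = confPrefix;
  }

  private void loadSpanReceivers() {
    // The real code would instantiate the classes named by the key
    // <confPrefix> + "spanreceiver.classes" from the Configuration.
    System.out.println("loading receivers for prefix " + confPrefix);
  }

  public static void main(String[] args) {
    getInstance("hadoop.htrace.");        // server-side prefix
    getInstance("hdfs.client.htrace.");   // client prefix, separate instance
    getInstance("hdfs.client.htrace.");   // cached; no re-registration
  }
}
{code}
This is what keeps a process like Accumulo, which already manages its own receivers, from seeing the same receivers registered twice when DFSClient also initializes tracing.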
[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516053#comment-14516053 ] Colin Patrick McCabe commented on HDFS-7923: Thanks, [~clamb]. I like this approach. It avoids sending the block report until the NN requests it. So we don't have to throw away a whole block report to achieve backpressure.
{code}
public static final String DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_KEY =
    "dfs.namenode.max.concurrent.block.reports";
public static final int DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_DEFAULT =
    Integer.MAX_VALUE;
{code}
It seems like this should default to something less than the default number of RPC handler threads, not to MAX_INT. Given that dfs.namenode.handler.count = 10, it seems like this should be no more than 5 or 6, right? The main point here is to avoid having the NN handler threads completely choked with block reports, and that is defeated if the value is MAX_INT. I realize that you probably intended this to be configured. But it seems like we should have a reasonable default that works for most people.
{code}
--- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
+++ hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
@@ -195,6 +195,7 @@ message HeartbeatRequestProto {
   optional uint64 cacheCapacity = 6 [ default = 0 ];
   optional uint64 cacheUsed = 7 [default = 0 ];
   optional VolumeFailureSummaryProto volumeFailureSummary = 8;
+  optional bool requestSendFullBlockReport = 9;
{code}
Let's have a {{[default = false]}} here so that we don't have to add a bunch of clunky {{HasFoo}} checks. Unless there is something we'd like to do differently in the "false" and "not present" cases, but I can't think of what that would be.
{code}
/* Number of block reports currently being processed. */
private final AtomicInteger blockReportProcessingCount = new AtomicInteger(0);
{code}
I'm not sure an {{AtomicInteger}} makes sense here. We only modify this variable (write to it) when holding the FSN lock in write mode, right? And we only read from it when holding the FSN in read mode. So, there isn't any need to add atomic ops.
{code}
boolean okToSendFullBlockReport = true;
if (requestSendFullBlockReport &&
    blockManager.getBlockReportProcessingCount() >= maxConcurrentBlockReports) {
  /* See if we should tell DN to back off for a bit. */
  final long lastBlockReportTime = blockManager.getDatanodeManager().
      getDatanode(nodeReg).getLastBlockReportTime();
  if (lastBlockReportTime > 0) {
    /* We've received at least one block report. */
    final long msSinceLastBlockReport = now() - lastBlockReportTime;
    if (msSinceLastBlockReport < maxBlockReportDeferralMsec) {
      /* It hasn't been long enough to allow a BR to pass through. */
      okToSendFullBlockReport = false;
    }
  }
}
return new HeartbeatResponse(cmds, haState, rollingUpgradeInfo,
    okToSendFullBlockReport);
{code}
There is a TOCTOU (time of check, time of use) race condition here, right? 1000 datanodes come in and ask me whether it's ok to send an FBR. In each case, I check the number of ongoing FBRs, which is 0, and say yes. Then 1000 FBRs arrive all at once and the NN melts down. I think we need to track which datanodes we gave the green light to, and not decrement the counter until they either send that report, or some timeout expires. (We need the timeout in case datanodes go away after requesting permission-to-send.) The timeout can probably be as short as a few minutes.
If you can't manage to send an FBR in a few minutes, there are bigger problems going on.
{code}
public static final String DFS_BLOCKREPORT_MAX_DEFER_MSEC_KEY =
    "dfs.blockreport.max.deferMsec";
public static final long DFS_BLOCKREPORT_MAX_DEFER_MSEC_DEFAULT = Long.MAX_VALUE;
{code}
Do we really need this config key? It seems like we added it because we wanted to avoid starvation (i.e. the case where a given DN never gets given the green light). But we are maintaining the last FBR time for each DN anyway. Surely we can just have a TreeMap or something and just tell the guys with the oldest {{lastSentTime}} to go. There aren't an infinite number of datanodes in the cluster, so eventually everyone will get the green light. I really would prefer not to have this tunable at all, since I think it's unnecessary. In any case, it's certainly doing us no good as MAX_U64. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project:
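To make the grant-tracking fix for the TOCTOU race concrete, here is a minimal, self-contained sketch; all names are illustrative, and this is not the committed HDFS-7923 code. A datanode counts against the limit from the moment it is told "go ahead", and unused grants expire so a datanode that dies after asking cannot pin a slot forever.
{code}
// Sketch of admission tracking for full block reports (FBRs).
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class BlockReportAdmissionSketch {
  private final int maxConcurrent;
  private final long grantTimeoutMs;
  private final Map<String, Long> granted = new HashMap<>(); // dnUuid -> grant time

  public BlockReportAdmissionSketch(int maxConcurrent, long grantTimeoutMs) {
    this.maxConcurrent = maxConcurrent;
    this.grantTimeoutMs = grantTimeoutMs;
  }

  /** Called from heartbeat handling: may this DN send a full block report? */
  public synchronized boolean requestSend(String dnUuid, long nowMs) {
    expireStaleGrants(nowMs);
    if (granted.containsKey(dnUuid)) {
      return true;                    // already granted earlier
    }
    if (granted.size() >= maxConcurrent) {
      return false;                   // back off; ask again on a later heartbeat
    }
    granted.put(dnUuid, nowMs);       // counted *before* the report arrives,
    return true;                      // closing the check-then-act window
  }

  /** Called when the FBR from this DN has been fully processed. */
  public synchronized void reportProcessed(String dnUuid) {
    granted.remove(dnUuid);
  }

  private void expireStaleGrants(long nowMs) {
    for (Iterator<Long> it = granted.values().iterator(); it.hasNext();) {
      if (nowMs - it.next() > grantTimeoutMs) {
        it.remove();                  // the DN went away after asking
      }
    }
  }

  public static void main(String[] args) {
    BlockReportAdmissionSketch a = new BlockReportAdmissionSketch(2, 60_000);
    System.out.println(a.requestSend("dn1", 0));      // true
    System.out.println(a.requestSend("dn2", 0));      // true
    System.out.println(a.requestSend("dn3", 0));      // false: limit reached
    System.out.println(a.requestSend("dn3", 70_000)); // true: dn1/dn2 expired
  }
}
{code}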
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514721#comment-14514721 ] Colin Patrick McCabe commented on HDFS-8213: bq. What that doesn't clarify to me is how I would connect the dots of spans initiated within the HDFSClient back to actions taken by said app. It depends on what we're trying to do. For example, we may be getting reports that the cluster is slow. In this case, seeing that HDFS / HBase requests complete quickly allows us to focus on other systems in the stack. Ultimately, the best thing will always be to have tracing in every app. But it will take a while to get there and having the ability to get useful results out of incremental steps is really useful. DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7397) The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514790#comment-14514790 ] Colin Patrick McCabe commented on HDFS-7397: I'm not sure that this is clearer. It's actually shorter and snips out the first sentence which describes what the cache is. If anything would make this clearer, it might be changing "This parameter controls the size of that cache" to "This parameter controls the maximum number of file descriptors in the cache." The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511727#comment-14511727 ] Colin Patrick McCabe commented on HDFS-8213: Thanks for that perspective, [~ndimiduk]. I actually don't see any conflict between allowing the client to trace itself, and allowing the application to trace itself. We should be able to support both use-cases. The people who don't want to have the client initiate tracing can simply not set {{hdfs.client.htrace.spanreceiver.classes}} and {{hdfs.client.trace.sampler}}. One very important use-case for HTrace is "how can HBase figure out what HDFS is doing." For this use-case, of course, we don't need the client to initiate tracing... HBase can simply change its code to have the relevant calls to HTrace, and then that will get picked up by DFSClient, DataNode, NN, etc. I think this is the use-case you guys have been focusing on, and understandably so. But this is only one use-case of many. Another very important use case of tracing is "I have proprietary app X that talks to HDFS, and it's slow. How come?" For that use-case, we need to be able to have the DFSClient initiate the tracing, since we don't have the source code for the proprietary app (or if we do, modifying it and redeploying it may require a lengthy admin process.) bq. Should HBase and Accumulo clients be providing the same? I believe they should. It would be nice to be able to figure out why HBase is slow for some arbitrary workload, without hacking the client. I would like to be able to give a talk about profiling HBase that doesn't start with "first, modify your source code in ways X, Y, and Z"... it's much nicer to tell people to set a config option. Otherwise I feel like I'm telling people to write a mapreduce job in erlang... and you know what that really means I'm telling them :) This is especially true for non-devs. I think we could also improve our API to make it less likely (or maybe even impossible) for client and server tracing configs to conflict so much. I have some ideas for how to do that which I'll take a look at in a follow-on jira DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8213: --- Summary: DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (was: DFSClient should not instantiate SpanReceiverHost) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8213: --- Assignee: Colin Patrick McCabe (was: Brahma Reddy Battula) Status: Patch Available (was: Open) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8213: --- Attachment: HDFS-8213.001.patch DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS-8213.001.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509472#comment-14509472 ] Colin Patrick McCabe commented on HDFS-8213: bq. can you people suggest configuration for DFSClient..? I'm thinking {{hdfs.client.htrace.spanreceiver.classes}}. It's not completely trivial because I have to change our SpanReceiverHost thing, but shouldn't be too bad... let me see if I can post the patch DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8070: --- Resolution: Fixed Fix Version/s: 2.7.1 Status: Resolved (was: Patch Available) Committed to 2.7.1. Thanks for the reviews. Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.7.0 Reporter: Gopal V Assignee: Colin Patrick McCabe Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8070.001.patch HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at 
java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507643#comment-14507643 ] Colin Patrick McCabe commented on HDFS-8213: Thanks again for kicking the tires on htrace, [~billie.rinaldi]. Let me see if I can get to the bottom of this. bq. As documented, each process must configure its own span receivers if it wants to use tracing. If I set hadoop.htrace.span.receiver.classes to the empty string, then the NameNode and DataNode will not do any tracing. You are right that you need to set {{hadoop.htrace.span.receiver.classes}} in the NameNode and DataNode configuration. However, you need to avoid setting it in the Accumulo configuration... instead, use whatever configuration Accumulo uses to set this value. This means that, currently, you can't use the same config file for the NN and DN as for the DFSClient. bq. If span receiver initialization in DFSClient is important to the use of the hadoop.htrace.sampler configuration property, perhaps a compromise would be to perform SpanReceiverHost.getInstance only when the sampler is set to something other than NeverSampler. Keep in mind that {{hadoop.htrace.sampler}} is a completely different configuration key than {{hadoop.htrace.span.receiver.classes}}. If you are sampling at the level of Accumulo operations, I would not recommend setting {{hadoop.htrace.sampler}} in any config file on the cluster. You want all of the sampling to happen inside Accumulo. bq. I think Billie Rinaldi is correct here; the client should not instantiate it's own SpanReceiverHost, but instead depend on the process in which it resides to provide. This is how HBase client works as well. HBase is exactly the same. In the case of HBase, you do not want to set {{hadoop.htrace.span.receiver.classes}} in the HBase config files. Instead, you would set {{hbase.htrace.span.receiver.classes}}. Then HBase would create a span receiver, and DFSClient would not. It seems like there is a hidden assumption here that you want to use the same config file for everything. But we really don't support that right now. Getting rid of the SpanReceiverHost in DFSClient is not an option, since some people want to trace just HDFS without tracing any other system. Plus, it only kicks the problem up to a higher level: if my FooProcess wants to use both HTrace and Accumulo, FooProcess could easily make the same argument that Accumulo should not instantiate SpanReceiverHost, since FooProcess is already doing that. And since FooProcess uses the Accumulo client, it would conflict with whatever Accumulo was configuring if the same config file were used for everything. One thing we could do to make this a little less painful is to deduplicate span receivers inside the library. So if both DFSClient and Accumulo requested an HTracedSpanReceiver, we could simply create one instance of it. This would allow us to use the same config file for everything. As a side note, [~billie.rinaldi], can you explain how you configure which sampler and span receiver Accumulo uses? In HBase we set {{hbase.htrace.span.receiver.classes}}, etc. I would recommend something like {{accumulo.htrace.span.receiver.classes}} for consistency. This also allows you to use the same config file for everything, since it doesn't conflict with the keys which Hadoop uses to set these values. That is why we set up the hbase.htrace namespace separately from the hadoop.htrace namespace.
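To illustrate the deduplication idea in the comment above, here is a minimal sketch of what receiver sharing inside the tracing library could look like: one shared instance per receiver class, so a second component requesting the same class gets the existing object instead of causing double delivery. All names here ({{ReceiverDedupSketch}}, the stand-in {{SpanReceiver}} interface) are hypothetical; this is not the actual HTrace patch.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of span-receiver deduplication inside the library.
public class ReceiverDedupSketch {
  interface SpanReceiver {}  // stand-in for the real receiver interface

  private static final Map<String, SpanReceiver> RECEIVERS =
      new ConcurrentHashMap<>();

  // Returns the shared instance for a receiver class, creating it on first
  // request. If DFSClient and Accumulo both ask for the same class, they
  // get the same object, so each span is delivered exactly once.
  static SpanReceiver getOrCreate(String className) {
    return RECEIVERS.computeIfAbsent(className, name -> {
      try {
        return (SpanReceiver) Class.forName(name)
            .getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("cannot create receiver " + name, e);
      }
    });
  }
}
{code}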
DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507888#comment-14507888 ] Colin Patrick McCabe commented on HDFS-8213: bq. I think Billie Rinaldi is correct here; the client should not instantiate it's own SpanReceiverHost, but instead depend on the process in which it resides to provide. This is how HBase client works as well. [~ndimiduk], what if I want to trace an HBase PUT all the way through the system? You're saying that the HBase client can't activate tracing on its own, so I have to make code changes to the process doing the PUT (i.e. the user of the HBase client) in order to get that info? That seems like a limitation. It's also worth pointing out that adding a {{SpanReceiverHost}} to the {{DFSClient}} is not really a new change... it goes back to HDFS-7055, last October. So it's been in there at least 6 months. Of course we can revisit it if that makes sense, but it's not really new, except in the sense that it took a very long time to do another Hadoop release with it included. (We really should get better about releases...) Thinking about this a little more, another possible resolution here is to change the configuration keys which the DFSClient looks for, so that they're different from the ones which the NameNode and DataNode look for. Right now {{hadoop.htrace.spanreceiver.classes}} will activate span receivers in both the NN and the DFSClient. But the DFSClient could instead look for {{hdfs.client.htrace.spanreceiver.classes}}. Then [~billie.rinaldi] could use the same configuration file for everything, and the DFSClient would never create its own span receivers or samplers. And I could continue to trace the DFSClient without modifying daemon code. Seems like a good resolution. DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
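To make the resolution proposed in the comment above concrete, a hedged sketch of what a single shared config could then express, shown through the {{Configuration}} API rather than XML. The Accumulo key follows the naming convention suggested in this thread and is hypothetical, not an existing Accumulo property.

{code}
import org.apache.hadoop.conf.Configuration;

public class SharedConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Daemon tracing: read by the NameNode and DataNode only.
    conf.set("hadoop.htrace.spanreceiver.classes",
        "org.apache.htrace.impl.LocalFileSpanReceiver");
    // Application tracing via a component-specific prefix (hypothetical
    // Accumulo key, following the hbase.htrace convention).
    conf.set("accumulo.htrace.span.receiver.classes",
        "org.apache.htrace.impl.LocalFileSpanReceiver");
    // hdfs.client.htrace.spanreceiver.classes is left unset, so the
    // DFSClient creates no receivers and spans are delivered once.
  }
}
{code}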
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508049#comment-14508049 ] Colin Patrick McCabe commented on HDFS-8213: bq. Yes, clients need tracing, and when they do they should enable it themselves. FsShell should enable tracing when it wants to use it, instead of doing that in DFSClient. There are hundreds or maybe even thousands of programs that use the HDFS client. It's not practical to modify them all to run {{Trace#addSpanReceiver}}. In some cases the programs that use HDFS are even proprietary or customer programs where we don't have access to the source code. I have some ideas for how to make this all work better, with a cleaner interface in {{Tracer}}. We might need an incompatible interface change to do it, though. For now, let's just change the config key for DFSClient... that should fix the problem for Accumulo. DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507819#comment-14507819 ] Colin Patrick McCabe commented on HDFS-8213: bq. The hadoop.htrace.span.receiver.classes is not set in Accumulo configuration files, but it is set in Hadoop configuration files. Accumulo uses Hadoop configuration files to connect to HDFS, thus its uses of DFSClient will have Hadoop's hadoop.htrace.span.receiver.classes. HBase does something similar, I believe. The way Cloudera Manager manages configuration files is that it creates separate config files for each daemon. So the NameNode reads its own set of config files, the DataNode reads a separate set, Hive reads another set, Flume reads still another set, and so on. So {{hadoop.htrace.span.receiver.classes}} would be set in the NN and DN configuration files, but not in the ones targeted at the DFSClients. Does Ambari do something similar? It seems like using the same set of configuration files for everything would be very limiting if you wanted to do something like turn on short circuit for some clients but not for others, etc. I know from a developer perspective it's frustrating to not be able to use the same config files for every daemon (I like to do that myself), but it's not broken, just inconvenient. bq. No. The way it works (did work, until this change was introduced in DFSClient) is that server processes instantiate SpanReceiverHost. If an app wants tracing, it also has to instantiate SpanReceiverHost. The Accumulo client does not instantiate SPH itself, as DFSClient should not. It's not true that only server processes need tracing. Clients also need tracing. For example, one test I do a lot is to run FsShell with tracing turned on. This would not be possible if only servers had tracing. The point that I was making with my example is that the Accumulo client itself probably should have tracing too, and this would potentially conflict with another server using the Accumulo client. bq. The change in DFSClient changes how apps are supposed to use tracing. It seems like this would be mitigated by deduping SpanReceivers in htrace, but if we go that route I would like the DFSClient change to be reverted until HDFS moves to a version of htrace with deduping. Otherwise, Accumulo and HBase will have to leave HDFS tracing disabled, or change how they're configuring HDFS, if they wish to avoid double delivery of spans. We're doing a new release of HTrace soon... like this week or the next. If we can get the deduping into the 3.2 release, we can bump the version in Hadoop 2.7.1. We can't change what's in Hadoop 2.7.0; that release is done. Thanks again for trying this stuff out. I'm going to work on a deduping patch for HTrace and would appreciate a review. DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8133) Improve readability of deleted block check
[ https://issues.apache.org/jira/browse/HDFS-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505479#comment-14505479 ] Colin Patrick McCabe commented on HDFS-8133: +1. Thanks, Daryn. Test failures are unrelated. I ran the tests locally and they passed. Improve readability of deleted block check -- Key: HDFS-8133 URL: https://issues.apache.org/jira/browse/HDFS-8133 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-8133.patch The current means of checking if a block is deleted is checking if its block collection is null. A more readable approach is an isDeleted method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
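For readers following along, the shape of the change is roughly as follows; a sketch based on the issue description, with a stand-in interface, not the committed patch.

{code}
// Sketch: replace the raw null check with a named predicate.
class BlockInfoContiguousSketch {
  interface BlockCollection {}   // stand-in for the real interface

  private BlockCollection bc;    // null once the owning file is gone

  // Call sites then read "block.isDeleted()" rather than
  // "block.getBlockCollection() == null".
  boolean isDeleted() {
    return bc == null;
  }
}
{code}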
[jira] [Updated] (HDFS-8133) Improve readability of deleted block check
[ https://issues.apache.org/jira/browse/HDFS-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8133: --- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Improve readability of deleted block check -- Key: HDFS-8133 URL: https://issues.apache.org/jira/browse/HDFS-8133 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 2.8.0 Attachments: HDFS-8133.patch The current means of checking if a block is deleted is checking if its block collection is null. A more readable approach is an isDeleted method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505797#comment-14505797 ] Colin Patrick McCabe commented on HDFS-8213: Hi Billie, {{DFSClient}} needs to instantiate {{SpanReceiverHost}} in order to implement tracing, in the case where the process using the {{DFSClient}} doesn't configure its own span receivers. If you are concerned about multiple span receivers being instantiated, simply set {{hadoop.htrace.span.receiver.classes}} to the empty string, and Hadoop won't instantiate any span receivers. That should be its default anyway. DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
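A minimal sketch of the workaround suggested in the comment above, expressed through the {{Configuration}} API (equivalent to putting an empty value in the XML config file):

{code}
import org.apache.hadoop.conf.Configuration;

public class DisableReceiversSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // An empty value means SpanReceiverHost creates no receivers, so the
    // application's own receivers are the only ones registered.
    conf.set("hadoop.htrace.span.receiver.classes", "");
  }
}
{code}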
[jira] [Commented] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500965#comment-14500965 ] Colin Patrick McCabe commented on HDFS-8070: bq. For this patch, do I have to redeploy the HDFS DN to test? Yes, this is a datanode-side fix. Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.7.0 Reporter: Gopal V Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-8070.001.patch HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498553#comment-14498553 ] Colin Patrick McCabe commented on HDFS-8113: There are already a bunch of places in the code where we check whether BlockCollection is null before doing something with it. Example: {code} if (block instanceof BlockInfoContiguous) { BlockCollection bc = ((BlockInfoContiguous) block).getBlockCollection(); String fileName = (bc == null) ? "[orphaned]" : bc.getName(); out.print(fileName + ": "); } {code} also: {code} private int getReplication(Block block) { final BlockCollection bc = blocksMap.getBlockCollection(block); return bc == null ? 0 : bc.getBlockReplication(); } {code} I think that the majority of cases already have a check. My suggestion is just that we extend this checking against null to all uses of the BlockInfoContiguous structure's block collection. If the problem is too difficult to reproduce with a {{MiniDFSCluster}}, perhaps we can just do a unit test of the copy constructor itself. As I said earlier, I don't understand the rationale for keeping blocks with no associated INode out of the BlocksMap. It complicates the block report, since it requires us to check whether each block has an associated inode before adding it to the BlocksMap. But if that change seems too ambitious for this JIRA, we can deal with that later. NullPointerException in BlockInfoContiguous causes block report failure --- Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null. {code} protected BlockInfoContiguous(BlockInfoContiguous from) { this(from, from.bc.getBlockReplication()); this.bc = from.bc; } {code} We have observed that some DataNodes keep failing to do block reports with the NameNode. The stack trace is as follows. Though we are not using the latest version, the problem still exists. 
{quote} 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
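For context, a hedged sketch of a null-tolerant version of the copy constructor quoted above. Treating an orphaned block as having replication 0 is an assumption for illustration; the committed fix may differ.

{code}
// Sketch only: guard the replication lookup so that copying a block whose
// collection has already been cleared does not throw NullPointerException.
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, from.bc == null ? 0 : from.bc.getBlockReplication());
  this.bc = from.bc;
}
{code}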
[jira] [Commented] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498556#comment-14498556 ] Colin Patrick McCabe commented on HDFS-7993: bq. Maybe we can change the description from repl to live repl? It will address the confusion others might have. Can we do that in a separate JIRA? Since it's an incompatible change we might want to do it only in Hadoop 3.0. There are a lot of people parsing fsck output (unfortunately). The rest looks good; if we can keep the existing output the same, I would love to add the replicaDetails option. Incorrect descriptions in fsck when nodes are decommissioned Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch When you run fsck with -files or -racks, you will get something like below if one of the replicas is decommissioned. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} That is because in NamenodeFsck, the repl count comes from live replicas count; while the actual nodes come from LocatedBlock which include decommissioned nodes. Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement verifies LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification; just like how fsck excludes decommissioned nodes when it checks for under-replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498556#comment-14498556 ] Colin Patrick McCabe edited comment on HDFS-7993 at 4/16/15 7:28 PM: - bq. Maybe we can change the description from repl to live repl? It will address the confusion others might have. Can we do that in a separate JIRA? Since it's an incompatible change we might want to do it only in Hadoop 3.0. There are a lot of people parsing fsck output (unfortunately). The rest looks good, if we can keep the existing output the same I would love to add the replicaDetails option. was (Author: cmccabe): bq, Maybe we can change the description from repl to live repl? It will address the confusion others might have. Can we do that in a separate JIRA? Since it's an incompatible change we might want to do it only in Hadoop 3.0. There are a lot of people parsing fsck output (unfortunately). The rest looks good, if we can keep the existing output the same I would love to add the replicaDetails option. Incorrect descriptions in fsck when nodes are decommissioned Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch When you run fsck with -files or -racks, you will get something like below if one of the replicas is decommissioned. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} That is because in NamenodeFsck, the repl count comes from live replicas count; while the actual nodes come from LocatedBlock which include decommissioned nodes. Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement verifies LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification; just like how fsck excludes decommissioned nodes when it check for under replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8070: --- Summary: Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode (was: ShortCircuitShmManager goes into dead mode, stopping all operations) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.8.0 Reporter: Gopal V Assignee: Kihwal Lee HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 
2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
[jira] [Updated] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8070: --- Attachment: HDFS-8070.001.patch Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.7.0 Reporter: Gopal V Assignee: Colin Patrick McCabe Attachments: HDFS-8070.001.patch HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown 
predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at
[jira] [Assigned] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe reassigned HDFS-8070: -- Assignee: Colin Patrick McCabe (was: Kihwal Lee) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.8.0 Reporter: Gopal V Assignee: Colin Patrick McCabe HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = 
(IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at
[jira] [Updated] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8070: --- Status: Patch Available (was: Open) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.7.0 Reporter: Gopal V Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-8070.001.patch HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] 
orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at
[jira] [Updated] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8070: --- Priority: Blocker (was: Major) Affects Version/s: (was: 2.8.0) 2.7.0 Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.7.0 Reporter: Gopal V Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-8070.001.patch HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,783 INFO 
[ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496485#comment-14496485 ] Colin Patrick McCabe commented on HDFS-8113: Thanks for the explanation, guys. I wasn't aware of the invariant that {{BlockInfoContiguous}} structures with {{bc == null}} were not in the {{BlocksMap}}. I think we should remove this invariant, and instead simply have the {{BlocksMap}} contain all the blocks. The memory savings from keeping them out are trivial, since the number of blocks without associated inodes should be very small. I think we can just check whether the INode field is null when appropriate. That seems to be the direction that the patch here is taking, and I think it makes sense. NullPointerException in BlockInfoContiguous causes block report failure --- Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null. {code} protected BlockInfoContiguous(BlockInfoContiguous from) { this(from, from.bc.getBlockReplication()); this.bc = from.bc; } {code} We have observed that some DataNodes keep failing to do block reports with the NameNode. The stack trace is as follows. Though we are not using the latest version, the problem still exists. {quote} 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
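A sketch of the direction proposed in the comment above: keep every block in the BlocksMap and make the null check explicit at the point of use. Names mirror the snippets quoted earlier in this thread; this is illustrative, not a committed patch.

{code}
// Sketch: with orphaned blocks kept in the BlocksMap, report processing
// distinguishes "unknown" from "known but orphaned" explicitly.
BlockInfoContiguous stored = blocksMap.getStoredBlock(reported);
if (stored == null) {
  // Genuinely unknown block: handle as a new or invalid replica.
} else if (stored.getBlockCollection() == null) {
  // Known but orphaned: no associated INode, so skip any logic that
  // dereferences the block collection (e.g. replication lookups).
} else {
  // Normal path: the block belongs to a file.
}
{code}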
[jira] [Commented] (HDFS-8070) ShortCircuitShmManager goes into dead mode, stopping all operations
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497287#comment-14497287 ] Colin Patrick McCabe commented on HDFS-8070: Both Hadoop 2.7 and the Hadoop 2 branch (which is what I assume you mean by Hadoop 2.8) have HDFS-7915, so I think there should not be any compatibility issues on that front. Can you check whether the patch up at HADOOP-11802 solves your issue? At the very least, it should get you a more informative exception. ShortCircuitShmManager goes into dead mode, stopping all operations --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.8.0 Reporter: Gopal V Assignee: Kihwal Lee
The HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShm wire protocol has trouble when a 2.8.0 DN talks to a 2.7.0 client?
{code}
2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0)
2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment.
java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0)
2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR)
java.nio.channels.ClosedChannelException
        at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57)
        at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387)
        at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378)
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0)
2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment.
java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208)
        at
{code}
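For context, the failing {{SlotReleaser}} above is part of short-circuit local reads, which is what puts the {{/grid/0/cluster/hdfs/dn_socket}} domain socket in play. A hedged client-side configuration sketch; the key names are the standard HDFS ones, and the NameNode URI is hypothetical:
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.HdfsConfiguration;

Configuration conf = new HdfsConfiguration();
// Enable short-circuit local reads; these allocate the shared-memory
// slots that the SlotReleaser in the log above is trying to free.
conf.setBoolean("dfs.client.read.shortcircuit", true);
// Domain socket shared with the DataNode (path taken from the log).
conf.set("dfs.domain.socket.path", "/grid/0/cluster/hdfs/dn_socket");
FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
{code}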
[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads
[ https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494761#comment-14494761 ] Colin Patrick McCabe commented on HDFS-8088: Also, I am at a conference right now, so I apologize if my replies are slow! Reduce the number of HTrace spans generated by HDFS reads - Key: HDFS-8088 URL: https://issues.apache.org/jira/browse/HDFS-8088 Project: Hadoop HDFS Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8088.001.patch HDFS generates too many trace spans on read right now. Every call to read() we make generates its own span, which is not very practical for things like HBase or Accumulo that do many such reads as part of a single operation. Instead of tracing every call to read(), we should only trace the cases where we refill the buffer inside a BlockReader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads
[ https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494760#comment-14494760 ] Colin Patrick McCabe commented on HDFS-8088:
bq. I re-ran my test on Hadoop-2.7.1-SNAP with your patch applied, Colin, and things are much happier. The performance is much closer to what I previously saw with 2.6.0 (without any quantitative measurements). +1 (non-binding, ofc)
Thanks, Josh. I discovered that we are reading non-trivial amounts of remote data inside the {{DFSInputStream#blockSeekTo}} method, so I think we'll also need to create a trace span for that one. Also, the {{BlockReader}} trace scopes will need to use the {{DFSClient#traceSampler}} (currently they don't) or else we will never get any trace spans from reads. I think that is what we would need to get the patch on this JIRA committed.
bq. Giving a very quick look at the code (and making what's possibly a bad guess), perhaps all of the 0ms length spans (denoted by zeroCount in the above, as opposed to the nonzeroCount) are when DFSOutputStream#writeChunk is only appending data into the current packet and not actually submitting that packet for the data streamer to process? With some more investigation into the hierarchy, I bet I could definitively determine that.
Keep in mind that doing a write in HDFS just hands the data off to a background thread called {{DataStreamer}}, which writes it out asynchronously. The only reason why {{writeChunk}} would ever have a time much higher than 0 is that there was lock contention (the {{DataStreamer#waitAndQueuePacket}} method couldn't get the {{DataStreamer#dataQueue}} lock immediately) or that there were more than {{dfs.client.write.max-packets-in-flight}} unacked messages in flight already. (HDFS calls these messages "packets" even though each message is typically multiple Ethernet packets.)
I guess we have to step back and ask what the end goal is for HTrace. If the end goal is figuring out why some requests had a high latency, it makes sense to only trace parts of the program that we think will take a non-trivial amount of time. In that case, we should probably only trace the handoff of the full packet to the {{DataStreamer}}. If the end goal is understanding the downstream consequences of all operations, then we have to connect up the dots for all operations. That's why I originally had all calls to write() and read() create trace spans.
I'm inclined to lean more towards goal #1 (figure out why specific requests had high latency) than goal #2. I think that looking at the high-latency outliers will naturally lead us to fix the biggest performance issues (such as locking contention, disk issues, network issues, etc.). Also, if all calls to write() and read() create trace spans, then this will have a multiplicative effect on our top-level sampling rate, which I think is undesirable.
bq. That being said, I hope I'm not being too much of a bother with all this. I was just really excited to see this functionality in HDFS and want to make sure we're getting good data coming back out. Thanks for bearing with me and for the patches you've already made!
We definitely appreciate all the input. I think it's very helpful. I do think maybe we should target 2.7.1 for some of these changes, since I need to think through everything. I know that's frustrating, but hopefully if we maintain a reasonable Hadoop release cadence it won't be too bad.
I'd also like to run some patches by you guys to see if they improve the usefulness of HTrace for you. And I am doing a bunch of testing internally, which I think will turn up a lot more potential improvements to HTrace and to its integration into HDFS. Use cases really are very helpful in motivating us here. Reduce the number of HTrace spans generated by HDFS reads - Key: HDFS-8088 URL: https://issues.apache.org/jira/browse/HDFS-8088 Project: Hadoop HDFS Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8088.001.patch HDFS generates too many trace spans on read right now. Every call to read() we make generates its own span, which is not very practical for things like HBase or Accumulo that do many such reads as part of a single operation. Instead of tracing every call to read(), we should only trace the cases where we refill the buffer inside a BlockReader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
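To make the proposed tracing granularity concrete, here is a hedged sketch against the HTrace 3.x API ({{org.apache.htrace}}). The span name, the {{sampler}} field, and the {{doRead}} helper are illustrative, not the exact names in the patch:
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;

// Open a span only when the BlockReader actually refills its buffer
// (the expensive path), instead of one span per read() call.
private int fillBuffer(ByteBuffer buf) throws IOException {
  TraceScope scope = Trace.startSpan("BlockReaderLocal#fillBuffer", sampler);
  try {
    return doRead(buf);  // hypothetical helper doing the real I/O
  } finally {
    scope.close();       // cheap no-op when the sampler declined to trace
  }
}
{code}
Sampling at the refill boundary keeps the span count proportional to actual I/O rather than to the number of small read() calls issued by clients like HBase.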
[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492577#comment-14492577 ] Colin Patrick McCabe commented on HDFS-8113: It seems like the BlockCollection will be null if the block doesn't belong to any file. We should also have a unit test for this. I was thinking (see the test sketch below):
1. start a mini DFS cluster with 2 datanodes
2. create a file with repl=2 and close it
3. take down one DN
4. delete the file
5. wait
6. bring the stopped DN back up; it will still have the block from the file which was deleted
NullPointerException in BlockInfoContiguous causes block report failure --- Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: HDFS-8113.patch
The following copy constructor can throw a NullPointerException if {{bc}} is null.
{code}
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, from.bc.getBlockReplication());
  this.bc = from.bc;
}
{code}
We have observed that some DataNodes keep failing to do block reports with the NameNode. The stack trace is as follows. Though we are not using the latest version, the problem still exists.
{quote}
2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
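A hedged sketch of the six-step test plan above, using the standard {{MiniDFSCluster}} test utilities; the path, file size, and timeout are illustrative, and the committed test may differ:
{code}
// Sketch only: replay the scenario where a DN reports a block whose
// file was deleted while that DN was down.
Configuration conf = new HdfsConfiguration();
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(2).build();
try {
  cluster.waitActive();
  DistributedFileSystem fs = cluster.getFileSystem();
  Path path = new Path("/test/deleted-while-dn-down");
  DFSTestUtil.createFile(fs, path, 1024L, (short) 2, 0L);  // repl=2
  // Take one DN down, then delete the file while it is offline.
  MiniDFSCluster.DataNodeProperties dnProps = cluster.stopDataNode(0);
  fs.delete(path, false);
  Thread.sleep(5000);  // crude wait; a real test would poll the NN state
  // Bring the stopped DN back; its block report references the deleted
  // file and must not NPE the NameNode.
  cluster.restartDataNode(dnProps);
  cluster.triggerBlockReports();
} finally {
  cluster.shutdown();
}
{code}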
[jira] [Commented] (HDFS-6919) Enforce a single limit for RAM disk usage and replicas cached via locking
[ https://issues.apache.org/jira/browse/HDFS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492587#comment-14492587 ] Colin Patrick McCabe commented on HDFS-6919: +1 for the idea Enforce a single limit for RAM disk usage and replicas cached via locking - Key: HDFS-6919 URL: https://issues.apache.org/jira/browse/HDFS-6919 Project: Hadoop HDFS Issue Type: Bug Reporter: Arpit Agarwal Assignee: Arpit Agarwal The DataNode can have a single limit for memory usage which applies to both replicas cached via CCM and replicas on RAM disk. See comments [1|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106025page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106025], [2|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106245page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106245] and [3|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106575page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106575] for discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492596#comment-14492596 ] Colin Patrick McCabe commented on HDFS-7878: Symlinks and directories both have a unique file ID, just the same as files. Maybe inodeID is a better name than fileID? Typically, you never actually get a FileInfo for a symlink itself unless you call getFileLinkInfo. If you simply call getFileInfo, you get the FileInfo for the file the symlink points to, not for the symlink itself. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, HDFS-7878.patch See HDFS-487. Even though that issue is resolved as a duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. The INode ID for the file should be easy to expose; alternatively, the ID could be derived from block IDs, to account for appends... This is useful, e.g., as a per-file cache key, to make sure the cache stays correct when a file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
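To illustrate the resolve vs. no-resolve distinction described above ({{getFileInfo}} and {{getFileLinkInfo}} are the ClientProtocol RPC names; their public analogues are {{getFileStatus}} and {{getFileLinkStatus}}), a hedged sketch with a hypothetical path:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

FileContext fc = FileContext.getFileContext(new Configuration());
Path link = new Path("/user/alice/link-to-data");  // hypothetical symlink
// Resolves the symlink: the status describes the target file.
FileStatus target = fc.getFileStatus(link);
// Does not resolve: the status describes the symlink itself.
FileStatus linkItself = fc.getFileLinkStatus(link);
System.out.println(linkItself.isSymlink());  // true for a symlink
{code}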
[jira] [Updated] (HDFS-8063) Fix intermittent test failures in TestTracing
[ https://issues.apache.org/jira/browse/HDFS-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8063: --- Summary: Fix intermittent test failures in TestTracing (was: Fix test failure in TestTracing) Fix intermittent test failures in TestTracing - Key: HDFS-8063 URL: https://issues.apache.org/jira/browse/HDFS-8063 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Attachments: HDFS-8063.001.patch, HDFS-8063.002.patch, testReadTraceHooks.html Tests in TestTracing sometimes fail, especially on slow machines. The cause is that spans can still arrive at the receiver after {{assertSpanNamesFound}} has passed and {{SetSpanReceiver.SetHolder.spans.clear()}} has been called for the next test case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
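One common fix for this kind of race is to poll for the expected spans instead of asserting immediately. A hedged sketch using Hadoop's {{GenericTestUtils.waitFor}}; {{expectedCount}} is illustrative, and the span-holder accessor mirrors the test's {{SetSpanReceiver.SetHolder}} rather than the committed change:
{code}
import com.google.common.base.Supplier;
import org.apache.hadoop.test.GenericTestUtils;

// Wait until the expected number of spans has arrived, so a span that
// lands late cannot be missed by the assert or cleared prematurely.
final int expectedCount = 3;  // illustrative
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return SetSpanReceiver.SetHolder.spans.size() >= expectedCount;
  }
}, 100 /* check every ms */, 10000 /* timeout ms */);
// Clear received spans only after the wait has succeeded.
SetSpanReceiver.SetHolder.spans.clear();
{code}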
[jira] [Resolved] (HDFS-7188) support build libhdfs3 on windows
[ https://issues.apache.org/jira/browse/HDFS-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe resolved HDFS-7188. Resolution: Fixed Fix Version/s: HDFS-6994 committed to HDFS-6994 support build libhdfs3 on windows - Key: HDFS-7188 URL: https://issues.apache.org/jira/browse/HDFS-7188 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Environment: Windows System, Visual Studio 2010 Reporter: Zhanwei Wang Assignee: Thanh Do Fix For: HDFS-6994 Attachments: HDFS-7188-branch-HDFS-6994-0.patch, HDFS-7188-branch-HDFS-6994-1.patch, HDFS-7188-branch-HDFS-6994-2.patch, HDFS-7188-branch-HDFS-6994-3.patch libhdfs3 should work on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)