[jira] [Commented] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another
[ https://issues.apache.org/jira/browse/HDFS-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542817#comment-14542817 ] Colin Patrick McCabe commented on HDFS-8380: committed, thanks Always call addStoredBlock on blocks which have been shifted from one storage to another Key: HDFS-8380 URL: https://issues.apache.org/jira/browse/HDFS-8380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.8.0 Attachments: HDFS-8380.001.patch We should always call addStoredBlock on blocks which have been shifted from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another
[ https://issues.apache.org/jira/browse/HDFS-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8380: --- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Always call addStoredBlock on blocks which have been shifted from one storage to another Key: HDFS-8380 URL: https://issues.apache.org/jira/browse/HDFS-8380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.8.0 Attachments: HDFS-8380.001.patch We should always call addStoredBlock on blocks which have been shifted from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540492#comment-14540492 ] Colin Patrick McCabe commented on HDFS-7240: This looks like a really interesting way to achieve a scalable blob store using some of the infrastructure we already have in HDFS. It could be a good direction for the project to go in. We should have a meeting to review the design and talk about how it fits in with the rest of what's going on in HDFS-land. Perhaps we could have a webex on the week of May 25th or June 1? (I am going to be out of town next week, so I can't do next week.) Object store in HDFS Key: HDFS-7240 URL: https://issues.apache.org/jira/browse/HDFS-7240 Project: Hadoop HDFS Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: Ozone-architecture-v1.pdf This jira proposes to add object store capabilities into HDFS. As part of the federation work (HDFS-1052) we separated block storage as a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer, i.e. datanodes. In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata. I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7900) ShortCircuitCache.replicaInfoMap keeps too many deleted file descriptor
[ https://issues.apache.org/jira/browse/HDFS-7900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe resolved HDFS-7900. Resolution: Duplicate ShortCircuitCache.replicaInfoMap keeps too many deleted file descriptor - Key: HDFS-7900 URL: https://issues.apache.org/jira/browse/HDFS-7900 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Environment: hadoop cdh5 2.3.0 hbase 0.98 Reporter: zhangshilong Priority: Critical I deleted some of HBase's files manually, or used rm -rf blk_ to delete the block file directly, but HBase keeps the file descriptors for a very long time. I found these file descriptors may be kept in the ShortCircuitCache replicaMap, but could not find when the file descriptors would be removed. replicaMap has no size limit for putting. Run: lsof -p pid | grep deleted. Part of the result:
lk_1102309377_28571078.meta (deleted)
java 8430 hbase 8537r REG 8,145 536870912 806553760 /search/hadoop08/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir61/blk_1102541663 (deleted)
java 8430 hbase 8540r REG 8,113 4194311 812434001 /search/hadoop06/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir62/subdir21/blk_1102524193_28785917.meta (deleted)
java 8430 hbase 8541r REG 8,65 536870912 813718517 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir31/subdir14/blk_1102523618 (deleted)
java 8430 hbase 8542r REG 8,65 4194311 813718518 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir31/subdir14/blk_1102523618_28785342.meta (deleted)
java 8430 hbase 8543r REG 8,193 536870912 1886733815 /search/hadoop12/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir20/subdir22/blk_1102533549 (deleted)
java 8430 hbase 8544r REG 8,65 4194311 814828988 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir49/blk_1102676585_28938309.meta (deleted)
java 8430 hbase 8545r REG 8,17 4194311 812962137 /search/hadoop10/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir53/blk_1102597493_28859217.meta (deleted)
java 8430 hbase 8546r REG 8,97 4194311 810468992 /search/hadoop05/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir4/subdir46/blk_1102524567_28786291.meta (deleted)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7900) ShortCircuitCache.replicaInfoMap keeps too many deleted file descriptor
[ https://issues.apache.org/jira/browse/HDFS-7900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540540#comment-14540540 ] Colin Patrick McCabe commented on HDFS-7900: Since Hadoop 2.3 was released, there have been some improvements. The short-circuit code now uses its shared memory segment IPC to tell the client when a file descriptor has been invalidated. One reason for invalidation is that the block file has been deleted. I am going to close this as a duplicate. Please reopen if you have any issues with the HDFS-6750 code. ShortCircuitCache.replicaInfoMap keeps too many deleted file descriptor - Key: HDFS-7900 URL: https://issues.apache.org/jira/browse/HDFS-7900 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Environment: hadoop cdh5 2.3.0 hbase 0.98 Reporter: zhangshilong Priority: Critical I deleted some of HBase's files manually, or used rm -rf blk_ to delete the block file directly, but HBase keeps the file descriptors for a very long time. I found these file descriptors may be kept in the ShortCircuitCache replicaMap, but could not find when the file descriptors would be removed. replicaMap has no size limit for putting. Run: lsof -p pid | grep deleted. Part of the result:
lk_1102309377_28571078.meta (deleted)
java 8430 hbase 8537r REG 8,145 536870912 806553760 /search/hadoop08/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir61/blk_1102541663 (deleted)
java 8430 hbase 8540r REG 8,113 4194311 812434001 /search/hadoop06/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir62/subdir21/blk_1102524193_28785917.meta (deleted)
java 8430 hbase 8541r REG 8,65 536870912 813718517 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir31/subdir14/blk_1102523618 (deleted)
java 8430 hbase 8542r REG 8,65 4194311 813718518 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir31/subdir14/blk_1102523618_28785342.meta (deleted)
java 8430 hbase 8543r REG 8,193 536870912 1886733815 /search/hadoop12/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir20/subdir22/blk_1102533549 (deleted)
java 8430 hbase 8544r REG 8,65 4194311 814828988 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir49/blk_1102676585_28938309.meta (deleted)
java 8430 hbase 8545r REG 8,17 4194311 812962137 /search/hadoop10/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir53/blk_1102597493_28859217.meta (deleted)
java 8430 hbase 8546r REG 8,97 4194311 810468992 /search/hadoop05/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir4/subdir46/blk_1102524567_28786291.meta (deleted)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8358) TestTraceAdmin fails
[ https://issues.apache.org/jira/browse/HDFS-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540743#comment-14540743 ] Colin Patrick McCabe commented on HDFS-8358: bq. I would like to make TraceAdmin able to handle other config prefixes such as yarn.htrace. I think SpanReceiverHost#addSpanReceiver is the right place to deal with the configs without a prefix. Anyway, we should file a new JIRA for that. Yeah. The user should be able to set -Clocal-file-span-receiver.path=/tmp/foo on the YARN daemons and have it apply the relevant YARN htrace properties. There isn't any need to require the user to send '-Cyarn.htrace.local-file-span-receiver.path=...'. We know that YARN tracing is what we want to configure, by virtue of the fact that the -host argument was for a YARN host. bq. The failure of TestHdfsConfigFields will be fixed in HDFS-8371. It should be addressed as a separate JIRA and I will update the patch once HDFS-8371 is committed. OK. +1 for v3. Can you file a follow-on JIRA to talk about the prefix issue? TestTraceAdmin fails Key: HDFS-8358 URL: https://issues.apache.org/jira/browse/HDFS-8358 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Masatake Iwasaki Attachments: HADOOP-11940.001.patch, HDFS-8358.002.patch, HDFS-8358.003.patch After HADOOP-11912, {{TestTraceAdmin#testCreateAndDestroySpanReceiver}} in hdfs started failing. It was probably unnoticed because the jira changed and triggered unit testing in common only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another
Colin Patrick McCabe created HDFS-8380: -- Summary: Always call addStoredBlock on blocks which have been shifted from one storage to another Key: HDFS-8380 URL: https://issues.apache.org/jira/browse/HDFS-8380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe We should always call addStoredBlock on blocks which have been shifted from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another
[ https://issues.apache.org/jira/browse/HDFS-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8380: --- Status: Patch Available (was: Open) Always call addStoredBlock on blocks which have been shifted from one storage to another Key: HDFS-8380 URL: https://issues.apache.org/jira/browse/HDFS-8380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8380.001.patch We should always call addStoredBlock on blocks which have been shifted from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another
[ https://issues.apache.org/jira/browse/HDFS-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8380: --- Attachment: HDFS-8380.001.patch Always call addStoredBlock on blocks which have been shifted from one storage to another Key: HDFS-8380 URL: https://issues.apache.org/jira/browse/HDFS-8380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8380.001.patch We should always call addStoredBlock on blocks which have been shifted from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another
[ https://issues.apache.org/jira/browse/HDFS-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541061#comment-14541061 ] Colin Patrick McCabe commented on HDFS-8380: Background: HDFS-6830 attempted to implement block shifting logic, whereby when the NameNode received a report about some replica saying it was in some DataNode storage, it would update the NN's internal data structures to reflect the fact that this replica was not in any other storages on that DataNode. The assumption was (and still is) that each replica is present in at most one storage on each DN (an assumption we might want to revisit at some point, but that's outside the scope of this JIRA...). HDFS-6830 was flawed, however. Although it changed {{BlockManager#addBlock}} to update the storage which a particular block was in, it would not actually call {{BlockManager#addBlock}} on blocks it received in the full block report, if it had already seen their IDs. So in the case where blocks were moved between storages, HDFS-6830 would not actually update the internal data structures on the NameNode... they would remain in the old storages. HDFS-6991, although it would appear to be unrelated based on the title, actually has a partial fix for the bug in HDFS-6830, in the form of this code:
{code}
-        (!storedBlock.findDatanode(dn)
-         || corruptReplicas.isReplicaCorrupt(storedBlock, dn))) {
+        (storedBlock.findStorageInfo(storageInfo) == -1 ||
+         corruptReplicas.isReplicaCorrupt(storedBlock, dn))) {
           addBlock(...)
{code}
However, HDFS-6991 doesn't fix the issue for RBW blocks. Admittedly, it is much less likely for RBW blocks to be shifted between storages, because when restarting a datanode, the RBW replicas become RWR. However, for the sake of robustness, we should implement the shifting behavior there too. This patch does that. It also adds logging for the first time we receive a storage report for a given storage. This should happen only once per storage, so it won't generate too many logs. It will be useful for tracing what is going on. It also adds debug logs to the initial storage report, similar to the debug logs available for the non-initial storage report. Finally, it adds a unit test for the shifting behavior. The unit test tests shifting of finalized blocks rather than RBW ones, so it doesn't require the rest of the patch to pass, but it's still very useful for preventing regressions. Always call addStoredBlock on blocks which have been shifted from one storage to another Key: HDFS-8380 URL: https://issues.apache.org/jira/browse/HDFS-8380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8380.001.patch We should always call addStoredBlock on blocks which have been shifted from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
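To make the shifting check concrete, here is a minimal sketch of the pattern described in the comment above. It is illustrative only, not the committed patch; the variable and helper names are assumed to match BlockManager's reported-block processing.
{code}
// Illustrative sketch, not the HDFS-8380 patch. Assumes storedBlock,
// storageInfo, dn, corruptReplicas, and delHintNode as in BlockManager's
// reported-block processing.
if (storedBlock.findStorageInfo(storageInfo) == -1 ||
    corruptReplicas.isReplicaCorrupt(storedBlock, dn)) {
  // The replica is not recorded under this storage -- it may have been
  // shifted from another storage on the same DataNode -- so call
  // addStoredBlock to move it to the new storage in the NN's structures.
  addStoredBlock(storedBlock, storageInfo, delHintNode, true);
}
{code}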
[jira] [Commented] (HDFS-8311) DataStreamer.transfer() should timeout the socket InputStream.
[ https://issues.apache.org/jira/browse/HDFS-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535272#comment-14535272 ] Colin Patrick McCabe commented on HDFS-8311: Good catch, [~yzhangal]. We should fix those other cases as well. I think we should do those in separate JIRAs, if that's more convenient for you. Also, it would be nice to have unit tests for these timeouts at some point, to ensure that they don't get removed. +1 again for the patch. Thanks, guys. DataStreamer.transfer() should timeout the socket InputStream. -- Key: HDFS-8311 URL: https://issues.apache.org/jira/browse/HDFS-8311 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Labels: BB2015-05-TBR Attachments: 0001-HDFS-8311-DataStreamer.transfer-should-timeout-the-s.patch, HDFS-8311.001.patch While validating some HA failure modes we found that HDFS clients can take a long time to recover, or sometimes don't recover at all, since we don't set up the socket timeout in the InputStream:
{code}
private void transfer() {
  ...
  OutputStream unbufOut = NetUtils.getOutputStream(sock, writeTimeout);
  InputStream unbufIn = NetUtils.getInputStream(sock);
  ...
}
{code}
The InputStream should have its own timeout in the same way as the OutputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
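A minimal sketch of the fix being discussed (not the committed patch): NetUtils also has a getInputStream overload that takes a timeout, mirroring getOutputStream. The readTimeout variable here is hypothetical, mirroring writeTimeout.
{code}
// Sketch only: give the InputStream a timeout just like the OutputStream.
// readTimeout is a hypothetical variable mirroring writeTimeout.
OutputStream unbufOut = NetUtils.getOutputStream(sock, writeTimeout);
InputStream unbufIn = NetUtils.getInputStream(sock, readTimeout);
{code}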
[jira] [Updated] (HDFS-8284) Update documentation about how to use HTrace with HDFS
[ https://issues.apache.org/jira/browse/HDFS-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8284: --- Resolution: Fixed Fix Version/s: 2.8.0 Target Version/s: 2.8.0 Status: Resolved (was: Patch Available) Update documentation about how to use HTrace with HDFS -- Key: HDFS-8284 URL: https://issues.apache.org/jira/browse/HDFS-8284 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: HDFS-8284.001.patch, HDFS-8284.002.patch, HDFS-8284.003.patch Tracing originated in DFSClient uses configuration keys prefixed with dfs.client.htrace after HDFS-8213. Server side tracing uses conf keys prefixed with dfs.htrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8284) Add usage of tracing originated in DFSClient to doc
[ https://issues.apache.org/jira/browse/HDFS-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535334#comment-14535334 ] Colin Patrick McCabe commented on HDFS-8284: +1. Thanks, [~iwasakims]. Add usage of tracing originated in DFSClient to doc --- Key: HDFS-8284 URL: https://issues.apache.org/jira/browse/HDFS-8284 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Labels: BB2015-05-TBR Attachments: HDFS-8284.001.patch, HDFS-8284.002.patch, HDFS-8284.003.patch Tracing originated in DFSClient uses configuration keys prefixed with dfs.client.htrace after HDFS-8213. Server side tracing uses conf keys prefixed with dfs.htrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8284) Update documentation about how to use HTrace with HDFS
[ https://issues.apache.org/jira/browse/HDFS-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8284: --- Summary: Update documentation about how to use HTrace with HDFS (was: Add usage of tracing originated in DFSClient to doc) Update documentation about how to use HTrace with HDFS -- Key: HDFS-8284 URL: https://issues.apache.org/jira/browse/HDFS-8284 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Labels: BB2015-05-TBR Attachments: HDFS-8284.001.patch, HDFS-8284.002.patch, HDFS-8284.003.patch Tracing originated in DFSClient uses configuration keys prefixed with dfs.client.htrace after HDFS-8213. Server side tracing uses conf keys prefixed with dfs.htrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535487#comment-14535487 ] Colin Patrick McCabe commented on HDFS-8113: bq. Colin Patrick McCabe Would you mind committing this? Sure. Will commit now. It is a good robustness improvement. If we find more information about why the {{BlockInfoContiguous}} was added to the {{BlocksMap}} without a {{BlockCollection}}, we can file a separate JIRA for that. Thanks, guys. NullPointerException in BlockInfoContiguous causes block report failure --- Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0, 2.7.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Labels: BB2015-05-TBR Attachments: HDFS-8113.02.patch, HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null.
{code}
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, from.bc.getBlockReplication());
  this.bc = from.bc;
}
{code}
We have observed that some DataNodes keep failing to do block reports with the NameNode. The stacktrace is as follows. Though we are not using the latest version, the problem still exists.
{quote}
2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
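One way to add the check that the new summary describes, as an illustrative sketch (not the committed patch; it uses Guava's Preconditions, which Hadoop already depends on):
{code}
// Sketch only: fail fast with a descriptive message instead of an opaque
// NullPointerException when the copied block has no BlockCollection.
// (Uses com.google.common.base.Preconditions.)
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, Preconditions.checkNotNull(from.bc,
      "Block %s has a null BlockCollection", from).getBlockReplication());
  this.bc = from.bc;
}
{code}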
[jira] [Updated] (HDFS-8113) Add check for null BlockCollection pointers in BlockInfoContiguous structures
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8113: --- Resolution: Fixed Fix Version/s: 2.8.0 Target Version/s: 2.8.0 Status: Resolved (was: Patch Available) Add check for null BlockCollection pointers in BlockInfoContiguous structures - Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0, 2.7.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: HDFS-8113.02.patch, HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null.
{code}
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, from.bc.getBlockReplication());
  this.bc = from.bc;
}
{code}
We have observed that some DataNodes keep failing to do block reports with the NameNode. The stacktrace is as follows. Though we are not using the latest version, the problem still exists.
{quote}
2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8113) Add check for null BlockCollection pointers in BlockInfoContiguous structures
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8113: --- Summary: Add check for null BlockCollection pointers in BlockInfoContiguous structures (was: NullPointerException in BlockInfoContiguous causes block report failure) Add check for null BlockCollection pointers in BlockInfoContiguous structures - Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0, 2.7.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Labels: BB2015-05-TBR Attachments: HDFS-8113.02.patch, HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null.
{code}
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, from.bc.getBlockReplication());
  this.bc = from.bc;
}
{code}
We have observed that some DataNodes keep failing to do block reports with the NameNode. The stacktrace is as follows. Though we are not using the latest version, the problem still exists.
{quote}
2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8246) Get HDFS file name based on block pool id and block id
[ https://issues.apache.org/jira/browse/HDFS-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535502#comment-14535502 ] Colin Patrick McCabe commented on HDFS-8246: From the C client, if you wanted to know what files block ID 123 was in, you could do {{hdfsListDirectory(fs, path=/.reserved/.blockIdToFiles/123, ...)}}. I think one of the advantages of having a path in .reserved instead of a new API is everything just works for the C client, C++ client, webhdfs, etc. Get HDFS file name based on block pool id and block id -- Key: HDFS-8246 URL: https://issues.apache.org/jira/browse/HDFS-8246 Project: Hadoop HDFS Issue Type: New Feature Components: HDFS, hdfs-client, namenode Reporter: feng xu Assignee: feng xu Labels: BB2015-05-TBR Attachments: HDFS-8246.0.patch This feature provides an HDFS shell command and C/Java APIs to retrieve an HDFS file name based on block pool id and block id.
1. The Java API in class DistributedFileSystem: public String getFileName(String poolId, long blockId) throws IOException
2. The C API in hdfs.c: char* hdfsGetFileName(hdfsFS fs, const char* poolId, int64_t blockId)
3. The HDFS shell command: hdfs dfs [generic options] -fn poolId blockId
This feature is useful if you have an HDFS block file name in the local file system and want to find out the related HDFS file name in the HDFS name space (http://stackoverflow.com/questions/10881449/how-to-find-file-from-blockname-in-hdfs-hadoop). Each HDFS block file name in the local file system contains both the block pool id and the block id; for example, in the HDFS block file name /hdfs/1/hadoop/hdfs/data/current/BP-97622798-10.3.11.84-1428081035160/current/finalized/subdir0/subdir0/blk_1073741825, the block pool id is BP-97622798-10.3.11.84-1428081035160 and the block id is 1073741825. The block pool id is uniquely related to an HDFS name node/name space, and the block id is uniquely related to an HDFS file within an HDFS name node/name space, so the combination of block pool id and block id is uniquely related to an HDFS file name. The shell command and C/Java APIs do not map the block pool id to a name node, so it's the user's responsibility to talk to the correct name node in a federation environment that has multiple name nodes. The block pool id is used by the name node to check if the user is talking with the correct name node. The implementation is straightforward. The client request to get the HDFS file name reaches the new method String getFileName(String poolId, long blockId) in FSNamesystem in the name node through RPC, and the new method does the following:
(1) Validate the block pool id.
(2) Create a Block based on the block id.
(3) Get BlockInfoContiguous from the Block.
(4) Get BlockCollection from BlockInfoContiguous.
(5) Get the file name from the BlockCollection.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
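If the reserved-path approach were adopted, client-side usage could look like this hypothetical sketch. The /.reserved/.blockIdToFiles path is the proposal in the comment above, not an existing HDFS feature.
{code}
// Hypothetical sketch of the proposed reserved path; this path does not
// exist in HDFS today. Lists the file(s) containing block ID 123.
// Assumes imports from org.apache.hadoop.conf and org.apache.hadoop.fs.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
for (FileStatus st : fs.listStatus(new Path("/.reserved/.blockIdToFiles/123"))) {
  System.out.println(st.getPath());
}
{code}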
[jira] [Commented] (HDFS-8358) TestTraceAdmin fails
[ https://issues.apache.org/jira/browse/HDFS-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535625#comment-14535625 ] Colin Patrick McCabe commented on HDFS-8358: Thanks for finding this and filing the JIRA, [~kihwal]. Intuitively it seems like I should be able to set -Clocal-file-span-receiver.path=/tmp/foo, not -Cdfs.htrace.local-file-span-receiver.path=/tmp/foo. We always want to be modifying the {{dfs.htrace}} config keys with the {{-C}} options we pass, right? So maybe let's just prefix anything we get via {{-C}} with {{dfs.htrace}} to avoid the extra typing. TestTraceAdmin fails Key: HDFS-8358 URL: https://issues.apache.org/jira/browse/HDFS-8358 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Masatake Iwasaki Attachments: HADOOP-11940.001.patch After HADOOP-11912, {{TestTraceAdmin#testCreateAndDestroySpanReceiver}} in hdfs started failing. It was probably unnoticed because the jira changed and triggered unit testing in common only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
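A sketch of the prefixing idea suggested above, for illustration only; the actual TraceAdmin change is in the patches attached to this JIRA.
{code}
// Sketch only: prepend the daemon-appropriate htrace prefix to bare -C keys.
Configuration conf = new Configuration();
String prefix = "dfs.htrace.";  // would be e.g. "yarn.htrace." for a YARN host
String key = "local-file-span-receiver.path";
String value = "/tmp/foo";
if (!key.startsWith(prefix)) {
  key = prefix + key;
}
conf.set(key, value);
{code}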
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7847: --- Resolution: Fixed Fix Version/s: (was: HDFS-7836) 2.8.0 Status: Resolved (was: Patch Available) Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: 2.8.0 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, HDFS-7847.005.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8311) DataStreamer.transfer() should timeout the socket InputStream.
[ https://issues.apache.org/jira/browse/HDFS-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529027#comment-14529027 ] Colin Patrick McCabe commented on HDFS-8311: Thanks, [~esteban]. +1 pending jenkins DataStreamer.transfer() should timeout the socket InputStream. -- Key: HDFS-8311 URL: https://issues.apache.org/jira/browse/HDFS-8311 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Attachments: 0001-HDFS-8311-DataStreamer.transfer-should-timeout-the-s.patch, HDFS-8311.001.patch While validating some HA failure modes we found that HDFS clients can take a long time to recover, or sometimes don't recover at all, since we don't set up the socket timeout in the InputStream:
{code}
private void transfer() {
  ...
  OutputStream unbufOut = NetUtils.getOutputStream(sock, writeTimeout);
  InputStream unbufIn = NetUtils.getInputStream(sock);
  ...
}
{code}
The InputStream should have its own timeout in the same way as the OutputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Resolution: Fixed Fix Version/s: 2.7.1 Target Version/s: 2.7.1 (was: 2.8.0) Status: Resolved (was: Patch Available) HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.1 Attachments: HDFS-8305.001.patch, HDFS-8305.002.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. Previously, in some cases when using the old rename, this was not the case. The format of OP_EDIT_LOG_RENAME_OLD allows moving /f to /d/f to be represented as RENAME(src=/f, dst=/d) or RENAME(src=/f, dst=/d/f). This change makes HDFS always use the latter form. This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name. This is a compatible change since we aren't removing the ability to handle the first form during edit log replay... we just no longer generate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8311) DataStreamer.transfer() should timeout the socket InputStream.
[ https://issues.apache.org/jira/browse/HDFS-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8311: --- Target Version/s: 2.8.0 Status: Patch Available (was: Open) DataStreamer.transfer() should timeout the socket InputStream. -- Key: HDFS-8311 URL: https://issues.apache.org/jira/browse/HDFS-8311 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Attachments: 0001-HDFS-8311-DataStreamer.transfer-should-timeout-the-s.patch, HDFS-8311.001.patch While validating some HA failure modes we found that HDFS clients can take a long time to recover, or sometimes don't recover at all, since we don't set up the socket timeout in the InputStream:
{code}
private void transfer() {
  ...
  OutputStream unbufOut = NetUtils.getOutputStream(sock, writeTimeout);
  InputStream unbufIn = NetUtils.getInputStream(sock);
  ...
}
{code}
The InputStream should have its own timeout in the same way as the OutputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8284) Add usage of tracing originated in DFSClient to doc
[ https://issues.apache.org/jira/browse/HDFS-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529094#comment-14529094 ] Colin Patrick McCabe commented on HDFS-8284: Thanks, [~iwasakims]. This looks really good. The only thing I would suggest is that we should remove the section or two on zipkin stuff. Instead, we should link to the upstream HTrace documentation about setting up span receivers. Having it in there makes people think that the only way to use htrace is through zipkin, which is certainly not true. Add usage of tracing originated in DFSClient to doc --- Key: HDFS-8284 URL: https://issues.apache.org/jira/browse/HDFS-8284 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-8284.001.patch, HDFS-8284.002.patch Tracing originated in DFSClient uses configuration keys prefixed with dfs.client.htrace after HDFS-8213. Server side tracing uses conf keys prefixed with dfs.htrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8246) Get HDFS file name based on block pool id and block id
[ https://issues.apache.org/jira/browse/HDFS-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529256#comment-14529256 ] Colin Patrick McCabe commented on HDFS-8246: How about having {{/.reserved/.blockIdToFiles/$ID}} map to a directory containing the hdfs files which have the given block ID? I think this would be a lot better than having a whole other set of APIs. Remember also that multiple snapshotted files can contain the same block ID. Get HDFS file name based on block pool id and block id -- Key: HDFS-8246 URL: https://issues.apache.org/jira/browse/HDFS-8246 Project: Hadoop HDFS Issue Type: New Feature Components: HDFS, hdfs-client, namenode Reporter: feng xu Assignee: feng xu Attachments: HDFS-8246.0.patch This feature provides an HDFS shell command and C/Java APIs to retrieve an HDFS file name based on block pool id and block id.
1. The Java API in class DistributedFileSystem: public String getFileName(String poolId, long blockId) throws IOException
2. The C API in hdfs.c: char* hdfsGetFileName(hdfsFS fs, const char* poolId, int64_t blockId)
3. The HDFS shell command: hdfs dfs [generic options] -fn poolId blockId
This feature is useful if you have an HDFS block file name in the local file system and want to find out the related HDFS file name in the HDFS name space (http://stackoverflow.com/questions/10881449/how-to-find-file-from-blockname-in-hdfs-hadoop). Each HDFS block file name in the local file system contains both the block pool id and the block id; for example, in the HDFS block file name /hdfs/1/hadoop/hdfs/data/current/BP-97622798-10.3.11.84-1428081035160/current/finalized/subdir0/subdir0/blk_1073741825, the block pool id is BP-97622798-10.3.11.84-1428081035160 and the block id is 1073741825. The block pool id is uniquely related to an HDFS name node/name space, and the block id is uniquely related to an HDFS file within an HDFS name node/name space, so the combination of block pool id and block id is uniquely related to an HDFS file name. The shell command and C/Java APIs do not map the block pool id to a name node, so it's the user's responsibility to talk to the correct name node in a federation environment that has multiple name nodes. The block pool id is used by the name node to check if the user is talking with the correct name node. The implementation is straightforward. The client request to get the HDFS file name reaches the new method String getFileName(String poolId, long blockId) in FSNamesystem in the name node through RPC, and the new method does the following:
(1) Validate the block pool id.
(2) Create a Block based on the block id.
(3) Get BlockInfoContiguous from the Block.
(4) Get BlockCollection from BlockInfoContiguous.
(5) Get the file name from the BlockCollection.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8271) NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled
[ https://issues.apache.org/jira/browse/HDFS-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529197#comment-14529197 ] Colin Patrick McCabe commented on HDFS-8271: An uber jira is a good idea. Also, let's not change the default behavior by binding on IPv6 by default. It will create problems for sure. NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled Key: HDFS-8271 URL: https://issues.apache.org/jira/browse/HDFS-8271 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 NameNode works properly on IPv4 or IPv6 single stack (assuming in the latter case that scripts have been changed to disable preferIPv4Stack, and dependent on the client/data node fix in HDFS-8078). On dual-stack machines, NameNode listens only on IPv4 (even ignoring preferIPv6Addresses being set.) Our initial use case for IPv6 is IPv6-only clusters, but ideally we'd support binding to both the IPv4 and IPv6 machine addresses so that we can support heterogenous clusters (some dual-stack and some IPv6-only machines.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528998#comment-14528998 ] Colin Patrick McCabe commented on HDFS-7847: +1. Thanks, [~clamb]. Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, HDFS-7847.005.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528910#comment-14528910 ] Colin Patrick McCabe commented on HDFS-7758: +1. Thanks, Eddy. Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch, HDFS-7758.007.patch, HDFS-7758.008.patch, HDFS-7758.010.patch HDFS-7496 introduced reference counting for the volume instances being used, to prevent race conditions when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak a volume instance without increasing its reference count. In this JIRA, we retire {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and similar methods to access {{FsVolume}}. This makes sure that consumers of {{FsVolume}} always hold a correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
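The resulting usage pattern, sketched for illustration (method names follow the FsVolumeSpi reference-counting described in the issue; the snippet itself is a sketch, not code from the patch):
{code}
// Sketch of reference-counted volume access: the reference is held for
// the duration of the try block, so the volume cannot be hot-swapped out
// from under the consumer while in use.
try (FsVolumeReference ref = volume.obtainReference()) {
  FsVolumeSpi vol = ref.getVolume();
  // ... use vol ...
}  // closing the reference decrements the volume's reference count
{code}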
[jira] [Updated] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7758: --- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.8.0 Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch, HDFS-7758.007.patch, HDFS-7758.008.patch, HDFS-7758.010.patch HDFS-7496 introduced reference counting for the volume instances being used, to prevent race conditions when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak a volume instance without increasing its reference count. In this JIRA, we retire {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and similar methods to access {{FsVolume}}. This makes sure that consumers of {{FsVolume}} always hold a correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8157) Writes to RAM DISK reserve locked memory for block files
[ https://issues.apache.org/jira/browse/HDFS-8157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529275#comment-14529275 ] Colin Patrick McCabe commented on HDFS-8157: Thanks for this, [~arpitagarwal]. I don't think we should add {{DataNode#skipNativeIoCheckForTesting}}. To simulate locking memory without adding a dependency on NativeIO, just create a custom cache manipulator. This custom manipulator can always return true for {{verifyCanMlock}}. There are some other unit tests doing this.
{code}
public void releaseReservedSpace(long bytesToRelease, boolean releaseLockedMemory);
{code}
I would rather have a separate function for releasing the memory than overload the meaning of this one. Maybe I am missing something, but I don't understand the purpose behind {{releaseRoundDown}}. Why would we round down to a page size when allocating or releasing memory? Writes to RAM DISK reserve locked memory for block files Key: HDFS-8157 URL: https://issues.apache.org/jira/browse/HDFS-8157 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-8157.01.patch Per discussion on HDFS-6919, the first step is that writes to RAM disk will reserve locked memory via the FsDatasetCache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
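The custom cache manipulator suggestion, sketched for clarity. This is illustrative only; existing Hadoop tests use the same hook in a similar way.
{code}
// Sketch only: a test cache manipulator that pretends mlock always
// works, avoiding any test-only flag on DataNode.
// (NativeIO.POSIX.CacheManipulator is the existing test hook.)
NativeIO.POSIX.setCacheManipulator(new NativeIO.POSIX.CacheManipulator() {
  @Override
  public boolean verifyCanMlock() {
    return true;  // simulate a system where locking memory is permitted
  }
});
{code}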
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Description: HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. Previously, in some cases when using the old rename, this was not the case. The format of OP_EDIT_LOG_RENAME_OLD allows moving /f to /d/f to be represented as RENAME(src=/f, dst=/d) or RENAME(src=/f, dst=/d/f). This change makes HDFS always use the latter form. This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name. This is a compatible change since we aren't removing the ability to handle the first form during edit log replay... we just no longer generate it. (was: HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name.) HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. Previously, in some cases when using the old rename, this was not the case. The format of OP_EDIT_LOG_RENAME_OLD allows moving /f to /d/f to be represented as RENAME(src=/f, dst=/d) or RENAME(src=/f, dst=/d/f). This change makes HDFS always use the latter form. This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name. This is a compatible change since we aren't removing the ability to handle the first form during edit log replay... we just no longer generate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
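To make the two edit-log forms concrete, a small illustrative sketch (not code from the patch):
{code}
// Old-style rename of /f into directory /d could be logged either way:
//   RENAME(src=/f, dst=/d)     -- old form, dst is a directory
//   RENAME(src=/f, dst=/d/f)   -- new form, dst ends with the file name
// Computing the fully-qualified destination, illustratively:
String src = "/f";
String dst = "/d";
String fullDst = new Path(dst, new Path(src).getName()).toString();
// fullDst is "/d/f", the form HDFS now always writes to the edit log
{code}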
[jira] [Commented] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527023#comment-14527023 ] Colin Patrick McCabe commented on HDFS-8305: bq. can we add a description to this jira explaining why (e.g., "This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name.")? added bq. can we add java doc to the void logRename(...) methods to say something like "if the rename source is a file, the target should be a file too; this will ensure that inotify will always be able to consider the dst field as the full destination file name"? ok HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. Previously, in some cases when using the old rename, this was not the case. The format of OP_EDIT_LOG_RENAME_OLD allows moving /f to /d/f to be represented as RENAME(src=/f, dst=/d) or RENAME(src=/f, dst=/d/f). This change makes HDFS always use the latter form. This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name. This is a compatible change since we aren't removing the ability to handle the first form during edit log replay... we just no longer generate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Attachment: HDFS-8305.002.patch HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch, HDFS-8305.002.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. Previously, in some cases when using the old rename, this was not the case. The format of OP_EDIT_LOG_RENAME_OLD allows moving /f to /d/f to be represented as RENAME(src=/f, dst=/d) or RENAME(src=/f, dst=/d/f). This change makes HDFS always use the latter form. This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name. This is a compatible change since we aren't removing the ability to handle the first form during edit log replay... we just no longer generate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7847: --- Assignee: Charles Lamb (was: Colin Patrick McCabe) Status: Patch Available (was: Open) Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7847: --- Issue Type: Bug (was: Sub-task) Parent: (was: HDFS-7836) Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527525#comment-14527525 ] Colin Patrick McCabe commented on HDFS-7847: [~clamb], can you rebase this on trunk? Looks like it's gotten stale. Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8284) Add usage of tracing originated in DFSClient to doc
[ https://issues.apache.org/jira/browse/HDFS-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527214#comment-14527214 ] Colin Patrick McCabe commented on HDFS-8284: Thanks, [~iwasakims].
{code}
<property>
  <name>dfs.htrace.spanreceiver.classes</name>
  <value></value>
  <description>
    A comma separated list of the fully-qualified class name of classes
    implementing SpanReceiver. The tracing system works by collecting
    information in structs called 'Spans'. It is up to you to choose
    how you want to receive this information by implementing the
    SpanReceiver interface.
  </description>
</property>
{code}
I think this description should be something more like "The HTrace SpanReceiver to use for the NameNode, DataNode, and JournalNode." We shouldn't try to explain what spans are... let's just link to the HTrace documentation rather than repeating it here.
{code}
<property>
  <name>dfs.client.htrace.spanreceiver.classes</name>
  <value></value>
  <description>
    A comma separated list of the fully-qualified class name of classes
    implementing SpanReceiver. This property is used by DFSClient
    for tracing started internally.
  </description>
</property>
{code}
I think this description should be something more like "The HTrace SpanReceiver for the HDFS client. You do not need to enable this if your client has been modified to use HTrace." Again, just provide a reference to the HTrace docs.
{code}
### Starting tracing spans by configuration for HDFS client

You can start tracing spans by setting configuration for HDFS client.
This is useful for tracing programs where you don't have access to the source code.
{code}
How about, "The DFSClient can enable tracing internally. This allows you to use HTrace with your client without modifying the client source code." Add usage of tracing originated in DFSClient to doc --- Key: HDFS-8284 URL: https://issues.apache.org/jira/browse/HDFS-8284 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-8284.001.patch Tracing originated in DFSClient uses configuration keys prefixed with dfs.client.htrace after HDFS-8213. Server side tracing uses conf keys prefixed with dfs.htrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7397) Add more detail to the documentation for the conf key dfs.client.read.shortcircuit.streams.cache.size
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7397: --- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to 2.8. Thanks, Brahma. Add more detail to the documentation for the conf key dfs.client.read.shortcircuit.streams.cache.size --- Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: HDFS-7397-002.patch, HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3643) hdfsJniHelper.c unchecked string pointers
[ https://issues.apache.org/jira/browse/HDFS-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527200#comment-14527200 ] Colin Patrick McCabe commented on HDFS-3643: We put braces around all if statements. The else should be on the same line as the close bracket.
{code}
if (returnType == '\0')
    return newRuntimeError(env,
        "invokeMethod: return type missing after ')'");
{code}
This if statement isn't needed since {{strchr}} will either return NULL, or a pointer to the first occurrence of a right paren in the string. It can't return a pointer to a 0 byte. Looks good aside from that. hdfsJniHelper.c unchecked string pointers - Key: HDFS-3643 URL: https://issues.apache.org/jira/browse/HDFS-3643 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.0.0-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Attachments: HDFS-3643.02.patch, hdfs-3643-1.txt, hdfs3643-2.txt, hdfs3643.txt
{code}
str = methSignature;
while (*str != ')')
    str++;
str++;
returnType = *str;
{code}
This loop needs to check for {{'\0'}}. Also the following {{if/else if/else if}} cascade doesn't handle unexpected values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527180#comment-14527180 ] Colin Patrick McCabe commented on HDFS-7758: bq. FsDatasetImpl#volumes is a FsVolumeList object, which does not leak FsVolumeImpl by itself. Moreover, TestWriteToReplica is using it. So it still needs to be a package-level field. I removed the private FsDatasetImpl#getVolumes() function, which exposed FsVolumeList#getVolumes(). Let's fix TestWriteToReplica so it doesn't do this, and make it private. Thanks. +1 once that's resolved. Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch, HDFS-7758.007.patch HDFS-7496 introduced reference-counting the volume instances being used to prevent race condition when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak the volume instance without increasing its reference count. In this JIRA, we retire the {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and etc. method to access {{FsVolume}}. Thus it makes sure that the consumer of {{FsVolume}} always has correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7397) Add more detail to the documentation for the conf key dfs.client.read.shortcircuit.streams.cache.size
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7397: --- Summary: Add more detail to the documentation for the conf key dfs.client.read.shortcircuit.streams.cache.size (was: The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading) Add more detail to the documentation for the conf key dfs.client.read.shortcircuit.streams.cache.size --- Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7397-002.patch, HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7847: --- Assignee: Colin Patrick McCabe (was: Charles Lamb) Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7847 started by Colin Patrick McCabe. -- Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work stopped] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7847 stopped by Colin Patrick McCabe. -- Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523779#comment-14523779 ] Colin Patrick McCabe commented on HDFS-8305: I'm re-kicking jenkins since we got a bunch of weird timeouts, just like with some of the patches yesterday. Seems unrelated to the patch HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523782#comment-14523782 ] Colin Patrick McCabe commented on HDFS-8305: Summary of the approach here: the format of OP_EDIT_LOG_RENAME_OLD allows moving /f to /d/f to be represented as RENAME(src=/f, dst=/d) or RENAME(src=/f, dst=/d/f). This change makes HDFS always use the latter form. This, in turn, ensures that inotify will always be able to consider the dst field as the full destination file name. This is a compatible change since we aren't removing the ability to handle the first form during edit log replay... we just no longer generate it. HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
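To make the invariant in the comment above concrete, here is a minimal, self-contained sketch of the normalization idea; the helper name and the dstIsDir flag are illustrative assumptions, not the committed HDFS-8305 patch.
{code}
// Illustrative sketch only -- not the committed HDFS-8305 code.
public class RenameDstNormalizer {
  /** If dst names a directory, append src's final path component. */
  static String normalizedRenameDst(String src, String dst, boolean dstIsDir) {
    if (!dstIsDir) {
      return dst;                        // already the full destination name
    }
    String name = src.substring(src.lastIndexOf('/') + 1);
    return dst.endsWith("/") ? dst + name : dst + "/" + name;
  }

  public static void main(String[] args) {
    // Moving /f into directory /d is logged as dst=/d/f, never dst=/d.
    System.out.println(normalizedRenameDst("/f", "/d", true));    // /d/f
    System.out.println(normalizedRenameDst("/f", "/d/f", false)); // /d/f
  }
}
{code}
With this normalization applied at edit-log generation time, a reader of the dst field can always treat it as the full destination file name.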
[jira] [Updated] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8213: --- Resolution: Fixed Fix Version/s: 2.7.1 Status: Resolved (was: Patch Available) committed to 2.7.1. thanks, guys. DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Fix For: 2.7.1 Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523571#comment-14523571 ] Colin Patrick McCabe commented on HDFS-8213: TestFileTruncate warning is unrelated. checkstyle continues to be busted. committing... DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523713#comment-14523713 ] Colin Patrick McCabe commented on HDFS-7758: Thanks, [~eddyxu]. Looks good overall. Given that we have a class named {{FsVolumeReference}}, we should consistently refer to "volume references" rather than "referred volumes". So let's change {{ReferredFsVolumes}} -> {{FsVolumeReferences}}. It would be nice to avoid all the typecasts. I think we can, if we change FsVolumeReference -> FsVolumeReference<? extends FsVolumeSpi> and FsVolumeReferences -> FsVolumeReferences<? extends FsVolumeSpi>. But let's do that in a follow-on change-- this change is big enough already. {{FsDatasetImpl#volumes}} is still package-private rather than truly private. Can you make it private? Otherwise other code in this package can reach in and use this field directly. I also think we should just get rid of {{FsDatasetImpl#getVolumes}}... objects don't need to use accessors for private internal fields. As long as that function exists there will be a temptation to make it more accessible, like has happened with many other accessors in the past. Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch HDFS-7496 introduced reference-counting the volume instances being used to prevent race condition when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak the volume instance without increasing its reference count. In this JIRA, we retire the {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and etc. method to access {{FsVolume}}. Thus it makes sure that the consumer of {{FsVolume}} always has correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
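For readers following the review, a minimal self-contained sketch of the reference-counting pattern under discussion; the class and method names are stand-ins, not the actual HDFS-7758 API.
{code}
// Illustrative sketch of Closeable, counted volume references.
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

class Volume {
  private final AtomicInteger refCount = new AtomicInteger(0);

  /** The only way callers get at a volume: a counted, Closeable reference. */
  Ref obtainReference() {
    refCount.incrementAndGet();
    return new Ref();
  }

  int references() { return refCount.get(); }

  class Ref implements Closeable {
    Volume volume() { return Volume.this; }
    @Override public void close() { refCount.decrementAndGet(); }
  }
}

public class RefCountSketch {
  public static void main(String[] args) {
    Volume v = new Volume();
    try (Volume.Ref ref = v.obtainReference()) {
      // Safe to use ref.volume() here: the count is held, so a concurrent
      // hot-swap cannot tear the volume down underneath us.
      System.out.println("in use, refs=" + v.references());   // refs=1
    }
    System.out.println("released, refs=" + v.references());   // refs=0
  }
}
{code}
The try-with-resources shape is the point of the API change: every consumer of a volume holds a count for exactly the scope of its use, and the count is released even on an exception path.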
[jira] [Comment Edited] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523713#comment-14523713 ] Colin Patrick McCabe edited comment on HDFS-7758 at 5/1/15 7:05 PM: Thanks, [~eddyxu]. Looks good overall. Given that we have a class named {{FsVolumeReference}}, we should consistently refer to "volume references" rather than "referred volumes". So let's change {{ReferredFsVolumes}} -> {{FsVolumeReferences}}. It would be nice to avoid all the typecasts. I think we can, if we change {{FsVolumeReference -> FsVolumeReference<? extends FsVolumeSpi>}} and {{FsVolumeReferences -> FsVolumeReferences<? extends FsVolumeSpi>}}. But let's do that in a follow-on change-- this change is big enough already. {{FsDatasetImpl#volumes}} is still package-private rather than truly private. Can you make it private? Otherwise other code in this package can reach in and use this field directly. I also think we should just get rid of {{FsDatasetImpl#getVolumes}}... objects don't need to use accessors for private internal fields. As long as that function exists there will be a temptation to make it more accessible, like has happened with many other accessors in the past. was (Author: cmccabe): Thanks, [~eddyxu]. Looks good overall. Given that we have a class named {{FsVolumeReference}}, we should consistently refer to "volume references" rather than "referred volumes". So let's change {{ReferredFsVolumes}} -> {{FsVolumeReferences}}. It would be nice to avoid all the typecasts. I think we can, if we change FsVolumeReference -> FsVolumeReference<? extends FsVolumeSpi> and FsVolumeReferences -> FsVolumeReferences<? extends FsVolumeSpi>. But let's do that in a follow-on change-- this change is big enough already. {{FsDatasetImpl#volumes}} is still package-private rather than truly private. Can you make it private? Otherwise other code in this package can reach in and use this field directly. I also think we should just get rid of {{FsDatasetImpl#getVolumes}}... objects don't need to use accessors for private internal fields. As long as that function exists there will be a temptation to make it more accessible, like has happened with many other accessors in the past. Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch HDFS-7496 introduced reference-counting the volume instances being used to prevent race condition when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak the volume instance without increasing its reference count. In this JIRA, we retire the {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and etc. method to access {{FsVolume}}. Thus it makes sure that the consumer of {{FsVolume}} always has correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Attachment: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Status: Patch Available (was: Open) HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8305.001.patch HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Summary: HDFS INotify: the destination field of RenameOp should always end with the file name (was: HDFS INotify: the destination argument to RenameOp should always end with the file name) HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe HDFS INotify: the destination argument to RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8305) HDFS INotify: the destination argument to RenameOp should always end with the file name
Colin Patrick McCabe created HDFS-8305: -- Summary: HDFS INotify: the destination argument to RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe HDFS INotify: the destination argument to RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8305) HDFS INotify: the destination field of RenameOp should always end with the file name
[ https://issues.apache.org/jira/browse/HDFS-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8305: --- Description: HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. (was: HDFS INotify: the destination argument to RenameOp should always end with the file name rather than sometimes being a directory name.) HDFS INotify: the destination field of RenameOp should always end with the file name Key: HDFS-8305 URL: https://issues.apache.org/jira/browse/HDFS-8305 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe HDFS INotify: the destination field of RenameOp should always end with the file name rather than sometimes being a directory name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522663#comment-14522663 ] Colin Patrick McCabe edited comment on HDFS-8213 at 5/1/15 2:12 AM: findbugs warning is bogus. patch doesn't modify org.apache.hadoop.hdfs.DataStreamer$LastException. the rest of the stuff looks bogus as well (a lot of test timeouts on random things that aren't enabling / touching tracing), guess it's time to re-run again was (Author: cmccabe): findbugs warning is bogus. patch doesn't modify org.apache.hadoop.hdfs.DataStreamer$LastException. the rest of the stuff looks bogus as well, guess it's time to re-run again DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522663#comment-14522663 ] Colin Patrick McCabe commented on HDFS-8213: findbugs warning is bogus. patch doesn't modify org.apache.hadoop.hdfs.DataStreamer$LastException. the rest of the stuff looks bogus as well, guess it's time to re-run again DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements
[ https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522480#comment-14522480 ] Colin Patrick McCabe commented on HDFS-7836: Hi [~xinwei], The discussion on March 11th was focused on our proposal for off-heaping and parallelizing the block manager from February 24th. We spent a lot of time going through the proposal and responding to questions on the proposal. There was widespread agreement that we needed to reduce the garbage collection impact of the millions of BlockInfoContiguous structures. There was some disagreement about how to do that. Daryn argued that using large primitive arrays was the best way to go. Charles and I argued that using off-heap storage was better. The main advantage of large primitive arrays is that it makes the existing Java -Xmx memory settings work as expected. The main advantage of off-heap is that it allows the use of things like {{Unsafe#compareAndSwap}}, which can often lead to more efficient concurrent data structures. Also, when using off-heap memory, we get to re-use malloc rather than essentially writing our own malloc for every subsystem. There was some hand-wringing about off-heap memory being slower, but I do not believe that this is valid. Apache Spark has found that their off-heap hash table was actually faster than the on-heap one, due to the ability to better control the memory layout. https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html The key is to avoid using {{DirectByteBuffer}}, which is rather slow, and use {{Unsafe}} instead. However, Daryn has posted some patches using the large arrays approach. Since they are a nice incremental improvement, we are probably going to pick them up if there are no blockers. We are also looking at incremental improvements such as implementing backpressure for full block reports, and speeding up edit log replay (if possible). I would also like to look at parallelizing the full block report... if we can do that, we can get a dramatic improvement in FBR times by using more than 1 core. BlockManager Scalability Improvements - Key: HDFS-7836 URL: https://issues.apache.org/jira/browse/HDFS-7836 Project: Hadoop HDFS Issue Type: Improvement Reporter: Charles Lamb Assignee: Charles Lamb Attachments: BlockManagerScalabilityImprovementsDesign.pdf Improvements to BlockManager scalability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
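As a toy illustration of the Unsafe-vs-DirectByteBuffer point made above (this is not code from the block manager proposal): {{compareAndSwapLong}} can target a raw off-heap address directly, which is what enables lock-free concurrent structures on off-heap memory.
{code}
// Minimal Java 8-era sketch: CAS against an off-heap address via Unsafe.
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapCasSketch {
  public static void main(String[] args) throws Exception {
    // There is no public accessor; grab the singleton via reflection.
    Field f = Unsafe.class.getDeclaredField("theUnsafe");
    f.setAccessible(true);
    Unsafe unsafe = (Unsafe) f.get(null);

    long addr = unsafe.allocateMemory(8);   // one off-heap 64-bit slot
    unsafe.putLong(addr, 0L);

    // A null base object means "addr" is interpreted as an absolute address.
    boolean swapped = unsafe.compareAndSwapLong(null, addr, 0L, 42L);
    System.out.println("swapped=" + swapped + " value=" + unsafe.getLong(addr));

    unsafe.freeMemory(addr);                // manual lifetime, like malloc/free
  }
}
{code}
DirectByteBuffer exposes no CAS at all, so building a lock-free table on top of it means falling back to Java-level locking; that is the performance gap being argued here.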
[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520002#comment-14520002 ] Colin Patrick McCabe commented on HDFS-8113: +1 for HDFS-8113.02.patch. I think it's a good robustness improvement to the code. It would be nice to continue the investigation about why you hit this issue in another jira, as [~chengbing.liu] suggested. NullPointerException in BlockInfoContiguous causes block report failure --- Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: HDFS-8113.02.patch, HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null.
{code}
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, from.bc.getBlockReplication());
  this.bc = from.bc;
}
{code}
We have observed that some DataNodes keep failing to do block reports with the NameNode. The stacktrace is as follows. Though we are not using the latest version, the problem still exists.
{quote}
2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
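For illustration, a standalone sketch of the null-guard idea behind this kind of robustness fix; it is not the actual HDFS-8113.02.patch, and DEFAULT_REPLICATION plus the stub class are assumptions made purely for the example.
{code}
// Standalone sketch of guarding the from.bc dereference in a copy ctor.
class BlockCollectionStub {
  int getBlockReplication() { return 3; }
}

public class CopyCtorGuardSketch {
  static final int DEFAULT_REPLICATION = 3;   // assumed fallback value
  final BlockCollectionStub bc;
  final int replication;

  CopyCtorGuardSketch(CopyCtorGuardSketch from) {
    // Guarded dereference: from.bc may legitimately be null mid-report.
    this.replication = (from.bc == null)
        ? DEFAULT_REPLICATION : from.bc.getBlockReplication();
    this.bc = from.bc;
  }

  CopyCtorGuardSketch(BlockCollectionStub bc, int replication) {
    this.bc = bc;
    this.replication = replication;
  }

  public static void main(String[] args) {
    CopyCtorGuardSketch orig = new CopyCtorGuardSketch(null, 3);
    CopyCtorGuardSketch copy = new CopyCtorGuardSketch(orig); // no NPE now
    System.out.println("replication=" + copy.replication);
  }
}
{code}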
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7847: --- Affects Version/s: (was: HDFS-7836) 2.8.0 Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518394#comment-14518394 ] Colin Patrick McCabe commented on HDFS-7758: can you rebase the patch on trunk? thanks Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch HDFS-7496 introduced reference-counting the volume instances being used to prevent race condition when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak the volume instance without increasing its reference count. In this JIRA, we retire the {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and etc. method to access {{FsVolume}}. Thus it makes sure that the consumer of {{FsVolume}} always has correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8213: --- Attachment: HDFS-8213.002.patch DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518389#comment-14518389 ] Colin Patrick McCabe commented on HDFS-8213: Thanks for the review, [~iwasakims]. I attached a patch. Let's do the hdfs-default.xml and other docs stuff later since it's not directly related to this DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7397) The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518093#comment-14518093 ] Colin Patrick McCabe commented on HDFS-7397: +1 for v2 The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7397-002.patch, HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518196#comment-14518196 ] Colin Patrick McCabe commented on HDFS-8213: bq. In SpanReceiverHost#getInstance, loadSpanReceivers is called even if there is already an initialized SRH instance. Is it intentional? Hmm. Good point... we don't want to be calling this more than once. Let's have a {{SpanReceiverHost}} for each config prefix. That's the easiest thing to do. Long-term, I think we should have a new API that avoids the need for all this boilerplate code in the client... bq. We need to fix TraceUtils#wrapHadoopConf which always assumes that the prefix is "hadoop.htrace." fixed bq. Should we add an entry for hdfs.client.htrace.spanreceiver.classes to hdfs-default.xml? yeah DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
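A minimal sketch of the "one SpanReceiverHost per config prefix" idea discussed above; the class name, structure, and key suffix are illustrative assumptions, not the committed HDFS-8213 change.
{code}
// Cache host instances by prefix so repeated getInstance() calls for the
// same prefix never re-register receivers.
import java.util.HashMap;
import java.util.Map;

public class PerPrefixHostSketch {
  private static final Map<String, PerPrefixHostSketch> hosts = new HashMap<>();

  /** Same prefix => same instance; receivers are loaded exactly once. */
  public static synchronized PerPrefixHostSketch getInstance(String confPrefix) {
    PerPrefixHostSketch host = hosts.get(confPrefix);
    if (host == null) {
      host = new PerPrefixHostSketch(confPrefix);
      host.loadSpanReceivers();            // once per prefix, not per call
      hosts.put(confPrefix, host);
    }
    return host;
  }

  private final String confPrefix;

  private PerPrefixHostSketch(String confPrefix) {
    this.confPrefix = confPrefix;
  }

  private void loadSpanReceivers() {
    // The real code would instantiate the classes named by the key
    // <confPrefix> + "spanreceiver.classes" from the Configuration.
    System.out.println("loading receivers for prefix " + confPrefix);
  }

  public static void main(String[] args) {
    getInstance("hadoop.htrace.");        // server-side prefix
    getInstance("hdfs.client.htrace.");   // client prefix, separate instance
    getInstance("hdfs.client.htrace.");   // cached; no re-registration
  }
}
{code}
This is what keeps a process like Accumulo, which already manages its own receivers, from seeing the same receivers registered twice when DFSClient also initializes tracing.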
[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516053#comment-14516053 ] Colin Patrick McCabe commented on HDFS-7923: Thanks, [~clamb]. I like this approach. It avoids sending the block report until the NN requests it. So we don't have to throw away a whole block report to achieve backpressure.
{code}
public static final String DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_KEY =
    "dfs.namenode.max.concurrent.block.reports";
public static final int DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_DEFAULT =
    Integer.MAX_VALUE;
{code}
It seems like this should default to something less than the default number of RPC handler threads, not to MAX_INT. Given that dfs.namenode.handler.count = 10, it seems like this should be no more than 5 or 6, right? The main point here is to avoid having the NN handler threads completely choked with block reports, and that is defeated if the value is MAX_INT. I realize that you probably intended this to be configured. But it seems like we should have a reasonable default that works for most people.
{code}
--- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
+++ hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
@@ -195,6 +195,7 @@ message HeartbeatRequestProto {
   optional uint64 cacheCapacity = 6 [ default = 0 ];
   optional uint64 cacheUsed = 7 [default = 0 ];
   optional VolumeFailureSummaryProto volumeFailureSummary = 8;
+  optional bool requestSendFullBlockReport = 9;
{code}
Let's have a {{[default = false]}} here so that we don't have to add a bunch of clunky {{HasFoo}} checks. Unless there is something we'd like to do differently in the "false" and "not present" cases, but I can't think of what that would be.
{code}
/* Number of block reports currently being processed. */
private final AtomicInteger blockReportProcessingCount = new AtomicInteger(0);
{code}
I'm not sure an {{AtomicInteger}} makes sense here. We only modify this variable (write to it) when holding the FSN lock in write mode, right? And we only read from it when holding the FSN in read mode. So, there isn't any need to add atomic ops.
{code}
boolean okToSendFullBlockReport = true;
if (requestSendFullBlockReport &&
    blockManager.getBlockReportProcessingCount() >= maxConcurrentBlockReports) {
  /* See if we should tell DN to back off for a bit. */
  final long lastBlockReportTime = blockManager.getDatanodeManager().
      getDatanode(nodeReg).getLastBlockReportTime();
  if (lastBlockReportTime > 0) {
    /* We've received at least one block report. */
    final long msSinceLastBlockReport = now() - lastBlockReportTime;
    if (msSinceLastBlockReport < maxBlockReportDeferralMsec) {
      /* It hasn't been long enough to allow a BR to pass through. */
      okToSendFullBlockReport = false;
    }
  }
}
return new HeartbeatResponse(cmds, haState, rollingUpgradeInfo,
    okToSendFullBlockReport);
{code}
There is a TOCTOU (time of check, time of use) race condition here, right? 1000 datanodes come in and ask me whether it's ok to send an FBR. In each case, I check the number of ongoing FBRs, which is 0, and say yes. Then 1000 FBRs arrive all at once and the NN melts down. I think we need to track which datanodes we gave the green light to, and not decrement the counter until they either send that report, or some timeout expires. (We need the timeout in case datanodes go away after requesting permission-to-send.) The timeout can probably be as short as a few minutes.
If you can't manage to send an FBR in a few minutes, there are bigger problems going on.
{code}
public static final String DFS_BLOCKREPORT_MAX_DEFER_MSEC_KEY =
    "dfs.blockreport.max.deferMsec";
public static final long DFS_BLOCKREPORT_MAX_DEFER_MSEC_DEFAULT = Long.MAX_VALUE;
{code}
Do we really need this config key? It seems like we added it because we wanted to avoid starvation (i.e. the case where a given DN never gets given the green light). But we are maintaining the last FBR time for each DN anyway. Surely we can just have a TreeMap or something and just tell the guys with the oldest {{lastSentTime}} to go. There aren't an infinite number of datanodes in the cluster, so eventually everyone will get the green light. I really would prefer not to have this tunable at all, since I think it's unnecessary. In any case, it's certainly doing us no good as MAX_U64. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project:
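To make the grant-tracking fix for the TOCTOU race concrete, here is a minimal, self-contained sketch; all names are illustrative, and this is not the committed HDFS-7923 code. A datanode counts against the limit from the moment it is told "go ahead", and unused grants expire so a datanode that dies after asking cannot pin a slot forever.
{code}
// Sketch of admission tracking for full block reports (FBRs).
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class BlockReportAdmissionSketch {
  private final int maxConcurrent;
  private final long grantTimeoutMs;
  private final Map<String, Long> granted = new HashMap<>(); // dnUuid -> grant time

  public BlockReportAdmissionSketch(int maxConcurrent, long grantTimeoutMs) {
    this.maxConcurrent = maxConcurrent;
    this.grantTimeoutMs = grantTimeoutMs;
  }

  /** Called from heartbeat handling: may this DN send a full block report? */
  public synchronized boolean requestSend(String dnUuid, long nowMs) {
    expireStaleGrants(nowMs);
    if (granted.containsKey(dnUuid)) {
      return true;                    // already granted earlier
    }
    if (granted.size() >= maxConcurrent) {
      return false;                   // back off; ask again on a later heartbeat
    }
    granted.put(dnUuid, nowMs);       // counted *before* the report arrives,
    return true;                      // closing the check-then-act window
  }

  /** Called when the FBR from this DN has been fully processed. */
  public synchronized void reportProcessed(String dnUuid) {
    granted.remove(dnUuid);
  }

  private void expireStaleGrants(long nowMs) {
    for (Iterator<Long> it = granted.values().iterator(); it.hasNext();) {
      if (nowMs - it.next() > grantTimeoutMs) {
        it.remove();                  // the DN went away after asking
      }
    }
  }

  public static void main(String[] args) {
    BlockReportAdmissionSketch a = new BlockReportAdmissionSketch(2, 60_000);
    System.out.println(a.requestSend("dn1", 0));      // true
    System.out.println(a.requestSend("dn2", 0));      // true
    System.out.println(a.requestSend("dn3", 0));      // false: limit reached
    System.out.println(a.requestSend("dn3", 70_000)); // true: dn1/dn2 expired
  }
}
{code}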
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514721#comment-14514721 ] Colin Patrick McCabe commented on HDFS-8213: bq. What that doesn't clarify to me is how I would connect the dots of spans initiated within the HDFSClient back to actions taken by said app. It depends on what we're trying to do. For example, we may be getting reports that the cluster is slow. In this case, seeing that HDFS / HBase requests complete quickly allows us to focus on other systems in the stack. Ultimately, the best thing will always be to have tracing in every app. But it will take a while to get there and having the ability to get useful results out of incremental steps is really useful. DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7397) The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514790#comment-14514790 ] Colin Patrick McCabe commented on HDFS-7397: I'm not sure that this is clearer. It's actually shorter and snips out the first sentence which describes what the cache is. If anything would make this clearer, it might be changing "This parameter controls the size of that cache" to "This parameter controls the maximum number of file descriptors in the cache." The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511727#comment-14511727 ] Colin Patrick McCabe commented on HDFS-8213: Thanks for that perspective, [~ndimiduk]. I actually don't see any conflict between allowing the client to trace itself, and allowing the application to trace itself. We should be able to support both use-cases. The people who don't want to have the client initiate tracing can simply not set {{hdfs.client.htrace.spanreceiver.classes}} and {{hdfs.client.trace.sampler}}. One very important use-case for HTrace is "how can HBase figure out what HDFS is doing." For this use-case, of course, we don't need the client to initiate tracing... HBase can simply change its code to have the relevant calls to HTrace, and then that will get picked up by DFSClient, DataNode, NN, etc. I think this is the use-case you guys have been focusing on, and understandably so. But this is only one use-case of many. Another very important use case of tracing is "I have proprietary app X that talks to HDFS, and it's slow. How come?" For that use-case, we need to be able to have the DFSClient initiate the tracing, since we don't have the source code for the proprietary app (or if we do, modifying it and redeploying it may require a lengthy admin process.) bq. Should HBase and Accumulo clients be providing the same? I believe they should. It would be nice to be able to figure out why HBase is slow for some arbitrary workload, without hacking the client. I would like to be able to give a talk about profiling HBase that doesn't start with "first, modify your source code in ways X, Y, and Z"... it's much nicer to tell people to set a config option. Otherwise I feel like I'm telling people to write a mapreduce job in erlang... and you know what that really means I'm telling them :) This is especially true for non-devs. I think we could also improve our API to make it less likely (or maybe even impossible) for client and server tracing configs to conflict so much. I have some ideas for how to do that which I'll take a look at in a follow-on jira DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8213: --- Summary: DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace (was: DFSClient should not instantiate SpanReceiverHost) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8213: --- Assignee: Colin Patrick McCabe (was: Brahma Reddy Battula) Status: Patch Available (was: Open) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8213: --- Attachment: HDFS-8213.001.patch DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS-8213.001.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509472#comment-14509472 ] Colin Patrick McCabe commented on HDFS-8213: bq. can you people suggest configuration for DFSClient..? I'm thinking {{hdfs.client.htrace.spanreceiver.classes}}. It's not completely trivial because I have to change our SpanReceiverHost thing, but shouldn't be too bad... let me see if I can post the patch DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8070: --- Resolution: Fixed Fix Version/s: 2.7.1 Status: Resolved (was: Patch Available) Committed to 2.7.1. Thanks for the reviews. Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.7.0 Reporter: Gopal V Assignee: Colin Patrick McCabe Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8070.001.patch HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at 
java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507643#comment-14507643 ] Colin Patrick McCabe commented on HDFS-8213: Thanks again for kicking the tires on htrace, [~billie.rinaldi]. Let me see if I can get to the bottom of this. bq. As documented, each process must configure its own span receivers if it wants to use tracing. If I set hadoop.htrace.span.receiver.classes to the empty string, then the NameNode and DataNode will not do any tracing. You are right that you need to set {{hadoop.htrace.span.receiver.classes}} in the NameNode and DataNode configuration. However, you need to avoid setting it in the Accumulo configuration... instead, use whatever configuration Accumulo uses to set this value. This means that, currently, you can't use the same config file for the NN and DN as for the DFSClient. bq. If span receiver initialization in DFSClient is important to the use of the hadoop.htrace.sampler configuration property, perhaps a compromise would be to perform SpanReceiverHost.getInstance only when the sampler is set to something other than NeverSampler. Keep in mind that {{hadoop.htrace.sampler}} is a completely different configuration key than {{hadoop.htrace.span.receiver.classes}}. If you are sampling at the level of Accumulo operations, I would not recommend setting {{hadoop.htrace.sampler}} in any config file on the cluster. You want all of the sampling to happen inside Accumulo. bq. I think Billie Rinaldi is correct here; the client should not instantiate it's own SpanReceiverHost, but instead depend on the process in which it resides to provide. This is how HBase client works as well. HBase is exactly the same. In the case of HBase, you do not want to set {{hadoop.htrace.span.receiver.classes}} in the HBase config files. Instead, you would set {{hbase.htrace.span.receiver.classes}}. Then HBase would create a span receiver, and DFSClient would not. It seems like there is a hidden assumption here that you want to use the same config file for everything. But we really don't support that right now. Getting rid of the SpanReceiverHost in DFSClient is not an option, since some people want to trace just HDFS without tracing any other system. Plus, it only kicks the problem up to a higher level: if my FooProcess wants to use both HTrace and Accumulo, FooProcess could easily make the same argument that Accumulo should not instantiate SpanReceiverHost, since FooProcess is already doing that. And since FooProcess uses the Accumulo client, it would conflict with whatever Accumulo was configuring if the same config file were used for everything. One thing we could do to make this a little less painful is to deduplicate span receivers inside the library. So if both DFSClient and Accumulo requested an HTracedSpanReceiver, we could simply create one instance of it. This would allow us to use the same config file for everything. As a side note, [~billie.rinaldi], can you explain how you configure which sampler and span receiver Accumulo uses? In HBase we set {{hbase.htrace.span.receiver.classes}}, etc. I would recommend something like {{accumulo.htrace.span.receiver.classes}} for consistency. This also allows you to use the same config file for everything, since it doesn't conflict with the keys which Hadoop uses to set these values. That is why we set up the hbase.htrace namespace separately from the hadoop.htrace namespace.
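To illustrate the deduplication idea in the comment above, here is a minimal sketch of what receiver sharing inside the tracing library could look like: one shared instance per receiver class, so a second component requesting the same class gets the existing object instead of causing double delivery. All names here ({{ReceiverDedupSketch}}, the stand-in {{SpanReceiver}} interface) are hypothetical; this is not the actual HTrace patch.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of span-receiver deduplication inside the library.
public class ReceiverDedupSketch {
  interface SpanReceiver {}  // stand-in for the real receiver interface

  private static final Map<String, SpanReceiver> RECEIVERS =
      new ConcurrentHashMap<>();

  // Returns the shared instance for a receiver class, creating it on first
  // request. If DFSClient and Accumulo both ask for the same class, they
  // get the same object, so each span is delivered exactly once.
  static SpanReceiver getOrCreate(String className) {
    return RECEIVERS.computeIfAbsent(className, name -> {
      try {
        return (SpanReceiver) Class.forName(name)
            .getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("cannot create receiver " + name, e);
      }
    });
  }
}
{code}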
DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507888#comment-14507888 ] Colin Patrick McCabe commented on HDFS-8213: bq. I think Billie Rinaldi is correct here; the client should not instantiate it's own SpanReceiverHost, but instead depend on the process in which it resides to provide. This is how HBase client works as well. [~ndimiduk], what if I want to trace an HBase PUT all the way through the system? You're saying that the HBase client can't activate tracing on its own, so I have to make code changes to the process doing the PUT (i.e. the user of the HBase client) in order to get that info? That seems like a limitation. It's also worth pointing out that adding a {{SpanReceiverHost}} to the {{DFSClient}} is not really a new change... it goes back to HDFS-7055, last October. So it's been in there at least 6 months. Of course we can revisit it if that makes sense, but it's not really new, except in the sense that it took a very long time to do another Hadoop release with it included. (We really should get better about releases...) Thinking about this a little more, another possible resolution here is to change the configuration keys which the DFSClient looks for, so that they're different from the ones which the NameNode and DataNode look for. Right now {{hadoop.htrace.spanreceiver.classes}} will activate span receivers in both the NN and the DFSClient. But the DFSClient could instead look for {{hdfs.client.htrace.spanreceiver.classes}}. Then [~billie.rinaldi] could use the same configuration file for everything, and the DFSClient would never create its own span receivers or samplers. And I could continue to trace the DFSClient without modifying daemon code. Seems like a good resolution. DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
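To make the resolution proposed in the comment above concrete, a hedged sketch of what a single shared config could then express, shown through the {{Configuration}} API rather than XML. The Accumulo key follows the naming convention suggested in this thread and is hypothetical, not an existing Accumulo property.

{code}
import org.apache.hadoop.conf.Configuration;

public class SharedConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Daemon tracing: read by the NameNode and DataNode only.
    conf.set("hadoop.htrace.spanreceiver.classes",
        "org.apache.htrace.impl.LocalFileSpanReceiver");
    // Application tracing via a component-specific prefix (hypothetical
    // Accumulo key, following the hbase.htrace convention).
    conf.set("accumulo.htrace.span.receiver.classes",
        "org.apache.htrace.impl.LocalFileSpanReceiver");
    // hdfs.client.htrace.spanreceiver.classes is left unset, so the
    // DFSClient creates no receivers and spans are delivered once.
  }
}
{code}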
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508049#comment-14508049 ] Colin Patrick McCabe commented on HDFS-8213: bq. Yes, clients need tracing, and when they do they should enable it themselves. FsShell should enable tracing when it wants to use it, instead of doing that in DFSClient. There are hundreds or maybe even thousands of programs that use the HDFS client. It's not practical to modify them all to run {{Trace#addSpanReceiver}}. In some cases the programs that use HDFS are even proprietary or customer programs where we don't have access to the source code. I have some ideas for how to make this all work better, with a cleaner interface in {{Tracer}}. We might need an incompatible interface change to do it, though. For now, let's just change the config key for DFSClient... that should fix the problem for Accumulo. DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507819#comment-14507819 ] Colin Patrick McCabe commented on HDFS-8213: bq. The hadoop.htrace.span.receiver.classes is not set in Accumulo configuration files, but it is set in Hadoop configuration files. Accumulo uses Hadoop configuration files to connect to HDFS, thus its uses of DFSClient will have Hadoop's hadoop.htrace.span.receiver.classes. HBase does something similar, I believe. The way Cloudera Manager manages configuration files is that it creates separate config files for each daemon. So the NameNode reads its own set of config files, the DataNode reads a separate set, Hive reads another set, Flume reads still another set, and so on. So {{hadoop.htrace.span.receiver.classes}} would be set in the NN and DN configuration files, but not in the ones targeted at the DFSClients. Does Ambari do something similar? It seems like using the same set of configuration files for everything would be very limiting if you wanted to do something like turn on short circuit for some clients but not for others, etc. I know from a developer perspective it's frustrating to not be able to use the same config files for every daemon (I like to do that myself), but it's not broken, just inconvenient. bq. No. The way it works (did work, until this change was introduced in DFSClient) is that server processes instantiate SpanReceiverHost. If an app wants tracing, it also has to instantiate SpanReceiverHost. The Accumulo client does not instantiate SPH itself, as DFSClient should not. It's not true that only server processes need tracing. Clients also need tracing. For example, one test I do a lot is to run FsShell with tracing turned on. This would not be possible if only servers had tracing. The point that I was making with my example is that the Accumulo client itself probably should have tracing too, and this would potentially conflict with another server using the Accumulo client. bq. The change in DFSClient changes how apps are supposed to use tracing. It seems like this would be mitigated by deduping SpanReceivers in htrace, but if we go that route I would like the DFSClient change to be reverted until HDFS moves to a version of htrace with deduping. Otherwise, Accumulo and HBase will have to leave HDFS tracing disabled, or change how they're configuring HDFS, if they wish to avoid double delivery of spans. We're doing a new release of HTrace soon... like this week or the next. If we can get the deduping into the 3.2 release, we can bump the version in Hadoop 2.7.1. We can't change what's in Hadoop 2.7.0; that release is done. Thanks again for trying this stuff out. I'm going to work on a deduping patch for HTrace and would appreciate a review. DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Brahma Reddy Battula Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8133) Improve readability of deleted block check
[ https://issues.apache.org/jira/browse/HDFS-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505479#comment-14505479 ] Colin Patrick McCabe commented on HDFS-8133: +1. Thanks, Daryn. Test failures are unrelated. I ran the tests locally and they passed. Improve readability of deleted block check -- Key: HDFS-8133 URL: https://issues.apache.org/jira/browse/HDFS-8133 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-8133.patch The current means of checking if a block is deleted is checking if its block collection is null. A more readable approach is an isDeleted method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
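For readers following along, the shape of the change is roughly as follows; a sketch based on the issue description, with a stand-in interface, not the committed patch.

{code}
// Sketch: replace the raw null check with a named predicate.
class BlockInfoContiguousSketch {
  interface BlockCollection {}   // stand-in for the real interface

  private BlockCollection bc;    // null once the owning file is gone

  // Call sites then read "block.isDeleted()" rather than
  // "block.getBlockCollection() == null".
  boolean isDeleted() {
    return bc == null;
  }
}
{code}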
[jira] [Updated] (HDFS-8133) Improve readability of deleted block check
[ https://issues.apache.org/jira/browse/HDFS-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8133: --- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Improve readability of deleted block check -- Key: HDFS-8133 URL: https://issues.apache.org/jira/browse/HDFS-8133 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 2.8.0 Attachments: HDFS-8133.patch The current means of checking if a block is deleted is checking if its block collection is null. A more readable approach is an isDeleted method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505797#comment-14505797 ] Colin Patrick McCabe commented on HDFS-8213: Hi Billie, {{DFSClient}} needs to instantiate {{SpanReceiverHost}} in order to implement tracing, in the case where the process using the {{DFSClient}} doesn't configure its own span receivers. If you are concerned about multiple span receivers being instantiated, simply set {{hadoop.htrace.span.receiver.classes}} to the empty string, and Hadoop won't instantiate any span receivers. That should be its default anyway. DFSClient should not instantiate SpanReceiverHost - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Priority: Critical DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
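A minimal sketch of the workaround suggested in the comment above, expressed through the {{Configuration}} API (equivalent to putting an empty value in the XML config file):

{code}
import org.apache.hadoop.conf.Configuration;

public class DisableReceiversSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // An empty value means SpanReceiverHost creates no receivers, so the
    // application's own receivers are the only ones registered.
    conf.set("hadoop.htrace.span.receiver.classes", "");
  }
}
{code}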
[jira] [Commented] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500965#comment-14500965 ] Colin Patrick McCabe commented on HDFS-8070: bq. For this patch, do I have to redeploy the HDFS DN to test? Yes, this is a datanode-side fix. Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.7.0 Reporter: Gopal V Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-8070.001.patch HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498553#comment-14498553 ] Colin Patrick McCabe commented on HDFS-8113: There are already a bunch of places in the code where we check whether BlockCollection is null before doing something with it. Example: {code} if (block instanceof BlockInfoContiguous) { BlockCollection bc = ((BlockInfoContiguous) block).getBlockCollection(); String fileName = (bc == null) ? "[orphaned]" : bc.getName(); out.print(fileName + ": "); } {code} also: {code} private int getReplication(Block block) { final BlockCollection bc = blocksMap.getBlockCollection(block); return bc == null ? 0 : bc.getBlockReplication(); } {code} I think that the majority of cases already have a check. My suggestion is just that we extend this checking against null to all uses of the BlockInfoContiguous structure's block collection. If the problem is too difficult to reproduce with a {{MiniDFSCluster}}, perhaps we can just do a unit test of the copy constructor itself. As I said earlier, I don't understand the rationale for keeping blocks with no associated INode out of the BlocksMap. It complicates the block report, since it requires us to check whether each block has an associated inode before adding it to the BlocksMap. But if that change seems too ambitious for this JIRA, we can deal with that later. NullPointerException in BlockInfoContiguous causes block report failure --- Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null. {code} protected BlockInfoContiguous(BlockInfoContiguous from) { this(from, from.bc.getBlockReplication()); this.bc = from.bc; } {code} We have observed that some DataNodes keep failing to do block reports with the NameNode. The stack trace is as follows. Though we are not using the latest version, the problem still exists. 
{quote} 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
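For context, a hedged sketch of a null-tolerant version of the copy constructor quoted above. Treating an orphaned block as having replication 0 is an assumption for illustration; the committed fix may differ.

{code}
// Sketch only: guard the replication lookup so that copying a block whose
// collection has already been cleared does not throw NullPointerException.
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, from.bc == null ? 0 : from.bc.getBlockReplication());
  this.bc = from.bc;
}
{code}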
[jira] [Commented] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498556#comment-14498556 ] Colin Patrick McCabe commented on HDFS-7993: bq. Maybe we can change the description from repl to live repl? It will address the confusion others might have. Can we do that in a separate JIRA? Since it's an incompatible change we might want to do it only in Hadoop 3.0. There are a lot of people parsing fsck output (unfortunately). The rest looks good; if we can keep the existing output the same, I would love to add the replicaDetails option. Incorrect descriptions in fsck when nodes are decommissioned Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch When you run fsck with -files or -racks, you will get something like below if one of the replicas is decommissioned. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} That is because in NamenodeFsck, the repl count comes from live replicas count; while the actual nodes come from LocatedBlock which include decommissioned nodes. Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement verifies LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification; just like how fsck excludes decommissioned nodes when it checks for under-replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned
[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498556#comment-14498556 ] Colin Patrick McCabe edited comment on HDFS-7993 at 4/16/15 7:28 PM: - bq. Maybe we can change the description from repl to live repl? It will address the confusion others might have. Can we do that in a separate JIRA? Since it's an incompatible change we might want to do it only in Hadoop 3.0. There are a lot of people parsing fsck output (unfortunately). The rest looks good, if we can keep the existing output the same I would love to add the replicaDetails option. was (Author: cmccabe): bq, Maybe we can change the description from repl to live repl? It will address the confusion others might have. Can we do that in a separate JIRA? Since it's an incompatible change we might want to do it only in Hadoop 3.0. There are a lot of people parsing fsck output (unfortunately). The rest looks good, if we can keep the existing output the same I would love to add the replicaDetails option. Incorrect descriptions in fsck when nodes are decommissioned Key: HDFS-7993 URL: https://issues.apache.org/jira/browse/HDFS-7993 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch When you run fsck with -files or -racks, you will get something like below if one of the replicas is decommissioned. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} That is because in NamenodeFsck, the repl count comes from live replicas count; while the actual nodes come from LocatedBlock which include decommissioned nodes. Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement verifies LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes in the verification; just like how fsck excludes decommissioned nodes when it check for under replicated blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8070: --- Summary: Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode (was: ShortCircuitShmManager goes into dead mode, stopping all operations) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.8.0 Reporter: Gopal V Assignee: Kihwal Lee HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 
2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
[jira] [Updated] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8070: --- Attachment: HDFS-8070.001.patch Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.7.0 Reporter: Gopal V Assignee: Colin Patrick McCabe Attachments: HDFS-8070.001.patch HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown 
predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at
[jira] [Assigned] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe reassigned HDFS-8070: -- Assignee: Colin Patrick McCabe (was: Kihwal Lee) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.8.0 Reporter: Gopal V Assignee: Colin Patrick McCabe HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = 
(IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at
[jira] [Updated] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8070: --- Status: Patch Available (was: Open) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.7.0 Reporter: Gopal V Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-8070.001.patch HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] 
orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at
[jira] [Updated] (HDFS-8070) Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8070: --- Priority: Blocker (was: Major) Affects Version/s: (was: 2.8.0) 2.7.0 Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.7.0 Reporter: Gopal V Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-8070.001.patch HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split-generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShim wire protocol has trouble when 2.8.0 DN talks to a 2.7.0 Client? {code} 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR) java.nio.channels.ClosedChannelException at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57) at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387) at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-04-06 00:04:30,783 INFO 
[ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0) 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38 at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496485#comment-14496485 ] Colin Patrick McCabe commented on HDFS-8113: Thanks for the explanation, guys. I wasn't aware of the invariant that {{BlockInfoContiguous}} structures with {{bc == null}} were not in the {{BlocksMap}}. I think we should remove this invariant, and instead simply have the {{BlocksMap}} contain all the blocks. The memory savings from keeping them out are trivial, since the number of blocks without associated inodes should be very small. I think we can just check whether the INode field is null when appropriate. That seems to be the direction that the patch here is taking, and I think it makes sense. NullPointerException in BlockInfoContiguous causes block report failure --- Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: HDFS-8113.patch The following copy constructor can throw NullPointerException if {{bc}} is null. {code} protected BlockInfoContiguous(BlockInfoContiguous from) { this(from, from.bc.getBlockReplication()); this.bc = from.bc; } {code} We have observed that some DataNodes keep failing to do block reports with the NameNode. The stack trace is as follows. Though we are not using the latest version, the problem still exists. {quote} 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
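A sketch of the direction proposed in the comment above: keep every block in the BlocksMap and make the null check explicit at the point of use. Names mirror the snippets quoted earlier in this thread; this is illustrative, not a committed patch.

{code}
// Sketch: with orphaned blocks kept in the BlocksMap, report processing
// distinguishes "unknown" from "known but orphaned" explicitly.
BlockInfoContiguous stored = blocksMap.getStoredBlock(reported);
if (stored == null) {
  // Genuinely unknown block: handle as a new or invalid replica.
} else if (stored.getBlockCollection() == null) {
  // Known but orphaned: no associated INode, so skip any logic that
  // dereferences the block collection (e.g. replication lookups).
} else {
  // Normal path: the block belongs to a file.
}
{code}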
[jira] [Commented] (HDFS-8070) ShortCircuitShmManager goes into dead mode, stopping all operations
[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497287#comment-14497287 ] Colin Patrick McCabe commented on HDFS-8070: Both Hadoop 2.7 and the Hadoop 2 branch (which is what I assume you mean by Hadoop 2.8) have HDFS-7915, so I think there should not be any compatibility issues on that front. Can you check whether the patch up at HADOOP-11802 solves your issue? At the very least, it should get you a more informative exception. ShortCircuitShmManager goes into dead mode, stopping all operations --- Key: HDFS-8070 URL: https://issues.apache.org/jira/browse/HDFS-8070 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.8.0 Reporter: Gopal V Assignee: Kihwal Lee
The HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split generation. I hit this immediately after I upgraded the data, so I wonder if the ShortCircuitShm wire protocol has trouble when a 2.8.0 DN talks to a 2.7.0 client?
{code}
2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0)
2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment.
java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk) expr = (not leaf-0)
2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR)
java.nio.channels.ClosedChannelException
        at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57)
        at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387)
        at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378)
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk) expr = (not leaf-0)
2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment.
java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208)
        at
{code}
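For context, the failing {{SlotReleaser}} above is part of short-circuit local reads, which is what puts the {{/grid/0/cluster/hdfs/dn_socket}} domain socket in play. A hedged client-side configuration sketch; the key names are the standard HDFS ones, and the NameNode URI is hypothetical:
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.HdfsConfiguration;

Configuration conf = new HdfsConfiguration();
// Enable short-circuit local reads; these allocate the shared-memory
// slots that the SlotReleaser in the log above is trying to free.
conf.setBoolean("dfs.client.read.shortcircuit", true);
// Domain socket shared with the DataNode (path taken from the log).
conf.set("dfs.domain.socket.path", "/grid/0/cluster/hdfs/dn_socket");
FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
{code}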
[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads
[ https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494761#comment-14494761 ] Colin Patrick McCabe commented on HDFS-8088: Also, I am at a conference right now, so I apologize if my replies are slow! Reduce the number of HTrace spans generated by HDFS reads - Key: HDFS-8088 URL: https://issues.apache.org/jira/browse/HDFS-8088 Project: Hadoop HDFS Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8088.001.patch HDFS generates too many trace spans on read right now. Every call to read() we make generates its own span, which is not very practical for things like HBase or Accumulo that do many such reads as part of a single operation. Instead of tracing every call to read(), we should only trace the cases where we refill the buffer inside a BlockReader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads
[ https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494760#comment-14494760 ] Colin Patrick McCabe commented on HDFS-8088:
bq. I re-ran my test on Hadoop-2.7.1-SNAP with your patch applied, Colin, and things are much happier. The performance is much closer to what I previously saw with 2.6.0 (without any quantitative measurements). +1 (non-binding, ofc)
Thanks, Josh. I discovered that we are reading non-trivial amounts of remote data inside the {{DFSInputStream#blockSeekTo}} method, so I think we'll also need to create a trace span for that one. Also, the {{BlockReader}} trace scopes will need to use the {{DFSClient#traceSampler}} (currently they don't) or else we will never get any trace spans from reads. I think that is what we would need to get the patch on this JIRA committed.
bq. Giving a very quick look at the code (and making what's possibly a bad guess), perhaps all of the 0ms length spans (denoted by zeroCount in the above, as opposed to the nonzeroCount) are when DFSOutputStream#writeChunk is only appending data into the current packet and not actually submitting that packet for the data streamer to process? With some more investigation into the hierarchy, I bet I could definitively determine that.
Keep in mind that doing a write in HDFS just hands the data off to a background thread called {{DataStreamer}}, which writes it out asynchronously. The only reason why {{writeChunk}} would ever have a time much higher than 0 is that there was lock contention (the {{DataStreamer#waitAndQueuePacket}} method couldn't get the {{DataStreamer#dataQueue}} lock immediately) or that there were more than {{dfs.client.write.max-packets-in-flight}} unacked messages in flight already. (HDFS calls these messages "packets" even though each message is typically multiple Ethernet packets.)
I guess we have to step back and ask what the end goal is for HTrace. If the end goal is figuring out why some requests had a high latency, it makes sense to only trace parts of the program that we think will take a non-trivial amount of time. In that case, we should probably only trace the handoff of the full packet to the {{DataStreamer}}. If the end goal is understanding the downstream consequences of all operations, then we have to connect up the dots for all operations. That's why I originally had all calls to write() and read() create trace spans.
I'm inclined to lean more towards goal #1 (figure out why specific requests had high latency) than goal #2. I think that looking at the high-latency outliers will naturally lead us to fix the biggest performance issues (such as locking contention, disk issues, network issues, etc.). Also, if all calls to write() and read() create trace spans, then this will have a multiplicative effect on our top-level sampling rate, which I think is undesirable.
bq. That being said, I hope I'm not being too much of a bother with all this. I was just really excited to see this functionality in HDFS and want to make sure we're getting good data coming back out. Thanks for bearing with me and for the patches you've already made!
We definitely appreciate all the input. I think it's very helpful. I do think maybe we should target 2.7.1 for some of these changes, since I need to think through everything. I know that's frustrating, but hopefully if we maintain a reasonable Hadoop release cadence it won't be too bad.
I'd also like to run some patches by you guys to see if they improve the usefulness of HTrace for you. And I am doing a bunch of testing internally, which I think will turn up a lot more potential improvements to HTrace and to its integration into HDFS. Use cases really are very helpful in motivating us here. Reduce the number of HTrace spans generated by HDFS reads - Key: HDFS-8088 URL: https://issues.apache.org/jira/browse/HDFS-8088 Project: Hadoop HDFS Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8088.001.patch HDFS generates too many trace spans on read right now. Every call to read() we make generates its own span, which is not very practical for things like HBase or Accumulo that do many such reads as part of a single operation. Instead of tracing every call to read(), we should only trace the cases where we refill the buffer inside a BlockReader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
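To make the proposed tracing granularity concrete, here is a hedged sketch against the HTrace 3.x API ({{org.apache.htrace}}). The span name, the {{sampler}} field, and the {{doRead}} helper are illustrative, not the exact names in the patch:
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;

// Open a span only when the BlockReader actually refills its buffer
// (the expensive path), instead of one span per read() call.
private int fillBuffer(ByteBuffer buf) throws IOException {
  TraceScope scope = Trace.startSpan("BlockReaderLocal#fillBuffer", sampler);
  try {
    return doRead(buf);  // hypothetical helper doing the real I/O
  } finally {
    scope.close();       // cheap no-op when the sampler declined to trace
  }
}
{code}
Sampling at the refill boundary keeps the span count proportional to actual I/O rather than to the number of small read() calls issued by clients like HBase.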
[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
[ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492577#comment-14492577 ] Colin Patrick McCabe commented on HDFS-8113: It seems like the BlockCollection will be null if the block doesn't belong to any file. We should also have a unit test for this. I was thinking (see the test sketch below):
1. start a mini DFS cluster with 2 datanodes
2. create a file with repl=2 and close it
3. take down one DN
4. delete the file
5. wait
6. bring the stopped DN back up; it will still have the block from the file which was deleted
NullPointerException in BlockInfoContiguous causes block report failure --- Key: HDFS-8113 URL: https://issues.apache.org/jira/browse/HDFS-8113 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: HDFS-8113.patch
The following copy constructor can throw a NullPointerException if {{bc}} is null.
{code}
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, from.bc.getBlockReplication());
  this.bc = from.bc;
}
{code}
We have observed that some DataNodes keep failing to do block reports with the NameNode. The stack trace is as follows. Though we are not using the latest version, the problem still exists.
{quote}
2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
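A hedged sketch of the six-step test plan above, using the standard {{MiniDFSCluster}} test utilities; the path, file size, and timeout are illustrative, and the committed test may differ:
{code}
// Sketch only: replay the scenario where a DN reports a block whose
// file was deleted while that DN was down.
Configuration conf = new HdfsConfiguration();
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(2).build();
try {
  cluster.waitActive();
  DistributedFileSystem fs = cluster.getFileSystem();
  Path path = new Path("/test/deleted-while-dn-down");
  DFSTestUtil.createFile(fs, path, 1024L, (short) 2, 0L);  // repl=2
  // Take one DN down, then delete the file while it is offline.
  MiniDFSCluster.DataNodeProperties dnProps = cluster.stopDataNode(0);
  fs.delete(path, false);
  Thread.sleep(5000);  // crude wait; a real test would poll the NN state
  // Bring the stopped DN back; its block report references the deleted
  // file and must not NPE the NameNode.
  cluster.restartDataNode(dnProps);
  cluster.triggerBlockReports();
} finally {
  cluster.shutdown();
}
{code}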
[jira] [Commented] (HDFS-6919) Enforce a single limit for RAM disk usage and replicas cached via locking
[ https://issues.apache.org/jira/browse/HDFS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492587#comment-14492587 ] Colin Patrick McCabe commented on HDFS-6919: +1 for the idea Enforce a single limit for RAM disk usage and replicas cached via locking - Key: HDFS-6919 URL: https://issues.apache.org/jira/browse/HDFS-6919 Project: Hadoop HDFS Issue Type: Bug Reporter: Arpit Agarwal Assignee: Arpit Agarwal The DataNode can have a single limit for memory usage which applies to both replicas cached via CCM and replicas on RAM disk. See comments [1|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106025page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106025], [2|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106245page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106245] and [3|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106575page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106575] for discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492596#comment-14492596 ] Colin Patrick McCabe commented on HDFS-7878: Symlinks and directories both have a unique file ID, just the same as files. Maybe inodeID is a better name than fileID? Typically, you never actually get a FileInfo for a symlink itself unless you call getFileLinkInfo. If you simply call getFileInfo, you get the FileInfo for the file the symlink points to, not for the symlink itself. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, HDFS-7878.patch See HDFS-487. Even though that issue is resolved as a duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. The INode ID for the file should be easy to expose; alternatively, the ID could be derived from block IDs, to account for appends... This is useful, e.g., as a per-file cache key, to make sure the cache stays correct when a file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
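To illustrate the resolve vs. no-resolve distinction described above ({{getFileInfo}} and {{getFileLinkInfo}} are the ClientProtocol RPC names; their public analogues are {{getFileStatus}} and {{getFileLinkStatus}}), a hedged sketch with a hypothetical path:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

FileContext fc = FileContext.getFileContext(new Configuration());
Path link = new Path("/user/alice/link-to-data");  // hypothetical symlink
// Resolves the symlink: the status describes the target file.
FileStatus target = fc.getFileStatus(link);
// Does not resolve: the status describes the symlink itself.
FileStatus linkItself = fc.getFileLinkStatus(link);
System.out.println(linkItself.isSymlink());  // true for a symlink
{code}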
[jira] [Updated] (HDFS-8063) Fix intermittent test failures in TestTracing
[ https://issues.apache.org/jira/browse/HDFS-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8063: --- Summary: Fix intermittent test failures in TestTracing (was: Fix test failure in TestTracing) Fix intermittent test failures in TestTracing - Key: HDFS-8063 URL: https://issues.apache.org/jira/browse/HDFS-8063 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Attachments: HDFS-8063.001.patch, HDFS-8063.002.patch, testReadTraceHooks.html Tests in TestTracing sometimes fail, especially on slow machines. The cause is that spans can still arrive at the receiver after {{assertSpanNamesFound}} has passed and {{SetSpanReceiver.SetHolder.spans.clear()}} has been called for the next test case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
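One common fix for this kind of race is to poll for the expected spans instead of asserting immediately. A hedged sketch using Hadoop's {{GenericTestUtils.waitFor}}; {{expectedCount}} is illustrative, and the span-holder accessor mirrors the test's {{SetSpanReceiver.SetHolder}} rather than the committed change:
{code}
import com.google.common.base.Supplier;
import org.apache.hadoop.test.GenericTestUtils;

// Wait until the expected number of spans has arrived, so a span that
// lands late cannot be missed by the assert or cleared prematurely.
final int expectedCount = 3;  // illustrative
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return SetSpanReceiver.SetHolder.spans.size() >= expectedCount;
  }
}, 100 /* check every ms */, 10000 /* timeout ms */);
// Clear received spans only after the wait has succeeded.
SetSpanReceiver.SetHolder.spans.clear();
{code}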
[jira] [Resolved] (HDFS-7188) support build libhdfs3 on windows
[ https://issues.apache.org/jira/browse/HDFS-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe resolved HDFS-7188. Resolution: Fixed Fix Version/s: HDFS-6994 committed to HDFS-6994 support build libhdfs3 on windows - Key: HDFS-7188 URL: https://issues.apache.org/jira/browse/HDFS-7188 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Environment: Windows System, Visual Studio 2010 Reporter: Zhanwei Wang Assignee: Thanh Do Fix For: HDFS-6994 Attachments: HDFS-7188-branch-HDFS-6994-0.patch, HDFS-7188-branch-HDFS-6994-1.patch, HDFS-7188-branch-HDFS-6994-2.patch, HDFS-7188-branch-HDFS-6994-3.patch libhdfs3 should work on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)