[jira] [Commented] (HDFS-4210) NameNode Format should not fail for DNS resolution on minority of JournalNode

2016-08-25 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436339#comment-15436339
 ] 

Charles Lamb commented on HDFS-4210:


bq. I am taking over the jira in order to push it over the finish line. Hope 
that is ok with you.

No problem [~jzhuge]. Thanks for taking over.


> NameNode Format should not fail for DNS resolution on minority of JournalNode
> -
>
> Key: HDFS-4210
> URL: https://issues.apache.org/jira/browse/HDFS-4210
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, journal-node, namenode
>Affects Versions: 2.6.0
>Reporter: Damien Hardy
>Assignee: John Zhuge
>Priority: Trivial
>  Labels: BB2015-05-TBR
> Attachments: HDFS-4210.001.patch
>
>
> Setting: 
>   qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster
>   cdh4master01 and cdh4master02 JournalNodes up and running, 
>   cdh4worker03 not yet provisioned (no DNS entry)
> With:
> `hadoop namenode -format` fails with:
>   12/11/19 14:42:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.IllegalArgumentException: Unable to construct journal, 
> qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1235)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:193)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:745)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1099)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1233)
>   ... 5 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.<init>(IPCLoggerChannel.java:161)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:141)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:353)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:135)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:104)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:93)
>   ... 10 more
> I suggest that if a quorum of JournalNodes is up, the format should not fail.
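A minimal sketch of the suggested behavior (illustrative only, not the actual fix; the class name and quorum check below are hypothetical): tolerate unresolved JournalNode hostnames at format time as long as a majority of the configured nodes resolve.

{code}
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;

public class QuorumDnsCheck {
  public static void main(String[] args) {
    List<String> journalNodes = Arrays.asList(
        "cdh4master01:8485", "cdh4master02:8485", "cdh4worker03:8485");

    // new InetSocketAddress(host, port) performs the DNS lookup; if the host
    // cannot be resolved, the address is flagged as unresolved rather than
    // throwing, so we can count how many JournalNodes resolve by name.
    long resolvable = journalNodes.stream()
        .map(hp -> hp.split(":"))
        .map(parts -> new InetSocketAddress(parts[0], Integer.parseInt(parts[1])))
        .filter(addr -> !addr.isUnresolved())
        .count();

    if (resolvable <= journalNodes.size() / 2) {
      throw new IllegalStateException(
          "Cannot format: no quorum of resolvable JournalNodes");
    }
    System.out.println(resolvable + "/" + journalNodes.size()
        + " JournalNodes resolvable; format can proceed");
  }
}
{code}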






[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-05-05 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7847:
---
Attachment: HDFS-7847.004.patch

[~cmccabe], I've rebased the patch.

 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Fix For: HDFS-7836

 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, 
 HDFS-7847.004.patch, make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.





[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-05-05 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7847:
---
Attachment: (was: HDFS-7847.004.patch)

 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Fix For: HDFS-7836

 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, 
 HDFS-7847.005.patch, make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.





[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-05-05 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7847:
---
Attachment: HDFS-7847.005.patch

 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Fix For: HDFS-7836

 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, 
 HDFS-7847.005.patch, make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.





[jira] [Commented] (HDFS-8292) Move conditional in fmt_time from dfs-dust.js to status.html

2015-04-30 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522637#comment-14522637
 ] 

Charles Lamb commented on HDFS-8292:


Yes, I tested it manually.

Thanks Andrew.


 Move conditional in fmt_time from dfs-dust.js to status.html
 

 Key: HDFS-8292
 URL: https://issues.apache.org/jira/browse/HDFS-8292
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.8.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Fix For: 2.8.0

 Attachments: HDFS-8292.000.patch


 Per [~wheat9]'s comment in HDFS-8214, move the check for < 0 from dfs-dust.js 
 to status.html.





[jira] [Updated] (HDFS-8292) Move conditional in fmt_time from dfs-dust.js to status.html

2015-04-30 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-8292:
---
Attachment: HDFS-8292.000.patch

[~andrew.wang],

The attached patch moves the check for < 0 from fmt_time to status.html. Please 
take a look when you get a chance.

Thanks.


 Move conditional in fmt_time from dfs-dust.js to status.html
 

 Key: HDFS-8292
 URL: https://issues.apache.org/jira/browse/HDFS-8292
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.8.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-8292.000.patch


 Per [~wheat9]'s comment in HDFS-8214, move the check for < 0 from dfs-dust.js 
 to status.html.





[jira] [Updated] (HDFS-8292) Move conditional in fmt_time from dfs-dust.js to status.html

2015-04-30 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-8292:
---
Status: Patch Available  (was: Open)

 Move conditional in fmt_time from dfs-dust.js to status.html
 

 Key: HDFS-8292
 URL: https://issues.apache.org/jira/browse/HDFS-8292
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.8.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-8292.000.patch


 Per [~wheat9]'s comment in HDFS-8214, move the check for < 0 from dfs-dust.js 
 to status.html.





[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-04-29 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7847:
---
Status: Patch Available  (was: Reopened)

 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Fix For: HDFS-7836

 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, 
 make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.





[jira] [Reopened] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-04-29 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb reopened HDFS-7847:


Porting to trunk. .004 submitted.

 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Fix For: HDFS-7836

 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, 
 make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.





[jira] [Commented] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint

2015-04-29 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519304#comment-14519304
 ] 

Charles Lamb commented on HDFS-8214:


The test failure is unrelated. The checkstyle issue has already been discussed 
above.


 Secondary NN Web UI shows wrong date for Last Checkpoint
 

 Key: HDFS-8214
 URL: https://issues.apache.org/jira/browse/HDFS-8214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch, 
 HDFS-8214.003.patch


 SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in 
 the web UI. This causes weird times, generally, just after the epoch, to be 
 displayed.





[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-04-29 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7847:
---
Target Version/s: 2.8.0  (was: HDFS-7836)
  Status: In Progress  (was: Patch Available)

 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Fix For: HDFS-7836

 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, 
 make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.





[jira] [Updated] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-04-29 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7923:
---
Attachment: HDFS-7923.002.patch

Thanks for the review and comments [~cmccabe].

{code}
  public static final String  DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_KEY =
      "dfs.namenode.max.concurrent.block.reports";
  public static final int DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_DEFAULT =
      Integer.MAX_VALUE;
{code}

bq. It seems like this should default to something less than the default number 
of RPC handler threads, not to MAX_INT. Given that dfs.namenode.handler.count = 
10, it seems like this should be no more than 5 or 6, right? The main point 
here is to avoid having the NN handler threads completely choked with block 
reports, and that is defeated if the value is MAX_INT. I realize that you 
probably intended this to be configured. But it seems like we should have a 
reasonable default that works for most people.

Actually, my intent was to not have this feature kick in unless it was 
configured, but you have said that you want it enabled by default. I've changed 
the default for the above setting to 6.

{code}
+  /* Number of block reports currently being processed. */
+  private final AtomicInteger blockReportProcessingCount = new AtomicInteger(0);
{code}

bq. I'm not sure an AtomicInteger makes sense here. We only modify this 
variable (write to it) when holding the FSN lock in write mode, right? And we 
only read from it when holding the FSN in read mode. So, there isn't any need 
to add atomic ops.

Actually, it is incr'd outside the FSN lock, otherwise it could never be > 1.

bq. I think we need to track which datanodes we gave the green light to, and 
not decrement the counter until they either send that report, or some timeout 
expires. (We need the timeout in case datanodes go away after requesting 
permission-to-send.) The timeout can probably be as short as a few minutes. If 
you can't manage to send an FBR in a few minutes, there's more problems going 
on.

I've added a map called 'pendingBlockReports' to BlockManager to track the 
datanodes that we've given the ok to as well as when we gave it to them. 
There's also a method to clean the table.

{code}
  public static final String  DFS_BLOCKREPORT_MAX_DEFER_MSEC_KEY =
      "dfs.blockreport.max.deferMsec";
  public static final long    DFS_BLOCKREPORT_MAX_DEFER_MSEC_DEFAULT =
      Long.MAX_VALUE;
{code}

bq. Do we really need this config key?

I've added a TreeBidiMap called lastBlockReportTime to track this. I would have 
used guava instead of apache.commons.collections, but Guava doesn't have a 
sorted BidiMap.


 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.





[jira] [Created] (HDFS-8292) Move conditional in fmt_time from dfs-dust.js to status.html

2015-04-29 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-8292:
--

 Summary: Move conditional in fmt_time from dfs-dust.js to 
status.html
 Key: HDFS-8292
 URL: https://issues.apache.org/jira/browse/HDFS-8292
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.8.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor


Per [~wheat9]'s comment in HDFS-8214, move the check for < 0 from dfs-dust.js 
to status.html.






[jira] [Commented] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint

2015-04-29 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520613#comment-14520613
 ] 

Charles Lamb commented on HDFS-8214:


I created HDFS-8292 for this.

 Secondary NN Web UI shows wrong date for Last Checkpoint
 

 Key: HDFS-8214
 URL: https://issues.apache.org/jira/browse/HDFS-8214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch, 
 HDFS-8214.003.patch


 SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in 
 the web UI. This causes weird times, generally, just after the epoch, to be 
 displayed.





[jira] [Commented] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint

2015-04-29 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520603#comment-14520603
 ] 

Charles Lamb commented on HDFS-8214:


[~wheat9],

I'll make the change in a follow-up jira. Thanks for the review.


 Secondary NN Web UI shows wrong date for Last Checkpoint
 

 Key: HDFS-8214
 URL: https://issues.apache.org/jira/browse/HDFS-8214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch, 
 HDFS-8214.003.patch


 SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in 
 the web UI. This causes weird times, generally, just after the epoch, to be 
 displayed.





[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-04-28 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7847:
---
Attachment: HDFS-7847.004.patch

.004 is rebased onto trunk.

 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Fix For: HDFS-7836

 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, 
 make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.





[jira] [Updated] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint

2015-04-27 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-8214:
---
Attachment: HDFS-8214.003.patch

[~andrew.wang],

Thanks for the review. The .003 patch makes your suggested changes. I also 
added a < 0 check to dfs-dust.js. Perhaps it should just return "" instead of 
"unknown".


 Secondary NN Web UI shows wrong date for Last Checkpoint
 

 Key: HDFS-8214
 URL: https://issues.apache.org/jira/browse/HDFS-8214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch, 
 HDFS-8214.003.patch


 SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in 
 the web UI. This causes weird times, generally, just after the epoch, to be 
 displayed.





[jira] [Commented] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint

2015-04-27 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514367#comment-14514367
 ] 

Charles Lamb commented on HDFS-8214:


The test failure is spurious. I ran the failed test (TestDiskspaceQuotaUpdate) 
and it passed on my machine.

The checkstyle warning is

{quote}
<error line="56" column="3" severity="error" message="Redundant &apos;public&apos; modifier."
source="com.puppycrawl.tools.checkstyle.checks.modifier.RedundantModifierCheck"/>
{quote}

This is because I added the new getLastCheckpointDeltaMs() method. It is 
complaining about the 'public' modifier being redundant. I could remove it, but 
keeping it there maintains the existing style of the other getters.
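For context, a minimal example of what the RedundantModifier check flags (illustrative only; the interface below is hypothetical): interface methods are implicitly public, so an explicit modifier is reported as redundant.

{code}
// Checkstyle's RedundantModifier check flags modifiers the language already
// implies. Interface methods are implicitly public, so the explicit keyword
// below is reported as "Redundant 'public' modifier".
public interface SecondaryNameNodeInfoMXBean {   // hypothetical interface, for illustration
  public long getLastCheckpointDeltaMs();        // flagged by checkstyle

  long getStartTime();                           // equivalent and checkstyle-clean
}
{code}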


 Secondary NN Web UI shows wrong date for Last Checkpoint
 

 Key: HDFS-8214
 URL: https://issues.apache.org/jira/browse/HDFS-8214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch


 SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in 
 the web UI. This causes weird times, generally, just after the epoch, to be 
 displayed.





[jira] [Updated] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint

2015-04-24 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-8214:
---
Attachment: HDFS-8214.001.patch

[~yzhang], could you please take a look at this?

We could either try to have "Last Checkpoint:" be a relative time (e.g. "26 secs 
ago"), as the current SecondaryNameNode#toString does, or we could have it be a 
wallclock time. Unfortunately, having it be a relative time means that the JS 
for secondary/status.html would have to compute that in the browser client's 
local TZ, which is problematic. In fact, "Start Time" is already in wallclock 
time, so it feels best to mimic that. This does, however, mean that the 
#toString method has to be changed (back) to have "Last Checkpoint" be a 
wallclock time rather than the relative time that HDFS-5591 changed it to be.
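As an aside, a small sketch of why a monotonic timestamp renders as a date just after the epoch (using JDK calls as stand-ins for Hadoop's Time.monotonicNow()/Time.now(); not part of the patch):

{code}
import java.util.Date;

public class CheckpointTimeDemo {
  public static void main(String[] args) {
    // Stand-in for Time.monotonicNow(): an arbitrary-origin counter, so when
    // formatted as a date it typically lands shortly after 1970-01-01.
    long monotonic = System.nanoTime() / 1_000_000;
    // Stand-in for Time.now(): wallclock milliseconds since the epoch, which
    // is what a UI date field should display.
    long wallclock = System.currentTimeMillis();

    System.out.println("monotonic as date: " + new Date(monotonic));
    System.out.println("wallclock as date: " + new Date(wallclock));
  }
}
{code}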

 Secondary NN Web UI shows wrong date for Last Checkpoint
 

 Key: HDFS-8214
 URL: https://issues.apache.org/jira/browse/HDFS-8214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-8214.001.patch


 SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in 
 the web UI. This causes weird times, generally, just after the epoch, to be 
 displayed.





[jira] [Updated] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint

2015-04-24 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-8214:
---
Status: Patch Available  (was: Open)

 Secondary NN Web UI shows wrong date for Last Checkpoint
 

 Key: HDFS-8214
 URL: https://issues.apache.org/jira/browse/HDFS-8214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-8214.001.patch


 SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in 
 the web UI. This causes weird times, generally, just after the epoch, to be 
 displayed.





[jira] [Updated] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint

2015-04-24 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-8214:
---
Attachment: HDFS-8214.002.patch

[~andrew.wang],

Thanks for the review. Please check out the attached.

 Secondary NN Web UI shows wrong date for Last Checkpoint
 

 Key: HDFS-8214
 URL: https://issues.apache.org/jira/browse/HDFS-8214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch


 SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in 
 the web UI. This causes weird times, generally, just after the epoch, to be 
 displayed.





[jira] [Updated] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-04-24 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7923:
---
Attachment: HDFS-7923.001.patch

[~cmccabe], attached is a patch that is rebased onto the trunk.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.





[jira] [Commented] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint

2015-04-24 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511161#comment-14511161
 ] 

Charles Lamb commented on HDFS-8214:


No test is needed since it's just a change to a display message.

 Secondary NN Web UI shows wrong date for Last Checkpoint
 

 Key: HDFS-8214
 URL: https://issues.apache.org/jira/browse/HDFS-8214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-8214.001.patch


 SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in 
 the web UI. This causes weird times, generally, just after the epoch, to be 
 displayed.





[jira] [Created] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint

2015-04-21 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-8214:
--

 Summary: Secondary NN Web UI shows wrong date for Last Checkpoint
 Key: HDFS-8214
 URL: https://issues.apache.org/jira/browse/HDFS-8214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb


SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in 
the web UI. This causes weird times, generally, just after the epoch, to be 
displayed.





[jira] [Commented] (HDFS-8099) Change DFSInputStream has been closed already message to debug log level

2015-04-09 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487805#comment-14487805
 ] 

Charles Lamb commented on HDFS-8099:


The test failure is unrelated. No new tests were included since it is just a 
change from INFO to DEBUG, and I manually tested it with the CLI.


 Change DFSInputStream has been closed already message to debug log level
 --

 Key: HDFS-8099
 URL: https://issues.apache.org/jira/browse/HDFS-8099
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-8099.000.patch, HDFS-8099.001.patch


 The hadoop fs -get command always shows this warning:
 {noformat}
 $ hadoop fs -get /data/schemas/sfdc/BusinessHours-2014-12-09.avsc
 15/04/06 06:22:19 WARN hdfs.DFSClient: DFSInputStream has been closed already
 {noformat}
 This was introduced by HDFS-7494. The easiest thing is to just remove the 
 warning from the code.





[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-04-09 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487558#comment-14487558
 ] 

Charles Lamb commented on HDFS-7240:


[~jnp] et al,

This is very interesting. Thanks for posting it.

Is the 1KB key size limit a hard limit or just a design/implementation target? 
There will be users who want keys that can be arbitrarily large (e.g. 10's to 
100's of KB). So although it may be acceptable to degrade above 1KB, I don't 
think you want to make it a hard limit. You could argue that they could just 
hash their keys, or that they could have some sort of key map, but then it 
would be hard to do secondary indices in the future.

The details of partitions are kind of lacking beyond the second to last 
paragraph on page 4. Are partitions and storage containers 1:1? ("A storage 
container can contain a maximum of one partition...") Obviously a storage 
container holds more than just a partition. Perhaps a little more detail about 
partitions and how they are located, etc. is warranted.

In the call flow diagram on page 6, it looks like there's a lot going on in 
terms of network traffic. There's the initial REST call, then an RPC to get the 
bucket metadata, then one to read the bucket metadata, then another to get the 
object's container location, then back to the client who gets redirected. 
That's a lot of REST/RPCs just to get to the actual data. Will any of this be 
cached, perhaps in the Ozone Handler or maybe even on the client (I realize 
that's a bit hard with a REST based protocol). For instance, if it were 
possible to cache some of the hash in the client, then that would cut some RPCs 
to the Ozone Handler. If the cache were out of date, then the actual call to 
the data (step (9) in the diagram) could be rejected, the cache invalidated, 
and the entire call sequence (1) - (8) could be executed to get the right 
location.

IWBNI there was some description of the protocols used between all these moving 
parts. I know that it's REST from client to Ozone Handler, but what about the 
other network calls in the diagram? Will it be more REST, or Hadoop RPC, or 
something else? You talk about security at the end so I guess the 
authentication will be Kerberos based? Or will you allow more authentication 
options such as those that HDFS currently has?

Hash partitioning can also suffer from hotspots depending on the semantics of 
the key. That's not to say that it's the wrong decision to use it, only that it 
can have similar drawbacks as key partitioning. Since it looks like you have 
two separate hashes, one for buckets, and then one for the object key within 
the bucket, it is possible that there could be hotspots based on a particular 
bucket. Presumably some sort of caching would help here since the bucket 
mapping is relatively immutable.

Secondary indexing will not be easy in a distributed sharded system, especially 
the consistency issues in dealing with updates. That said, I am reasonably 
certain that you will find that many users will need this feature relatively 
soon such that it is high on the roadmap.

You don't say much about ACLs other than to include them in the REST API. I 
suppose they'll be implemented in the Ozone Handler, but what will they look 
like? HDFS/Linux ACLs?

In the Cluster Level APIs, presumably DELETE Storage Volume is only allowed by 
the admin. What about GET?

How are quotas enabled and set? I don't see it in the API anywhere. There's 
mention early on that they're set up by the administrator. Perhaps it's via 
some http jsp thing to the Ozone Handler or Storage Container Manager? Who 
enforces them?

"no guarantees on partially written objects" - Does this also mean that there 
are no block-order guarantees during write? Are holey objects allowed, or will 
the only inconsistencies be at the tail of an object? This is obviously 
important for log-based storage systems.

In the Size requirements section on page 3 you say "Number of objects per 
bucket: 1 million", and then later on you say "A bucket can have millions of 
objects". You may want to shore that up a little.

Also in the Size requirements section you say "Object Size: 5G", but then later 
it says "The storage container needs to store object data that can vary from a 
few hundred KB to hundreds of megabytes". I'm not sure those are necessarily 
inconsistent, but I'm also not sure how to reconcile them.

Perhaps you could include a diagram showing how an object maps to partitions 
and storage containers and then onto DNs. In other words, a general diagram 
showing all the various storage concepts (objects, partitions, storage 
containers, hash tables, transactions, etc.)

"We plan to re-use Namenode's block management implementation for container 
management, as much as possible." I'd love to see more detail on what can be 
reused, what high level changes to the BlkMgr code will be needed, what 
[jira] [Created] (HDFS-8099) Remove extraneous warning from DFSInputStream.close()

2015-04-08 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-8099:
--

 Summary: Remove extraneous warning from DFSInputStream.close()
 Key: HDFS-8099
 URL: https://issues.apache.org/jira/browse/HDFS-8099
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor


The hadoop fs -get command always shows this warning:

{noformat}
$ hadoop fs -get /data/schemas/sfdc/BusinessHours-2014-12-09.avsc
15/04/06 06:22:19 WARN hdfs.DFSClient: DFSInputStream has been closed already
{noformat}

This was introduced by HDFS-7494. The easiest thing is to just remove the 
warning from the code.





[jira] [Updated] (HDFS-8099) Remove extraneous warning from DFSInputStream.close()

2015-04-08 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-8099:
---
Attachment: HDFS-8099.000.patch

 Remove extraneous warning from DFSInputStream.close()
 -

 Key: HDFS-8099
 URL: https://issues.apache.org/jira/browse/HDFS-8099
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-8099.000.patch


 The hadoop fs -get command always shows this warning:
 {noformat}
 $ hadoop fs -get /data/schemas/sfdc/BusinessHours-2014-12-09.avsc
 15/04/06 06:22:19 WARN hdfs.DFSClient: DFSInputStream has been closed already
 {noformat}
 This was introduced by HDFS-7494. The easiest thing is to just remove the 
 warning from the code.





[jira] [Updated] (HDFS-8099) Remove extraneous warning from DFSInputStream.close()

2015-04-08 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-8099:
---
Status: Patch Available  (was: Open)

 Remove extraneous warning from DFSInputStream.close()
 -

 Key: HDFS-8099
 URL: https://issues.apache.org/jira/browse/HDFS-8099
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-8099.000.patch


 The hadoop fs -get command always shows this warning:
 {noformat}
 $ hadoop fs -get /data/schemas/sfdc/BusinessHours-2014-12-09.avsc
 15/04/06 06:22:19 WARN hdfs.DFSClient: DFSInputStream has been closed already
 {noformat}
 This was introduced by HDFS-7494. The easiest thing is to just remove the 
 warning from the code.





[jira] [Updated] (HDFS-8099) Remove extraneous warning from DFSInputStream.close()

2015-04-08 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-8099:
---
Attachment: HDFS-8099.001.patch

[~cmccabe], good idea. New patch attached.

Tested manually:

{code}
[cwl@localhost hadoop]$ rm hosts;hadoop fs -get /hosts
[cwl@localhost hadoop]$ 
{code}

 Remove extraneous warning from DFSInputStream.close()
 -

 Key: HDFS-8099
 URL: https://issues.apache.org/jira/browse/HDFS-8099
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-8099.000.patch, HDFS-8099.001.patch


 The hadoop fs -get command always shows this warning:
 {noformat}
 $ hadoop fs -get /data/schemas/sfdc/BusinessHours-2014-12-09.avsc
 15/04/06 06:22:19 WARN hdfs.DFSClient: DFSInputStream has been closed already
 {noformat}
 This was introduced by HDFS-7494. The easiest thing is to just remove the 
 warning from the code.





[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-04-06 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481715#comment-14481715
 ] 

Charles Lamb commented on HDFS-7923:


Here is a description of the heuristic that my patch has implemented for the NN 
to determine what to send back in response to the "should I send a BR?" 
question. In the vein of keeping it relatively simple, let's consider 3 
parameters:


*   The max # of FBR requests that the NN is willing to process at any given 
time (to be called 'dfs.namenode.max.concurrent.block.reports', with a default 
of Integer.MAX_VALUE)
*   The DN's configured block report interval (dfs.blockreport.intervalMsec). 
This parameter already exists.
*   The max time we ever want the NN to go without receiving an FBR from a 
given DN ('dfs.blockreport.max.deferMsec').

If the time since the last FBR received from the DN is less than 
dfs.blockreport.intervalMsec, then it returns false ("No, don't send an FBR"). 
In theory, this should never happen if the DN is obeying 
dfs.blockreport.intervalMsec.

If the number of block reports currently being processed by the NN is less than 
dfs.namenode.max.concurrent.block.reports, and the time since it last received 
an FBR from the DN sending the heartbeat is greater than 
dfs.blockreport.intervalMsec, then the NN automatically answers true ("Yes, 
send along an FBR").

If the number of BRs being processed by the NN is greater than or equal to 
dfs.namenode.max.concurrent.block.reports when it receives the heartbeat, then 
it checks the last time that it received an FBR from the DN sending the 
heartbeat; if that is greater than dfs.blockreport.max.deferMsec, then it 
returns true ("Yes, send along an FBR"). If the time-since-last-FBR is less 
than dfs.blockreport.max.deferMsec, then it returns false.
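For illustration, a compact sketch of this heuristic (not the actual HDFS-7923 patch; the class name and method signature are hypothetical, and the fields stand in for the configuration keys discussed above):

{code}
class BlockReportAdmission {
  private final long intervalMsec;   // dfs.blockreport.intervalMsec
  private final long maxDeferMsec;   // dfs.blockreport.max.deferMsec
  private final int maxConcurrent;   // dfs.namenode.max.concurrent.block.reports

  BlockReportAdmission(long intervalMsec, long maxDeferMsec, int maxConcurrent) {
    this.intervalMsec = intervalMsec;
    this.maxDeferMsec = maxDeferMsec;
    this.maxConcurrent = maxConcurrent;
  }

  /** Answer a DN's "should I send a full block report?" heartbeat question. */
  boolean shouldSendFbr(long msecSinceLastFbr, int reportsInFlight) {
    if (msecSinceLastFbr < intervalMsec) {
      return false;                  // DN is reporting again too soon
    }
    if (reportsInFlight < maxConcurrent) {
      return true;                   // NN has capacity, send now
    }
    // NN is busy: only allow the report if this DN has already waited too long.
    return msecSinceLastFbr > maxDeferMsec;
  }
}
{code}

On each heartbeat the NN would call shouldSendFbr() with the elapsed time since that DN's last FBR and the current number of in-flight reports.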


 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Attachments: HDFS-7923.000.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.





[jira] [Updated] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-04-06 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7923:
---
Attachment: HDFS-7923.000.patch

Attached is a patch that implements the behavior I described.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Attachments: HDFS-7923.000.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.





[jira] [Work started] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-04-06 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-7923 started by Charles Lamb.
--
 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Attachments: HDFS-7923.000.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.





[jira] [Resolved] (HDFS-8040) Able to move encryption zone to Trash

2015-04-01 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb resolved HDFS-8040.

Resolution: Not a Problem

 Able to move encryption zone to Trash
 -

 Key: HDFS-8040
 URL: https://issues.apache.org/jira/browse/HDFS-8040
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Sumana Sathish

 Users can remove encryption directory using the FsShell remove commands 
 without -skipTrash option.
 {code}
 /usr/hdp/current/hadoop-hdfs-client/bin/hdfs dfs -D fs.trash.interval=60 
 -rm -r /user/hrt_qa/encryptionZone_1
 2015-04-01 
 19:19:46,510|beaver.machine|INFO|654|140309507495680|MainThread|15/04/01 
 19:19:46 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion 
 interval = 360 minutes, Emptier interval = 0 minutes.
 2015-04-01 
 19:19:46,534|beaver.machine|INFO|654|140309507495680|MainThread|Moved: 
 'hdfs://sumana-dal-secure-4.novalocal:8020/user/hrt_qa/encryptionZone_1' to 
 trash at: hdfs://sumana-dal-secure-4.novalocal:8020/user/hrt_qa/.Trash/Current
 2015-04-01 
 19:19:46,863|test_TDE_trash|INFO|654|140309507495680|MainThread|Checking if 
 the encryption zone is in Trash or not
 2015-04-01 
 19:19:46,864|beaver.machine|INFO|654|140309507495680|MainThread|RUNNING: 
 /usr/hdp/current/hadoop-client/bin/hadoop dfs -ls -R 
 /user/hrt_qa/.Trash/Current
 2015-04-01 
 19:19:46,892|beaver.machine|INFO|654|140309507495680|MainThread|DEPRECATED: 
 Use of this script to execute hdfs command is deprecated.
 2015-04-01 
 19:19:46,893|beaver.machine|INFO|654|140309507495680|MainThread|Instead use 
 the hdfs command for it.
 2015-04-01 19:19:46,893|beaver.machine|INFO|654|140309507495680|MainThread|
 2015-04-01 
 19:19:50,289|beaver.machine|INFO|654|140309507495680|MainThread|drwx--   
 - hrt_qa hrt_qa  0 2015-04-01 19:19 /user/hrt_qa/.Trash/Current/user
 2015-04-01 
 19:19:50,292|beaver.machine|INFO|654|140309507495680|MainThread|drwx--   
 - hrt_qa hrt_qa  0 2015-04-01 19:19 
 /user/hrt_qa/.Trash/Current/user/hrt_qa
 2015-04-01 
 19:19:50,296|beaver.machine|INFO|654|140309507495680|MainThread|drwxr-xr-x   
 - hrt_qa hrt_qa  0 2015-04-01 19:19  
 /user/hrt_qa/.Trash/Current/user/hrt_qa/encryptionZone_1 
 2015-04-01 
 19:19:50,326|beaver.machine|INFO|654|140309507495680|MainThread|-rw-r--r--   
 3 hrt_qa hrt_qa   3273 2015-04-01 19:19 
 /user/hrt_qa/.Trash/Current/user/hrt_qa/encryptionZone_1/file_to_get.txt
 {code}





[jira] [Commented] (HDFS-8040) Able to move encryption zone to Trash

2015-04-01 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391411#comment-14391411
 ] 

Charles Lamb commented on HDFS-8040:


Hi [~ssath...@hortonworks.com],

I tried reproducing this:

{code}
[cwl@localhost hadoop]$ hdfs crypto -listZones
/ez  mykey 
[cwl@localhost hadoop]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - cwl supergroup  0 2015-04-01 15:41 /ez
[cwl@localhost hadoop]$ hdfs dfs -ls /ez
Found 1 items
-rw-r--r--   3 cwl supergroup158 2015-04-01 15:41 /ez/hosts
[cwl@localhost hadoop]$ hdfs dfs -D fs.trash.interval=60 -rm -r /ez
15/04/01 16:41:15 INFO fs.TrashPolicyDefault: Namenode trash configuration: 
Deletion interval = 60 minutes, Emptier interval = 0 minutes.
rm: Failed to move to trash: hdfs://localhost/ez: /ez can't be moved from an 
encryption zone.
[cwl@localhost hadoop]$ hdfs dfs -ls -R /
drwxr-xr-x   - cwl supergroup  0 2015-04-01 15:41 /ez
-rw-r--r--   3 cwl supergroup158 2015-04-01 15:41 /ez/hosts
drwx--   - cwl supergroup  0 2015-04-01 16:41 /user
drwx--   - cwl supergroup  0 2015-04-01 16:41 /user/cwl
drwx--   - cwl supergroup  0 2015-04-01 16:41 /user/cwl/.Trash
drwx--   - cwl supergroup  0 2015-04-01 16:41 
/user/cwl/.Trash/Current
[cwl@localhost hadoop]$ hdfs dfs -ls -R /user/cwl/.Trash
drwx--   - cwl supergroup  0 2015-04-01 16:41 
/user/cwl/.Trash/Current
[cwl@localhost hadoop]$ hdfs dfs -ls -R /user/cwl/.Trash/Current
{code}

Do you see any difference between what you did and what I did?


 Able to move encryption zone to Trash
 -

 Key: HDFS-8040
 URL: https://issues.apache.org/jira/browse/HDFS-8040
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: sumana sathish

 Users can remove encryption directory using the FsShell remove commands 
 without -skipTrash option.
 {code}
 /usr/hdp/current/hadoop-hdfs-client/bin/hdfs dfs -D fs.trash.interval=60 
 -rm -r /user/hrt_qa/encryptionZone_1
 2015-04-01 
 19:19:46,510|beaver.machine|INFO|654|140309507495680|MainThread|15/04/01 
 19:19:46 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion 
 interval = 360 minutes, Emptier interval = 0 minutes.
 2015-04-01 
 19:19:46,534|beaver.machine|INFO|654|140309507495680|MainThread|Moved: 
 'hdfs://sumana-dal-secure-4.novalocal:8020/user/hrt_qa/encryptionZone_1' to 
 trash at: hdfs://sumana-dal-secure-4.novalocal:8020/user/hrt_qa/.Trash/Current
 2015-04-01 
 19:19:46,863|test_TDE_trash|INFO|654|140309507495680|MainThread|Checking if 
 the encryption zone is in Trash or not
 2015-04-01 
 19:19:46,864|beaver.machine|INFO|654|140309507495680|MainThread|RUNNING: 
 /usr/hdp/current/hadoop-client/bin/hadoop dfs -ls -R 
 /user/hrt_qa/.Trash/Current
 2015-04-01 
 19:19:46,892|beaver.machine|INFO|654|140309507495680|MainThread|DEPRECATED: 
 Use of this script to execute hdfs command is deprecated.
 2015-04-01 
 19:19:46,893|beaver.machine|INFO|654|140309507495680|MainThread|Instead use 
 the hdfs command for it.
 2015-04-01 19:19:46,893|beaver.machine|INFO|654|140309507495680|MainThread|
 2015-04-01 
 19:19:50,289|beaver.machine|INFO|654|140309507495680|MainThread|drwx--   
 - hrt_qa hrt_qa  0 2015-04-01 19:19 /user/hrt_qa/.Trash/Current/user
 2015-04-01 
 19:19:50,292|beaver.machine|INFO|654|140309507495680|MainThread|drwx--   
 - hrt_qa hrt_qa  0 2015-04-01 19:19 
 /user/hrt_qa/.Trash/Current/user/hrt_qa
 2015-04-01 
 19:19:50,296|beaver.machine|INFO|654|140309507495680|MainThread|drwxr-xr-x   
 - hrt_qa hrt_qa  0 2015-04-01 19:19  
 /user/hrt_qa/.Trash/Current/user/hrt_qa/encryptionZone_1 
 2015-04-01 
 19:19:50,326|beaver.machine|INFO|654|140309507495680|MainThread|-rw-r--r--   
 3 hrt_qa hrt_qa   3273 2015-04-01 19:19 
 /user/hrt_qa/.Trash/Current/user/hrt_qa/encryptionZone_1/file_to_get.txt
 {code}





[jira] [Commented] (HDFS-8040) Able to move encryption zone to Trash

2015-04-01 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391656#comment-14391656
 ] 

Charles Lamb commented on HDFS-8040:


[~xyao], [~ssath...@hortonworks.com],

This is actually correct behavior. If you have an EZ rooted at 
/user/hrt_qa/encryptionZone_1, it is OK to move an entire EZ to another 
directory, in this case /user/hrt_qa/.Trash/Current. That's what HDFS-7530 
fixed. Hence, the -rm -r command is effectively a rename of 
/user/hrt_qa/encryptionZone_1 to /user/hrt_qa/.Trash/Current. Since you're 
picking up the entire EZ, that's allowed.

Does this make sense?


 Able to move encryption zone to Trash
 -

 Key: HDFS-8040
 URL: https://issues.apache.org/jira/browse/HDFS-8040
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: sumana sathish

 Users can remove encryption directory using the FsShell remove commands 
 without -skipTrash option.
 {code}
 /usr/hdp/current/hadoop-hdfs-client/bin/hdfs dfs -D fs.trash.interval=60 
 -rm -r /user/hrt_qa/encryptionZone_1
 2015-04-01 
 19:19:46,510|beaver.machine|INFO|654|140309507495680|MainThread|15/04/01 
 19:19:46 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion 
 interval = 360 minutes, Emptier interval = 0 minutes.
 2015-04-01 
 19:19:46,534|beaver.machine|INFO|654|140309507495680|MainThread|Moved: 
 'hdfs://sumana-dal-secure-4.novalocal:8020/user/hrt_qa/encryptionZone_1' to 
 trash at: hdfs://sumana-dal-secure-4.novalocal:8020/user/hrt_qa/.Trash/Current
 2015-04-01 
 19:19:46,863|test_TDE_trash|INFO|654|140309507495680|MainThread|Checking if 
 the encryption zone is in Trash or not
 2015-04-01 
 19:19:46,864|beaver.machine|INFO|654|140309507495680|MainThread|RUNNING: 
 /usr/hdp/current/hadoop-client/bin/hadoop dfs -ls -R 
 /user/hrt_qa/.Trash/Current
 2015-04-01 
 19:19:46,892|beaver.machine|INFO|654|140309507495680|MainThread|DEPRECATED: 
 Use of this script to execute hdfs command is deprecated.
 2015-04-01 
 19:19:46,893|beaver.machine|INFO|654|140309507495680|MainThread|Instead use 
 the hdfs command for it.
 2015-04-01 19:19:46,893|beaver.machine|INFO|654|140309507495680|MainThread|
 2015-04-01 
 19:19:50,289|beaver.machine|INFO|654|140309507495680|MainThread|drwx--   
 - hrt_qa hrt_qa  0 2015-04-01 19:19 /user/hrt_qa/.Trash/Current/user
 2015-04-01 
 19:19:50,292|beaver.machine|INFO|654|140309507495680|MainThread|drwx--   
 - hrt_qa hrt_qa  0 2015-04-01 19:19 
 /user/hrt_qa/.Trash/Current/user/hrt_qa
 2015-04-01 
 19:19:50,296|beaver.machine|INFO|654|140309507495680|MainThread|drwxr-xr-x   
 - hrt_qa hrt_qa  0 2015-04-01 19:19  
 /user/hrt_qa/.Trash/Current/user/hrt_qa/encryptionZone_1 
 2015-04-01 
 19:19:50,326|beaver.machine|INFO|654|140309507495680|MainThread|-rw-r--r--   
 3 hrt_qa hrt_qa   3273 2015-04-01 19:19 
 /user/hrt_qa/.Trash/Current/user/hrt_qa/encryptionZone_1/file_to_get.txt
 {code}





[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list

2015-03-23 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376532#comment-14376532
 ] 

Charles Lamb commented on HDFS-6658:


[~daryn],

I spent a couple of hours making a first pass through the patch.

* The BlockReplicaId encodings seem sufficiently large for the foreseeable 
future.
* As you point out in the .jpg of your whiteboard, +1 on getting rid of the 
triplets.
* You solve the issue of sparse block ids by converting them to scalars and 
maintaining the skipBitSet.

Why did you roll your own bitset instead of using the Java bitset?

I'd like to hear more about concurrency in the overall data structure since 
that's a problem that [~cmccabe] and I are trying to tackle. Would you be able 
to have a phone conversation on Thursday or Friday this week to discuss it?


 Namenode memory optimization - Block replicas list 
 ---

 Key: HDFS-6658
 URL: https://issues.apache.org/jira/browse/HDFS-6658
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.4.1
Reporter: Amir Langer
Assignee: Daryn Sharp
 Attachments: BlockListOptimizationComparison.xlsx, BlocksMap 
 redesign.pdf, HDFS-6658.patch, HDFS-6658.patch, HDFS-6658.patch, Namenode 
 Memory Optimizations - Block replicas list.docx, New primative indexes.jpg, 
 Old triplets.jpg


 Part of the memory consumed by every BlockInfo object in the Namenode is a 
 linked list of block references for every DatanodeStorageInfo (called 
 triplets). 
 We propose to change the way we store the list in memory. 
 Using primitive integer indexes instead of object references will reduce the 
 memory needed for every block replica (when compressed oops is disabled) and 
 in our new design the list overhead will be per DatanodeStorageInfo and not 
 per block replica.
 see attached design doc. for details and evaluation results.





[jira] [Resolved] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-03-20 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb resolved HDFS-7847.

   Resolution: Fixed
Fix Version/s: HDFS-7836

Committed to HDFS-7836 branch.

 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Fix For: HDFS-7836

 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 HDFS-7847.002.patch, HDFS-7847.003.patch, make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-03-17 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7847:
---
Attachment: HDFS-7847.003.patch

Thanks for the review [~cmccabe]. I moved those two methods over to 
DFSTestUtils.java in .003.


 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 HDFS-7847.002.patch, HDFS-7847.003.patch, make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-03-16 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7847:
---
Attachment: HDFS-7847.002.patch

@cmccabe, @stack, thanks for the review!

bq. DFSClient.java: this change adds three new fields to DFSClient. But they 
only seem to be used by unit tests. It seems like we should just put these 
inside the unit test(s) that are using these-- if necessary, by adding a helper 
method. There's no reason to add more fields to DFSClient. Also remember that 
when using FileContext, we create new DFSClients all the time.

Good point. I've left the existing {code}ClientProtocol namenode{code} field 
alone. The other 3 proxies are created on-demand by their getters. That means 
no change in DFSClient instance size.
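
Just to illustrate the idea (hypothetical names, not the actual patch code): 
each getter builds its proxy when asked rather than holding it in a new field, 
so DFSClient itself gains no extra instance state.

{code}
// Hypothetical sketch, not the patch: the proxy is constructed on demand in
// the getter rather than being stored in a new DFSClient field.
class OnDemandProxySketch {
  interface DatanodeProtocolLike { }  // stand-in for the real protocol interface

  DatanodeProtocolLike getDatanodeProxy() {
    // Built only when a caller (e.g. a test) actually needs it.
    return new DatanodeProtocolLike() { };
  }
}
{code}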

bq. It seems kind of odd to have NameNodeProxies#createProxy create a proxy to 
the datanode.

It's actually a proxy to the NN for the DatanodeProtocol. That's the same 
protocol that the DN uses to speak with the NN when it's sending (among other 
things) block reports. But with some ideas from @stack, I got rid of the 
changes to NameNodeProxies.

bq. Of course the NameNode may or may not be remote here. It seems like --nnuri 
or just --namenode or something like that would be more descriptive.

Yeah, I agree. I changed it to -namenode.

bq. Instead of this boilerplate, just use StringUtils#popOptionWithArgument.

Changed. I was just trying to match the existing code, but using StringUtils 
is better. 
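
For anyone following along, the pattern is roughly this (a sketch, assuming the 
existing org.apache.hadoop.util.StringUtils API; the -namenode option name is 
just the example from this patch):

{code}
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import org.apache.hadoop.util.StringUtils;

// Sketch only: pop "-namenode <uri>" out of the argument list instead of
// hand-rolling the index/boundary checks.
class ArgSketch {
  static String parseNamenode(String[] argv) {
    List<String> args = new LinkedList<>(Arrays.asList(argv));
    // Returns the option's value, or null if the option was not given;
    // throws if "-namenode" is present without a value.
    return StringUtils.popOptionWithArgument("-namenode", args);
  }
}
{code}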

{code}
-  replication, BLOCK_SIZE, null);
+  replication, BLOCK_SIZE, CryptoProtocolVersion.supported());
{code}

bq. This fix is a little bit separate, right? I suppose we can do it in this 
JIRA, though.

Without this, the relevant PBHelper.convert code throws NPE on the 
supportVersions arg.


 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 HDFS-7847.002.patch, make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7846) Create off-heap BlocksMap and BlockData structures

2015-03-12 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355210#comment-14355210
 ] 

Charles Lamb commented on HDFS-7846:


Colin, this looks pretty good. A few questions and comments.

Yi mentioned unused imports, but there are also unnecessary 
java.lang.{String,ClassCastException} imports.

BlockId.equals: constructing a ClassCastException, and especially the resulting 
call to fillInStackTrace, is an expensive way of checking the type. I would 
think instanceof is preferred.
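
In other words, something along these lines (just a sketch of the suggestion; 
the class and field names are invented, not the actual BlockId code):

{code}
// Sketch of the instanceof-based check suggested above: a cheap type test,
// no exception construction and no fillInStackTrace.
class BlockIdSketch {
  private final long blockId;

  BlockIdSketch(long blockId) {
    this.blockId = blockId;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof BlockIdSketch)) {
      return false;
    }
    return blockId == ((BlockIdSketch) o).blockId;
  }

  @Override
  public int hashCode() {
    return (int) (blockId ^ (blockId >>> 32));
  }
}
{code}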

Are you planning on doing something with Shard.name in the future?

The indentation of the assignment to htable is off a bit.

Jenkins will ask you this question, but why no unit tests?


 Create off-heap BlocksMap and BlockData structures
 --

 Key: HDFS-7846
 URL: https://issues.apache.org/jira/browse/HDFS-7846
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7846-scl.001.patch


 Create off-heap BlocksMap, BlockInfo, and DataNodeInfo structures.  The 
 BlocksMap will use the off-heap hash table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7846) Create off-heap BlocksMap and BlockData structures

2015-03-12 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355217#comment-14355217
 ] 

Charles Lamb commented on HDFS-7846:


Oh, I forgot to mention there are three places where git apply flags the patch 
for adding trailing whitespace.

 Create off-heap BlocksMap and BlockData structures
 --

 Key: HDFS-7846
 URL: https://issues.apache.org/jira/browse/HDFS-7846
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7846-scl.001.patch


 Create off-heap BlocksMap, BlockInfo, and DataNodeInfo structures.  The 
 BlocksMap will use the off-heap hash table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list

2015-03-12 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357672#comment-14357672
 ] 

Charles Lamb commented on HDFS-6658:


Hi [~daryn],

Colin and I read over the design doc. I confess that I still need to read over 
the patch, but I will do that. Do you think it will be possible to create a 
safe mode to run this in so that inconsistencies can be detected? I'm also 
wondering what the field widths are, but I can find those when I read the patch.



 Namenode memory optimization - Block replicas list 
 ---

 Key: HDFS-6658
 URL: https://issues.apache.org/jira/browse/HDFS-6658
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.4.1
Reporter: Amir Langer
Assignee: Daryn Sharp
 Attachments: BlockListOptimizationComparison.xlsx, BlocksMap 
 redesign.pdf, HDFS-6658.patch, HDFS-6658.patch, HDFS-6658.patch, Namenode 
 Memory Optimizations - Block replicas list.docx, New primative indexes.jpg, 
 Old triplets.jpg


 Part of the memory consumed by every BlockInfo object in the Namenode is a 
 linked list of block references for every DatanodeStorageInfo (called 
 triplets). 
 We propose to change the way we store the list in memory. 
 Using primitive integer indexes instead of object references will reduce the 
 memory needed for every block replica (when compressed oops is disabled) and 
 in our new design the list overhead will be per DatanodeStorageInfo and not 
 per block replica.
 see attached design doc. for details and evaluation results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-03-11 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7847:
---
Attachment: HDFS-7847.001.patch

@cmccabe, @stack, thanks for the review!

bq. DFSClient.java: this change adds three new fields to DFSClient. But they 
only seem to be used by unit tests. It seems like we should just put these 
inside the unit test(s) that are using these-- if necessary, by adding a helper 
method. There's no reason to add more fields to DFSClient. Also remember that 
when using FileContext, we create new DFSClients all the time.

Good point. I've left the existing {code}ClientProtocol namenode{code} field 
alone. The other 3 proxies are created on-demand by their getters. That means 
no change in DFSClient instance size.

bq. It seems kind of odd to have NameNodeProxies#createProxy create a proxy to 
the datanode.

It's actually a proxy to the NN for the DatanodeProtocol. That's the same 
protocol that the DN uses to speak with the NN when it's sending (among other 
things) block reports.

bq. In general, when you see NameNodeProxies I think proxies used by the 
NameNode and this doesn't fit with that.

These are actually proxies used to talk to the NN, not proxies used by the NN. 
I didn't make the name.

bq. Can you give a little more context about why this is a good idea (as 
opposed to just having some custom code in the unit test or in a unit test util 
class that creates a proxy)

While the name DatanodeProtocol makes us think of an RPC protocol to the 
datanode, it is in fact yet another one of the many protocols to the namenode 
which is embodied in the NamenodeProtocols (plural) omnibus interface. The 
problem this is addressing is that when we are talking to an in-process NN in 
the NNThroughputBenchmark, then it's easy to get our hands on a 
NamenodeProtocols instance -- you simply call NameNode.getRpcServer(). However, 
the idea of this patch is to let you run the benchmark against a non-in-process 
NN, so there's no NameNode instance to use. That means we have to create RPC 
proxy objects for each of the NN protocols that we need to use.

It would be nice if we could create a single proxy for the omnibus 
NamenodeProtocols interface, but we can't. Instead, we have to pick and choose 
the different namenode protocols that we want to use -- ClientProtocol, 
NamenodeProtocol, RefreshUserMappingProtocol, and DatanodeProtocol -- and 
create proxies for them. Code to create proxies for the first three of these 
already existed in NameNodeProxies.java, but we have to add a few new lines to 
create the DatanodeProtocol proxy.
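
As a rough sketch of the pick-and-choose approach (illustrative only, not the 
patch itself; it assumes the existing NameNodeProxies#createProxy helper, with 
the Configuration and URI coming from the new -namenode option):

{code}
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.NameNodeProxies;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol;

// Sketch only: one proxy per NN protocol, since there is no single proxy for
// the omnibus NamenodeProtocols interface.
class NnProxySketch {
  static ClientProtocol clientProxy(Configuration conf, URI nnUri)
      throws IOException {
    return NameNodeProxies.createProxy(conf, nnUri, ClientProtocol.class)
        .getProxy();
  }

  static NamenodeProtocol namenodeProxy(Configuration conf, URI nnUri)
      throws IOException {
    return NameNodeProxies.createProxy(conf, nnUri, NamenodeProtocol.class)
        .getProxy();
  }
}
{code}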

@stack I looked into your (offline) suggestion to try calling through the 
TinyDatanode, but it's just doing the same thing that my patch does -- it uses 
the same ClientProtocol instance that the rest of the test uses. TinyDataNode 
is really just a skeleton and doesn't really borrow much code from the real DN.

bq. Of course the NameNode may or may not be remote here. It seems like --nnuri 
or just --namenode or something like that would be more descriptive.

Yeah, I agree. I changed it to -namenode.

bq. Instead of this boilerplate, just use StringUtils#popOptionWithArgument.

Changed. I was just trying to match the existing code, but using StringUtils 
is better. 

{code}
-  replication, BLOCK_SIZE, null);
+  replication, BLOCK_SIZE, CryptoProtocolVersion.supported());
{code}

bq. This fix is a little bit separate, right? I suppose we can do it in this 
JIRA, though.

Without this, the relevant PBHelper.convert code throws NPE on the 
supportVersions arg.


 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements

2015-03-09 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353386#comment-14353386
 ] 

Charles Lamb commented on HDFS-7836:


JOIN WEBEX MEETING
https://cloudera.webex.com/join/clamb  |  622 867 972


JOIN BY PHONE
1-650-479-3208 Call-in toll number (US/Canada)
Access code: 622 867 972

Global call-in numbers:
https://cloudera.webex.com/cloudera/globalcallin.php?serviceType=MCED=342142257tollFree=0


Can't join the meeting? Contact support here:
https://cloudera.webex.com/mc

IMPORTANT NOTICE: Please note that this WebEx service allows audio and other 
information sent during the session to be recorded, which may be discoverable 
in a legal matter. By joining this session, you automatically consent to such 
recordings. If you do not consent to being recorded, discuss your concerns with 
the host or do not join the session.


 BlockManager Scalability Improvements
 -

 Key: HDFS-7836
 URL: https://issues.apache.org/jira/browse/HDFS-7836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: BlockManagerScalabilityImprovementsDesign.pdf


 Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements

2015-03-09 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353455#comment-14353455
 ] 

Charles Lamb commented on HDFS-7836:


If you are planning on attending the meeting in-person, please drop me an email 
so I have an idea of how large a CR to book.

Thanks.


 BlockManager Scalability Improvements
 -

 Key: HDFS-7836
 URL: https://issues.apache.org/jira/browse/HDFS-7836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: BlockManagerScalabilityImprovementsDesign.pdf


 Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-03-09 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7847:
---
Description: Modify NNThroughputBenchmark to be able to operate on a NN 
that is not in process. A followon Jira will modify it some more to allow 
quantifying native and java heap sizes, and some latency numbers.  (was: Write 
a junit test to simulate a heavy BlockManager load.  Quantify native and java 
heap sizes, and some latency numbers.)
Summary: Modify NNThroughputBenchmark to be able to operate on a remote 
NameNode  (was: Write a junit test to simulate a heavy BlockManager load)

 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Attachments: make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7910) Modify NNThroughputBenchmark to be able to provide some metrics

2015-03-09 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-7910:
--

 Summary: Modify NNThroughputBenchmark to be able to provide some 
metrics
 Key: HDFS-7910
 URL: https://issues.apache.org/jira/browse/HDFS-7910
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Charles Lamb
Assignee: Charles Lamb


Modify NNThroughputBenchmark to quantify native and java heap sizes, as well as 
some latency numbers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7847) Write a junit test to simulate a heavy BlockManager load

2015-03-09 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353850#comment-14353850
 ] 

Charles Lamb commented on HDFS-7847:


Thanks [~aagarwal]. I'll take a look at them.


 Write a junit test to simulate a heavy BlockManager load
 

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Attachments: make_blocks.tar.gz


 Write a junit test to simulate a heavy BlockManager load.  Quantify native 
 and java heap sizes, and some latency numbers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-03-09 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7847:
---
Attachment: HDFS-7847.000.patch

Add a new -remoteNamenode option to the CLI which takes a URI of a remote NN.

The existing NNThroughputBenchmark uses the umbrella NamenodeProtocols (plural) 
interface, but you can only create proxies for the underlying RPC interfaces. 
This separates all the calls made in NNThroughputBenchmark out into the smaller 
sub-interfaces.

Modify DFSClient so that proxies for each of the required interfaces can be 
created.

Minor typo fixes encountered along the way.

 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Attachments: HDFS-7847.000.patch, make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-03-09 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-7847 started by Charles Lamb.
--
 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Attachments: HDFS-7847.000.patch, make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not in 
 process. A followon Jira will modify it some more to allow quantifying native 
 and java heap sizes, and some latency numbers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7844) Create an off-heap hash table implementation

2015-03-06 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351163#comment-14351163
 ] 

Charles Lamb commented on HDFS-7844:


I applied your latest patch and set breakpoints at all of the exceptional 
throws in ByteArrayMemoryManager.java. Then I ran the unit test. The following 
lines did not trigger:

91, 94, 117, 129, 135, 165, 171, 190, 203, 245, 251.

I think those are the exceptions in allocate, free, one of the ones in 
putShort, and all of the throws in the getters.


 Create an off-heap hash table implementation
 

 Key: HDFS-7844
 URL: https://issues.apache.org/jira/browse/HDFS-7844
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7844-scl.001.patch, HDFS-7844-scl.002.patch, 
 HDFS-7844-scl.003.patch


 Create an off-heap hash table implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7844) Create an off-heap hash table implementation

2015-03-06 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351073#comment-14351073
 ] 

Charles Lamb commented on HDFS-7844:


[~cmccabe],

This is a nice piece of work!

Here are some comments:

General:

Several lines bust the 80 char limit.

Many unused imports throughout. I guess Yi got this already.

What happens if someone runs this with -d32 passed to the JVM? Do we need to 
make that check and throw accordingly?

ProbingHashSet.java:

A small enhancement might be: {code}close(boolean force){code} which will close 
unconditionally.

The line in #getSlot which is {code}hash = -hash{code} is in fact tested by 
your unit tests, but I don't think it's tested by design in the test. You might 
want to put in an explicit test for that particular line.

#expandTable: using {code}catch(Throwable){code} feels like a rather wide net 
to cast, but I guess it's the right thing. I debated whether all you needed was 
catch (Error), but I guess you can't be sure that the callers above you won't 
just keep going after some RuntimeException gets into their hands.

The comment for #capacity() (total number of slots) is either misleading or 
wrong.

MemoryManager.java

any reason not to have get/putShort along with the existing byte/int/long?

Should #toString() be declared as {code}@Override public String toString(){code}

NativeMemoryManager.java

The comments say nothing about whether it's thread safe or not. Ditto for 
ByteArrayMemoryManager.

ByteArrayMemoryManager

There is no test coverage for the failure case of {code}BAMM.close(){code}

s/valiation/validation/ (Yi caught this)

Why does curAddress start at 1000?

s/2^^31/2^31/

For all of the put/get/byte/int/long routines, it wouldn't be hard to move all 
of the {code}if() { throw new RuntimeException }{code} snippets into their own 
routine. Maybe that's not worth the trouble, but it feels like there's a lot of 
repeated code.
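
Something like this is what I had in mind (a sketch; the helper name, message, 
and example caller are made up, not taken from the patch):

{code}
// Illustrative helper only: centralize the repeated
// "if (bad) throw new RuntimeException(...)" checks used by the
// put/get byte/int/long routines.
class CheckSketch {
  private static void checkAccess(boolean ok, long addr, String what) {
    if (!ok) {
      throw new RuntimeException("Invalid " + what + " at address " + addr);
    }
  }

  // Example caller: validate before a hypothetical putInt.
  static void putIntExample(boolean addressIsValid, long addr, int value) {
    checkAccess(addressIsValid, addr, "putInt");
    // ... the actual write would go here ...
  }
}
{code}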

TestMemoryManager.java

The indentation of #testMemoryManagerCreate formals is messed up.

#testCatchInvalidPuts: you test putByte against freed memory, but not int or 
long.

the Assert.fail messages should be different for each fail() call.

The exception checks in getByte/Int/Long are not tested.

None of the entry==null exceptions are tested in putByte/Long/Int

I tried running  TestMemoryManager.testNativeMemoryManagerCreate and it failed 
like this:

{code}
2015-03-06 17:10:22,430 ERROR offheap.MemoryManager$Factory 
(MemoryManager.java:create(91)) - Unable to create 
org.apache.hadoop.util.offheap.NativeMemoryManager.  Falling back on 
org.apache.hadoop.util.offheap.ByteArrayMemoryManager
java.lang.IllegalArgumentException: wrong number of arguments
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.util.offheap.MemoryManager$Factory.create(MemoryManager.java:89)
at 
org.apache.hadoop.util.offheap.TestMemoryManager.testMemoryManagerCreate(TestMemoryManager.java:135)
at 
org.apache.hadoop.util.offheap.TestMemoryManager.testNativeMemoryManagerCreate(TestMemoryManager.java:151)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

org.junit.ComparisonFailure: 
Expected :org.apache.hadoop.util.offheap.NativeMemoryManager
Actual   :org.apache.hadoop.util.offheap.ByteArrayMemoryManager
 Click to see difference
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.util.offheap.TestMemoryManager.testMemoryManagerCreate(TestMemoryManager.java:137)
at 
org.apache.hadoop.util.offheap.TestMemoryManager.testNativeMemoryManagerCreate(TestMemoryManager.java:151)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

[jira] [Commented] (HDFS-7844) Create an off-heap hash table implementation

2015-03-06 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351282#comment-14351282
 ] 

Charles Lamb commented on HDFS-7844:


Thanks Colin, +1, I'll file a follow up jira for the coverage.

 Create an off-heap hash table implementation
 

 Key: HDFS-7844
 URL: https://issues.apache.org/jira/browse/HDFS-7844
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7844-scl.001.patch, HDFS-7844-scl.002.patch, 
 HDFS-7844-scl.003.patch


 Create an off-heap hash table implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements

2015-03-05 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349438#comment-14349438
 ] 

Charles Lamb commented on HDFS-7836:


We'll hold a design review meeting and discussion of this project next Weds, 
March 11th, 10am to 1pm (PDT) at the Cloudera offices in Palo Alto. I'll post 
webex information on this Jira before then. If you plan on attending in person, 
please send me a private email so I know how many people to expect.


 BlockManager Scalability Improvements
 -

 Key: HDFS-7836
 URL: https://issues.apache.org/jira/browse/HDFS-7836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: BlockManagerScalabilityImprovementsDesign.pdf


 Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements

2015-03-03 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345012#comment-14345012
 ] 

Charles Lamb commented on HDFS-7836:


Yes, there would definitely be a webex available.


 BlockManager Scalability Improvements
 -

 Key: HDFS-7836
 URL: https://issues.apache.org/jira/browse/HDFS-7836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: BlockManagerScalabilityImprovementsDesign.pdf


 Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-03 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345311#comment-14345311
 ] 

Charles Lamb commented on HDFS-7435:


Hi @daryn,

The new patch looks pretty good to me. Just a few nits.

FsDatasetImpl still has one line that exceeds the 80 chars, and there are a 
couple of unused imports in the new test TestBlockListAsLongs. Also in that 
test, IWBNI you could use the specific Mockito static imports that are needed 
rather than the * import.


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7845) Compress block reports

2015-03-03 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345372#comment-14345372
 ] 

Charles Lamb commented on HDFS-7845:


bq. Charles Lamb did some tests with a block report and got around 50% (if I'm 
remembering correctly?) Charles Lamb, can you comment on whether those tests 
were done with vints or regular integers?

Yes, 50% is about what I saw. Those were done on the array of longs, not vints, 
using plain lz4.
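
For the curious, the shape of that experiment was roughly the following (a 
sketch, assuming the lz4-java library rather than whatever codec plumbing we 
would actually wire into the DN):

{code}
import java.nio.ByteBuffer;
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;

// Sketch only: compress a block report's long[] with plain lz4 and report the
// compressed-to-raw size ratio.
class BlockReportCompressionSketch {
  static double compressionRatio(long[] blockReport) {
    ByteBuffer buf = ByteBuffer.allocate(blockReport.length * 8);
    for (long l : blockReport) {
      buf.putLong(l);
    }
    byte[] raw = buf.array();
    LZ4Compressor compressor = LZ4Factory.fastestInstance().fastCompressor();
    byte[] compressed = compressor.compress(raw);
    return (double) compressed.length / raw.length;
  }
}
{code}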

 Compress block reports
 --

 Key: HDFS-7845
 URL: https://issues.apache.org/jira/browse/HDFS-7845
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb

 We should optionally compress block reports using a low-cpu codec such as lz4 
 or snappy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements

2015-02-26 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338394#comment-14338394
 ] 

Charles Lamb commented on HDFS-7836:


Hi [~arpit99],

Thanks for reading over the design doc and commenting on it.

bq. The DataNode can now split block reports per storage directory (post 
HDFS-2832), controlled by DFS_BLOCKREPORT_SPLIT_THRESHOLD_KEY. Did you get a 
chance to try it out and see if it helps? Splitting reports addresses all of 
the above. (edit: does not address network bandwidth gains from compression 
though)

I think you may mean your work on HDFS-5153, right? If I understand that 
correctly, it sends one report per storage. We have seen block reports in the 
100MB+ range, so we suspect that an even smaller chunk size than a storage may 
yield benefits. That said, I am also watching [~daryn]'s work on HDFS-7435, 
which addresses a large piece of this Jira's proposal. I think that once 
HDFS-7435 is committed, we will make some measurements and see if anything else 
in the area of chunking is necessary. As you point out, compression should also 
help.

bq. Do you have any estimates for startup time overhead due to GCs?

We know of at least one large deployment which experiences a full GC pause 
during startup. I'm not sure of the time, but in general, the off-heaping will 
help with NN throughput just by reducing the number of objects on the heap.

bq. How does this affect block report processing? We cannot assume DataNodes 
will sort blocks by target stripe. Will the NameNode sort received reports or 
will it acquire+release a lock per block? If the former, then there should 
probably be some randomization of order across threads to avoid unintended 
serialization e.g. lock convoys.

The idea is that currently, processing a block report requires taking the FSN 
lock. So this proposal has two parts. First, use better locking semantics so that 
we don't have to take the FSN lock. Next, shard the blocksMap structure so that 
multiple threads can operate concurrently on that structure. Even if we 
continue to process BRs under one big happy FSN lock, having multiple threads 
operate concurrently will yield benefits. The sharding (stripes) is along 
arbitrary boundaries. For instance, the design doc suggests that it could be 
striped by doing blockId % nStripes. nStripes would be configurable to a 
relatively small number (the dd suggests 4 to 16), and if the modulo 
calculation is used, then nStripes would be a prime that is roughly equal to 
the number of threads available. As long as block report processing per block 
does not need to access more than one shard at a time, this will be fine -- 
multiple threads can process blocks in parallel. It is a technique that 
Berkeley DB Java Edition uses for its lock table to improve concurrency.
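
To make the striping idea concrete, here is a minimal sketch (illustrative 
only; the real blocksMap sharding would obviously be more involved, and the 
value type here is just a stand-in for BlockInfo):

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch of blockId % nStripes lock striping: each stripe has its own map and
// lock, so threads touching different stripes can proceed in parallel.
class StripedBlocksMapSketch {
  private final int nStripes;
  private final Object[] locks;
  private final Map<Long, String>[] stripes;  // String stands in for BlockInfo

  @SuppressWarnings("unchecked")
  StripedBlocksMapSketch(int nStripes) {
    this.nStripes = nStripes;
    this.locks = new Object[nStripes];
    this.stripes = new Map[nStripes];
    for (int i = 0; i < nStripes; i++) {
      locks[i] = new Object();
      stripes[i] = new HashMap<>();
    }
  }

  private int stripeOf(long blockId) {
    // Keep the index non-negative even for negative block ids.
    int s = (int) (blockId % nStripes);
    return s < 0 ? s + nStripes : s;
  }

  void put(long blockId, String blockInfo) {
    int s = stripeOf(blockId);
    synchronized (locks[s]) {
      stripes[s].put(blockId, blockInfo);
    }
  }

  String get(long blockId) {
    int s = stripeOf(blockId);
    synchronized (locks[s]) {
      return stripes[s].get(blockId);
    }
  }
}
{code}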


 BlockManager Scalability Improvements
 -

 Key: HDFS-7836
 URL: https://issues.apache.org/jira/browse/HDFS-7836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: BlockManagerScalabilityImprovementsDesign.pdf


 Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-26 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338768#comment-14338768
 ] 

Charles Lamb commented on HDFS-7435:


@daryn,

This looks really good. I like the new approach and your current patch does a 
pre-emptive strike on several of the comments I was going to make on the .002 
patch.

I really only have nits.

The patch needs to be rebased. There was one .rej when I applied it (obviously 
I worked past that for my review).

BlockListAsLongs.java:

BlockListAsLongs(Collection) needs an @param for the javadoc.

In #BlockListAsLongs(Collection), the ReplicaState is being written as a 
varint64. I realize it's a varint, but since it's really only a single byte in 
the implementation, it seems a little heavy handed to write it to the cos as a 
varint64. I also realize that it will need to be a long on the way back out for 
the uc long[]. If you don't want to change it from being a varint64 in the cos, 
then perhaps just add a comment saying that you know it's a byte (actually int) 
in the impl but for consistency you're using a varint64?
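
Just to spell out what I mean (a sketch using protobuf's CodedOutputStream; the 
class name is mine and the int is a stand-in for the real ReplicaState value):

{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import com.google.protobuf.CodedOutputStream;

// Sketch only: the replica state is a single small value, but it is written to
// the CodedOutputStream ("cos") as a varint64 so it can round-trip back into
// the long[] form on the way out.
class ReplicaStateWriteSketch {
  static byte[] writeState(int replicaStateOrdinal) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    CodedOutputStream cos = CodedOutputStream.newInstance(baos);
    // A small value still encodes as a single byte on the wire, so the
    // declared 64-bit width costs nothing in practice.
    cos.writeUInt64NoTag(replicaStateOrdinal);
    cos.flush();
    return baos.toByteArray();
  }
}
{code}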

Since you throw UnsupportedOperationException from multiple #remove methods, 
you might want to add the class name to the message. e.g. Sorry. remove not 
implemented for BlockReportListIterator. Along a similar vein, would it be 
appropriate to add a message to BlockReportReplica.getVisibleLength, 
getStorageUuid, and isOnTransientStorage's UnsupportedOperationException?

BlockReportTestBase.java:

getBlockReports has one line that exceeds the 80 char width.

DatanodeProtocolClientSideTranslatorPB.java:

the import of NameNodeLayoutVersion is unused.

the decl of useBlocksBuffer busts the 80 char width.

DatanodeProtocolServerSideTranslatorPB.java:

import ByteString is unused.

FsDatasetImpl.java:

in #getBlockReports, the line under case RUR busts the 80 char limit.

NameNodeLayoutVersion.java:

Perhaps s/Protobuf optimized/Optimized protobuf/

NNThroughputBenchmark.java:

Thanks for fixing the formatting in here.

TestBlockHasMultipleReplicasOnSameDN.java:

blocks.add(... busts the 80 char limit.



 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements

2015-02-26 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338901#comment-14338901
 ] 

Charles Lamb commented on HDFS-7836:


bq. Are you proposing that off-heaping is an opt-in feature that must be 
explicitly enabled in configuration, or are you proposing that off-heaping will 
be the new default behavior? Arguably, jumping to off-heaping as the default 
could be seen as a backwards-incompatibility, because it might be unsafe to 
deploy the feature without simultaneous down-tuning the NameNode max heap size. 
Some might see that as backwards-incompatible with existing configurations.

The proposal is to have an option that lets the offheap code allocate slabs 
using 'new byte[]' rather than malloc. This would be used for debugging 
purposes and not in a normal deployment.


 BlockManager Scalability Improvements
 -

 Key: HDFS-7836
 URL: https://issues.apache.org/jira/browse/HDFS-7836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: BlockManagerScalabilityImprovementsDesign.pdf


 Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-26 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339390#comment-14339390
 ] 

Charles Lamb commented on HDFS-7435:


bq. I mildly disagree with overly verbose messages for 
UnsupportedOperationExceptions since the JDK rarely uses messages and the class 
and method is in the stack trace.

Yeah, I figured there was a reason for not having a message in the UOE, so it 
won't cause me any heartburn if you don't put them in.

bq. I think the whole fragmented over-the-wire buffer implementation is 
over-engineered.

Oh, I kind of liked it.

I'll try to take a look at the new patch soon. Thanks for the reply.


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements

2015-02-26 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339385#comment-14339385
 ] 

Charles Lamb commented on HDFS-7836:


bq. it would be useful to see some perf comparison before we add that 
complexity.

We definitely plan on getting some baseline measurements and sharing them. We 
want to know what the before-and-after effects of any changes are. 
As an aside, I worked on a case where we had to increase the RPC limit to 192MB 
in order to get block reports handled correctly, so I know these types of 
deployments are out there.

bq. I'll see if I can clean up and post what I used on HDFS-7847.

That would be much appreciated. I'm starting to look at HDFS-7847 (subtask of 
this Jira) and maybe that could come into play somehow.

bq. These two sound contradictory.

Yes, they do, but aren't meant to be. The first level would be to do concurrent 
processing under the FSN lock. That would at least get some parallelism. The 
second step would be to make a more lockless blocksMap which wouldn't require 
the big FSN lock to be held.

BTW, since you have edit privs, if you want to get rid of my redundant reply to 
you above, that would be great. My browser suckered me into hitting Add twice. 
Thanks.

 BlockManager Scalability Improvements
 -

 Key: HDFS-7836
 URL: https://issues.apache.org/jira/browse/HDFS-7836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: BlockManagerScalabilityImprovementsDesign.pdf


 Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements

2015-02-24 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335747#comment-14335747
 ] 

Charles Lamb commented on HDFS-7836:


Problem Statement

The number of blocks stored by the largest HDFS clusters continues to increase. 
 This increase adds pressure to the BlockManager, that part of the NameNode 
which handles block data from across the cluster.

Full block reports are problematic.  The more blocks each DataNode has, the 
longer it takes to process a full block report from that DataNode.  Storage 
densities have roughly doubled each year for the past few years.  Meanwhile, 
increases in CPU power have come mostly in the form of additional cores rather 
than faster clock speeds.  Currently, the NameNode cannot use these additional 
cores because full block reports are processed while holding the namesystem 
lock.

The BlockManager stores all blocks in memory and this contributes to a large 
heap size.  As the NameNode Java heap size has grown, full garbage collection 
events have started to take several minutes.  Although it is often possible to 
avoid full GCs by re-using Java objects, they remain an operational concern for 
administrators.  They also contribute to a long NameNode startup time, 
sometimes measured in tens of minutes for the biggest clusters.


Goals
We need to improve the BlockManager to handle the challenges of the next few 
years.  Our specific goals for this project are to:

* Reduce lock contention for the FSNamesystem lock
* Enable concurrent processing of block reports
* Reduce the Java heap size of the NameNode
* Optimize the use of network resources

[~cmccabe] and I will be working on this Jira. We propose doing this work on a 
separate branch. If there is interest in a community meeting to discuss these 
changes, then perhaps Tuesday 3/10/15 at Cloudera in Palo Alto, CA would work? 
I suggest that date because I will be in the bay area that day and would like 
to meet with other interested community members in person. I'll also be around 
3/11 and 3/12 if we need an alternate date.


 BlockManager Scalability Improvements
 -

 Key: HDFS-7836
 URL: https://issues.apache.org/jira/browse/HDFS-7836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Charles Lamb
Assignee: Charles Lamb

 Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7836) BlockManager Scalability Improvements

2015-02-24 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-7836:
--

 Summary: BlockManager Scalability Improvements
 Key: HDFS-7836
 URL: https://issues.apache.org/jira/browse/HDFS-7836
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Charles Lamb
Assignee: Charles Lamb


Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7836) BlockManager Scalability Improvements

2015-02-24 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7836:
---
Attachment: BlockManagerScalabilityImprovementsDesign.pdf

 BlockManager Scalability Improvements
 -

 Key: HDFS-7836
 URL: https://issues.apache.org/jira/browse/HDFS-7836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: BlockManagerScalabilityImprovementsDesign.pdf


 Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-02-11 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7682:
---
Attachment: HDFS-7682.003.patch

[~jingzhao],

Thanks for the comments. I think the latest patch addresses them by changing 
the test to a check for the src path being a snapshotted file.

Charles


 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch, 
 HDFS-7682.002.patch, HDFS-7682.003.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 The reason why this happens is because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum of all of the CRCs from the blocks in the 
 file. But, in the case of a snapshotted file, we don't want to include data 
 in the checksum that was appended to the last block in the file after the 
 snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-02-05 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7682:
---
Attachment: HDFS-7682.002.patch

Rebased.

 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch, 
 HDFS-7682.002.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 The reason why this happens is because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum of all of the CRCs from the blocks in the 
 file. But, in the case of a snapshotted file, we don't want to include data 
 in the checksum that was appended to the last block in the file after the 
 snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7704) DN heartbeat to Active NN may be blocked and expire if connection to Standby NN continues to time out.

2015-02-04 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305452#comment-14305452
 ] 

Charles Lamb commented on HDFS-7704:


Hi [~shahrs87],

I only have a few nits on the latest rev:

BPServiceActor.java:

Line continuations are 4 spaces.

At line 257 you've introduced a line containing spaces. Also, you've removed 
the last newline of the file.

BPServiceActorAction.java:

Line continuations are 4 spaces (statements in a new block are indented 2 
spaces).

ErrorReportAction.java:

s/A ErrorReportAction/An ErrorReportAction/
Can LOG be private?

ReportBadBlockAction.java

Can LOG be private?

s/to namenode :/to namenode: /
The block comment in that catch should be:

/*
 * One common reason ...
 */


 DN heartbeat to Active NN may be blocked and expire if connection to Standby 
 NN continues to time out. 
 ---

 Key: HDFS-7704
 URL: https://issues.apache.org/jira/browse/HDFS-7704
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: HDFS-7704-v2.patch, HDFS-7704-v3.patch, 
 HDFS-7704-v4.patch, HDFS-7704.patch


 There are couple of synchronous calls in BPOfferservice (i.e reportBadBlocks 
 and trySendErrorReport) which will wait for both of the actor threads to 
 process this calls.
 This calls are made with writeLock acquired.
 When reportBadBlocks() is blocked at the RPC layer due to unreachable NN, 
 subsequent heartbeat response processing has to wait for the write lock. It 
 eventually gets through, but takes too long and it blocks the next heartbeat.
 In our HA cluster setup, the standby namenode was taking a long time to 
 process the request.
 Requesting improvement in datanode to make the above calls asynchronous since 
 these reports don't have any specific
 deadlines, so extra few seconds of delay should be acceptable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7704) DN heartbeat to Active NN may be blocked and expire if connection to Standby NN continues to time out.

2015-02-04 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305742#comment-14305742
 ] 

Charles Lamb commented on HDFS-7704:


Oh, sorry, one more comment. In the test, to be consistent with the code that 
is already there, you can add an import static for Assert.assertTrue rather 
than importing (non-static) Assert. Or, the opposite (eliminate the import 
statics).


 DN heartbeat to Active NN may be blocked and expire if connection to Standby 
 NN continues to time out. 
 ---

 Key: HDFS-7704
 URL: https://issues.apache.org/jira/browse/HDFS-7704
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: HDFS-7704-v2.patch, HDFS-7704-v3.patch, 
 HDFS-7704-v4.patch, HDFS-7704.patch


 There are couple of synchronous calls in BPOfferservice (i.e reportBadBlocks 
 and trySendErrorReport) which will wait for both of the actor threads to 
 process this calls.
 This calls are made with writeLock acquired.
 When reportBadBlocks() is blocked at the RPC layer due to unreachable NN, 
 subsequent heartbeat response processing has to wait for the write lock. It 
 eventually gets through, but takes too long and it blocks the next heartbeat.
 In our HA cluster setup, the standby namenode was taking a long time to 
 process the request.
 Requesting improvement in datanode to make the above calls asynchronous since 
 these reports don't have any specific
 deadlines, so extra few seconds of delay should be acceptable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7704) DN heartbeat to Active NN may be blocked and expire if connection to Standby NN continues to time out.

2015-02-02 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14301835#comment-14301835
 ] 

Charles Lamb commented on HDFS-7704:


Hi [~shahrs87],

I don't understand why you create the BPServiceActorAction class with subtypes 
(ErrorReportAction and ReportBadBlockAction) but then don't do method dispatch on 
the class. Instead you're using an enum and a switch to do the dispatch. This 
seems rather un-OO-like, no? A rough sketch of the alternative is below.
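(Method and parameter names in this sketch are assumptions for illustration, not 
the patch's actual API.)

{code}
abstract class BPServiceActorAction {
  // Each concrete action knows how to report itself to the namenode,
  // so the actor thread needs neither an ActionEnum nor a switch.
  abstract void reportTo(DatanodeProtocolClientSideTranslatorPB bpNamenode,
      DatanodeRegistration bpRegistration) throws IOException;
}

class ErrorReportAction extends BPServiceActorAction {
  private final int errorCode;
  private final String errorMessage;

  ErrorReportAction(int errorCode, String errorMessage) {
    this.errorCode = errorCode;
    this.errorMessage = errorMessage;
  }

  @Override
  void reportTo(DatanodeProtocolClientSideTranslatorPB bpNamenode,
      DatanodeRegistration bpRegistration) throws IOException {
    bpNamenode.errorReport(bpRegistration, errorCode, errorMessage);
  }
}

// The actor thread then just iterates over the queued actions and calls
// action.reportTo(bpNamenode, bpRegistration) on each one.
{code}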

Other nits:

BPOfferService.java:

{code}
  ErrorReportAction errorReportAction = new ErrorReportAction
(BPServiceActorAction.ActionEnum.TRYSENDERRORREPORT,errCode, errMsg);
{code}

Needs a space before errCode.

BPServiceActor.java:

s/synchronized(/synchronized (/
s/switch(/switch (/

ErrorReportAction.java:

s/BPServiceActorAction{/BPServiceActorAction {/

ReportBadBlockAction.java:

s/BPServiceActorAction{/BPServiceActorAction {/

Charles


 DN heartbeat to Active NN may be blocked and expire if connection to Standby 
 NN continues to time out. 
 ---

 Key: HDFS-7704
 URL: https://issues.apache.org/jira/browse/HDFS-7704
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: HDFS-7704-v2.patch, HDFS-7704.patch


 There are a couple of synchronous calls in BPOfferService (i.e. reportBadBlocks 
 and trySendErrorReport) which wait for both of the actor threads to process the 
 call. These calls are made with the writeLock acquired.
 When reportBadBlocks() is blocked at the RPC layer due to an unreachable NN, 
 subsequent heartbeat response processing has to wait for the write lock. It 
 eventually gets through, but takes too long and blocks the next heartbeat.
 In our HA cluster setup, the standby namenode was taking a long time to 
 process the request.
 Requesting an improvement in the datanode to make the above calls asynchronous; 
 since these reports don't have any specific deadlines, an extra few seconds of 
 delay should be acceptable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-02-02 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14301885#comment-14301885
 ] 

Charles Lamb commented on HDFS-7682:


[~jingzhao],

Did you have any more comments on this Jira?

BTW, the test failure is unrelated.

Thanks.
Charles


 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 The reason why this happens is because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum of all of the CRCs from the blocks in the 
 file. But, in the case of a snapshotted file, we don't want to include data 
 in the checksum that was appended to the last block in the file after the 
 snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7423) various typos and message formatting fixes in nfs daemon and doc

2015-01-29 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296801#comment-14296801
 ] 

Charles Lamb commented on HDFS-7423:


bq. Is it correct?

Yes, it's ok because statistics is only declared and never used (except there). 
Hence, it's always null. Probably a better change would have been to just 
eliminate statistics completely from the file.

Thanks for the commit [~hitliuyi].


 various typos and message formatting fixes in nfs daemon and doc
 

 Key: HDFS-7423
 URL: https://issues.apache.org/jira/browse/HDFS-7423
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Fix For: 2.7.0

 Attachments: HDFS-7423-branch-2.004.patch, HDFS-7423.001.patch, 
 HDFS-7423.002.patch, HDFS-7423.003.patch, HDFS-7423.004.patch


 These are accumulated fixes for log messages, formatting, typos, etc. in the 
 nfs3 daemon that I came across while working on a customer issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7702) Move metadata across namenode - Effort to a real distributed namenode

2015-01-29 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297657#comment-14297657
 ] 

Charles Lamb commented on HDFS-7702:


Hi [~xiyunyue],

I read over your proposal and have some high level questions.

I am unclear about your proposal in the failure scenarios. If a source or 
target NN or one or more of the DNs fails in the middle of a migration, how are 
things restarted?

Why use Kryo and not protobuf for serialization? Why use Kryo and not the 
existing Hadoop/HDFS protocols and infrastructure for network communications 
between the various nodes?

Is the transfer granularity blockpool only? I infer that from this statement:

bq. The target namenode will notify datanode remove blockpool id which belong 
to the source namenode,

but then this statement:

bq. it will mark delete the involved sub-tree from its own namespace

leads me to believe that it's sub-trees in the namespace.

Could you please clarify this statement:

bq. all read and write operation regarding the same namespace sub-tree is 
forwarding to the target namenode.

Who does the forwarding, the client or the source NN?




 Move metadata across namenode - Effort to a real distributed namenode
 -

 Key: HDFS-7702
 URL: https://issues.apache.org/jira/browse/HDFS-7702
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Ray
Assignee: Ray

 Implement a tool that can show the in-memory namespace tree structure with 
 weight (size), and an API that can move metadata across different namenodes. 
 The purpose is to move data efficiently and quickly, without moving blocks on 
 the datanodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7704) DN heartbeat to Active NN may be blocked and expire if connection to Standby NN continues to time out.

2015-01-29 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297887#comment-14297887
 ] 

Charles Lamb commented on HDFS-7704:


Hi [~shahrs87],

A couple of quick comments:

{code}
  public void bpThreadEnqueue(DatanodeCommand datanodeCommand) {
if (bpThreadQueue != null) {
  bpThreadQueue.add(datanodeCommand);
}
  }
{code}

When would bpThreadQueue be null? Don't you want to use Preconditions here?
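For example, one possible shape (purely illustrative; whether the queue can 
legitimately be null depends on the initialization order in the patch):

{code}
import com.google.common.base.Preconditions;

public void bpThreadEnqueue(DatanodeCommand datanodeCommand) {
  // Fail fast instead of silently dropping the command if the queue
  // was never initialized.
  Preconditions.checkNotNull(bpThreadQueue, "bpThreadQueue not initialized");
  bpThreadQueue.add(datanodeCommand);
}
{code}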

Several lines exceed the 80 char limit.

s/if(/if (/

I'll wait for your second version with [~kihwal]'s comments addressed.


 DN heartbeat to Active NN may be blocked and expire if connection to Standby 
 NN continues to time out. 
 ---

 Key: HDFS-7704
 URL: https://issues.apache.org/jira/browse/HDFS-7704
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: HDFS-7704.patch


 There are a couple of synchronous calls in BPOfferService (i.e. reportBadBlocks 
 and trySendErrorReport) which wait for both of the actor threads to process the 
 call. These calls are made with the writeLock acquired.
 When reportBadBlocks() is blocked at the RPC layer due to an unreachable NN, 
 subsequent heartbeat response processing has to wait for the write lock. It 
 eventually gets through, but takes too long and blocks the next heartbeat.
 In our HA cluster setup, the standby namenode was taking a long time to 
 process the request.
 Requesting an improvement in the datanode to make the above calls asynchronous; 
 since these reports don't have any specific deadlines, an extra few seconds of 
 delay should be acceptable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7423) various typos and message formatting fixes in nfs daemon and doc

2015-01-28 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7423:
---
Attachment: HDFS-7423.004.patch

Hi [~hitliuyi],

The .004 is rebased for the trunk. Let's wait for the jenkins run. Once that 
completes, I'll upload the branch-2 rebase diffs.


 various typos and message formatting fixes in nfs daemon and doc
 

 Key: HDFS-7423
 URL: https://issues.apache.org/jira/browse/HDFS-7423
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Attachments: HDFS-7423.001.patch, HDFS-7423.002.patch, 
 HDFS-7423.003.patch, HDFS-7423.004.patch


 These are accumulated fixes for log messages, formatting, typos, etc. in the 
 nfs3 daemon that I came across while working on a customer issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7423) various typos and message formatting fixes in nfs daemon and doc

2015-01-28 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295475#comment-14295475
 ] 

Charles Lamb commented on HDFS-7423:


Test failures unrelated.

 various typos and message formatting fixes in nfs daemon and doc
 

 Key: HDFS-7423
 URL: https://issues.apache.org/jira/browse/HDFS-7423
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Attachments: HDFS-7423-branch-2.004.patch, HDFS-7423.001.patch, 
 HDFS-7423.002.patch, HDFS-7423.003.patch, HDFS-7423.004.patch


 These are accumulated fixes for log messages, formatting, typos, etc. in the 
 nfs3 daemon that I came across while working on a customer issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7423) various typos and message formatting fixes in nfs daemon and doc

2015-01-28 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7423:
---
Attachment: HDFS-7423-branch-2.004.patch

branch-2 diffs attached.

 various typos and message formatting fixes in nfs daemon and doc
 

 Key: HDFS-7423
 URL: https://issues.apache.org/jira/browse/HDFS-7423
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Attachments: HDFS-7423-branch-2.004.patch, HDFS-7423.001.patch, 
 HDFS-7423.002.patch, HDFS-7423.003.patch, HDFS-7423.004.patch


 These are accumulated fixes for log messages, formatting, typos, etc. in the 
 nfs3 daemon that I came across while working on a customer issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7423) various typos and message formatting fixes in nfs daemon and doc

2015-01-27 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293543#comment-14293543
 ] 

Charles Lamb commented on HDFS-7423:


Thank you for the review [~ste...@apache.org].


 various typos and message formatting fixes in nfs daemon and doc
 

 Key: HDFS-7423
 URL: https://issues.apache.org/jira/browse/HDFS-7423
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Attachments: HDFS-7423.001.patch, HDFS-7423.002.patch, 
 HDFS-7423.003.patch


 These are accumulated fixes for log messages, formatting, typos, etc. in the 
 nfs3 daemon that I came across while working on a customer issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-6571) NameNode should delete intermediate fsimage.ckpt when checkpoint fails

2015-01-27 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb reassigned HDFS-6571:
--

Assignee: Charles Lamb

 NameNode should delete intermediate fsimage.ckpt when checkpoint fails
 --

 Key: HDFS-6571
 URL: https://issues.apache.org/jira/browse/HDFS-6571
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Akira AJISAKA
Assignee: Charles Lamb

 When a checkpoint fails while getting a new fsimage from the standby NameNode or 
 SecondaryNameNode, the intermediate fsimage (fsimage.ckpt_txid) is left behind 
 and never cleaned up.
 If the fsimage is large and checkpointing fails many times, the accumulating 
 intermediate fsimages may cause the disk to run out of space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-01-27 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7682:
---
Attachment: HDFS-7682.001.patch

Hi [~jingzhao],

Thanks for looking at this.

isLastBlockComplete() covers the case where it's a snapshot path as well as a 
closed non-snapshot path. The file length is correct in both of those cases, so 
it's ok to use it. In the case of a still-being-written file, 
isLastBlockComplete() returns false and the code works just the same as it does 
today. The particular case that this patch fixes is that a snapshotted file is 
frozen, so the file length is the limit of what should be checksummed, not the 
block lengths (which include the non-snapshotted portion). I've added more 
assertions in the test to demonstrate this.

In other words, the behavior for non-snapshotted files that are still open (and 
possibly being appended to) is not changed by this patch, only that of 
snapshotted files, for which isLastBlockComplete() is a valid check.

HDFS-5343 took a similar approach.
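
To make the scenario concrete, here is a minimal sketch of the kind of assertion 
involved (assuming fs is the cluster's DistributedFileSystem and BLOCK_SIZE is the 
test's block size; the patch's actual test code may differ):

{code}
// Write a one-block file, snapshot it, then append to the same (last) block.
Path dir = new Path("/snapdir");
Path file = new Path(dir, "f");
DFSTestUtil.createFile(fs, file, BLOCK_SIZE, (short) 1, 0xBEEFL);
fs.allowSnapshot(dir);
Path snapFile = new Path(fs.createSnapshot(dir, "s1"), "f");
FileChecksum snapChecksum = fs.getFileChecksum(snapFile);

DFSTestUtil.appendFile(fs, file, 1024);  // grows the last block after the snapshot

// The snapshot is frozen, so its checksum must not change: the snapshot file
// length, not the (now longer) last block, bounds what gets checksummed.
assertEquals(snapChecksum, fs.getFileChecksum(snapFile));
{code}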


 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 The reason why this happens is because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum of all of the CRCs from the blocks in the 
 file. But, in the case of a snapshotted file, we don't want to include data 
 in the checksum that was appended to the last block in the file after the 
 snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-01-26 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7682:
---
Status: Patch Available  (was: Open)

 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-7682.000.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 The reason why this happens is because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum of all of the CRCs from the blocks in the 
 file. But, in the case of a snapshotted file, we don't want to include data 
 in the checksum that was appended to the last block in the file after the 
 snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-01-26 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7682:
---
Attachment: HDFS-7682.000.patch

Posting patch for a jenkins run.

 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-7682.000.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 The reason why this happens is because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum of all of the CRCs from the blocks in the 
 file. But, in the case of a snapshotted file, we don't want to include data 
 in the checksum that was appended to the last block in the file after the 
 snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-01-26 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-7682:
--

 Summary: {{DistributedFileSystem#getFileChecksum}} of a 
snapshotted file includes non-snapshotted content
 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb


DistributedFileSystem#getFileChecksum of a snapshotted file includes 
non-snapshotted content.

The reason why this happens is because DistributedFileSystem#getFileChecksum 
simply calculates the checksum of all of the CRCs from the blocks in the file. 
But, in the case of a snapshotted file, we don't want to include data in the 
checksum that was appended to the last block in the file after the snapshot was 
taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7667:
---
Attachment: HDFS-7667.001.patch

[~aw],

Thanks for looking it over. The .001 version makes those two changes.

 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7667:
---
Attachment: HDFS-7667.000.patch

Diffs attached.

 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7667.000.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7667:
---
Status: Patch Available  (was: Open)

 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7667.000.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-7667:
--

 Summary: Various typos and improvements to HDFS Federation doc
 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor


Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290002#comment-14290002
 ] 

Charles Lamb commented on HDFS-7667:


[~aw],

Thanks for the review. I started out intending to just fix a few minor errors 
(missing articles, obviously wrong typos in commands, etc.). Then I couldn't 
help myself so I made some slightly larger grammatical changes and tightened up 
a few things. Please stop me before I kill any more and commit this.

Thanks!

Of course we still have not heard from Mr. Jenkins... I wonder where he is 
today.


 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290032#comment-14290032
 ] 

Charles Lamb commented on HDFS-7667:


Thanks for the review and the commit [~aw]. If you're bored, HDFS-7644 is a 3 
char fix.


 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7644) minor typo in HttpFS doc

2015-01-23 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290048#comment-14290048
 ] 

Charles Lamb commented on HDFS-7644:


Gee, here I am fixing all these typos and I can't even get the Jira title 
correct.

Thanks for the review and the commit [~aw].


 minor typo in HttpFS doc
 

 Key: HDFS-7644
 URL: https://issues.apache.org/jira/browse/HDFS-7644
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Fix For: 2.7.0

 Attachments: HDFS-7644.000.patch


 In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7644) minor typo in HffpFS doc

2015-01-23 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289203#comment-14289203
 ] 

Charles Lamb commented on HDFS-7644:


The FB warnings are spurious.


 minor typo in HffpFS doc
 

 Key: HDFS-7644
 URL: https://issues.apache.org/jira/browse/HDFS-7644
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Attachments: HDFS-7644.000.patch


 In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7644) minor typo in HffpFS doc

2015-01-22 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7644:
---
Status: Patch Available  (was: Open)

 minor typo in HffpFS doc
 

 Key: HDFS-7644
 URL: https://issues.apache.org/jira/browse/HDFS-7644
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Attachments: HDFS-7644.000.patch


 In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6874) Add GET_BLOCK_LOCATIONS operation to HttpFS

2015-01-21 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286613#comment-14286613
 ] 

Charles Lamb commented on HDFS-6874:


[~lianggz],

Thanks for working on this.

In general the patch looks good. I have a few minor comments.

The patch on the trunk needs to be rebased. I didn't check the branch-2 patch, 
so it may need to be rebased too.

In general, lots of lines exceed the 80 char limit.

FSOperations.java

s/private static Map  blockLocationsToJSON/private static Map 
blockLocationsToJSON/
You may want to add javadoc for the @param and @return of that method.

HttpFSFileSystem.java

getFileBlockLocations should have javadoc for the @return. In this method, the 
call to HttpFSUtils.validateResponse should probably be changed to 
HttpExceptionUtils.validateResponse().
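For reference, the call would then look roughly like this (the connection variable 
and expected status are just placeholders):

{code}
HttpExceptionUtils.validateResponse(conn, HttpURLConnection.HTTP_OK);
{code}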

HttpFSServer.java
s/offset,len/offset, len/
Is it correct that passing a len=0 implies Long.MAX_VALUE?

JsonUtil.java
The javadoc formatting for toBlockLocations is messed up a little.
s/IOException{/IOException {/

WebHdfsFileSystem.java
for isWebHDFSJson, s/json){/json) {/ and s/m!=null/m != null/. Also, the 
javadoc needs filling in.

Charles


 Add GET_BLOCK_LOCATIONS operation to HttpFS
 ---

 Key: HDFS-6874
 URL: https://issues.apache.org/jira/browse/HDFS-6874
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Gao Zhong Liang
Assignee: Gao Zhong Liang
 Attachments: HDFS-6874-branch-2.6.0.patch, HDFS-6874.patch


 The GET_BLOCK_LOCATIONS operation, which is already supported in WebHDFS, is 
 missing in HttpFS. For a GETFILEBLOCKLOCATIONS request, 
 org.apache.hadoop.fs.http.server.HttpFSServer currently returns BAD_REQUEST:
 ...
   case GETFILEBLOCKLOCATIONS: {
     response = Response.status(Response.Status.BAD_REQUEST).build();
     break;
   }
 ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7644) minor typo in HffpFS doc

2015-01-20 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-7644:
--

 Summary: minor typo in HffpFS doc
 Key: HDFS-7644
 URL: https://issues.apache.org/jira/browse/HDFS-7644
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial


In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7644) minor typo in HffpFS doc

2015-01-20 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7644:
---
Attachment: HDFS-7644.000.patch

 minor typo in HffpFS doc
 

 Key: HDFS-7644
 URL: https://issues.apache.org/jira/browse/HDFS-7644
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Attachments: HDFS-7644.000.patch


 In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7637) Fix the check condition for reserved path

2015-01-19 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14282453#comment-14282453
 ] 

Charles Lamb commented on HDFS-7637:


LGTM [~hitliuyi].

Charles


 Fix the check condition for reserved path
 -

 Key: HDFS-7637
 URL: https://issues.apache.org/jira/browse/HDFS-7637
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Attachments: HDFS-7637.001.patch


 Currently the {{.reserved}} path check function is:
 {code}
 public static boolean isReservedName(String src) {
   return src.startsWith(DOT_RESERVED_PATH_PREFIX);
 }
 {code}
 {{DOT_RESERVED_PATH_PREFIX}} is {{/.reserved}}, but the check should effectively 
 use {{/.reserved/}}. For example, if some other directory name merely starts 
 with _/.reserved_, say _/.reservedpath_, then the check wrongly treats it as 
 reserved.
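 A sketch of the kind of check being argued for (illustrative only, not 
 necessarily the exact patch): treat {{/.reserved}} itself and anything under 
 {{/.reserved/}} as reserved, but not sibling names such as {{/.reservedpath}}.
 {code}
 public static boolean isReservedName(String src) {
   return src.equals(DOT_RESERVED_PATH_PREFIX)
       || src.startsWith(DOT_RESERVED_PATH_PREFIX + Path.SEPARATOR);
 }
 {code}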



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7633) When Datanode has too many blocks, BlockPoolSliceScanner.getNewBlockScanTime throws IllegalArgumentException

2015-01-16 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7633:
---
Summary: When Datanode has too many blocks, 
BlockPoolSliceScanner.getNewBlockScanTime throws IllegalArgumentException  
(was: When Datanode has too many blocks, 
BlockPoolSliceScanner.getNewBlockScanTime thows IllegalArgumentException)

 When Datanode has too many blocks, BlockPoolSliceScanner.getNewBlockScanTime 
 throws IllegalArgumentException
 

 Key: HDFS-7633
 URL: https://issues.apache.org/jira/browse/HDFS-7633
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Walter Su
Assignee: Walter Su
Priority: Minor
 Attachments: h7633_20150116.patch


 Issue:
 When the total number of blocks on one of my DNs reaches 33554432, it refuses 
 to accept more blocks. This is the ERROR:
 2015-01-16 15:21:44,571 | ERROR | DataXceiver for client  at /172.1.1.8:50490 
 [Receiving block 
 BP-1976278848-172.1.1.2-1419846518085:blk_1221043436_147936990] | 
 datasight-198:25009:DataXceiver error processing WRITE_BLOCK operation  src: 
 /172.1.1.8:50490 dst: /172.1.1.11:25009 | 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
 java.lang.IllegalArgumentException: n must be positive
 at java.util.Random.nextInt(Random.java:300)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.getNewBlockScanTime(BlockPoolSliceScanner.java:263)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.addBlock(BlockPoolSliceScanner.java:276)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.addBlock(DataBlockScanner.java:193)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.closeBlock(DataNode.java:1733)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:765)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
 at java.lang.Thread.run(Thread.java:745)
 Analysis:
 In 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.getNewBlockScanTime(),
 when blockMap.size() is too big:
 Math.max(blockMap.size(), 1) * 600 is int type, and negative;
 Math.max(blockMap.size(), 1) * 600 * 1000L is long type, and negative;
 (int) period is Integer.MIN_VALUE;
 Math.abs((int) period) is Integer.MIN_VALUE, which is negative;
 DFSUtil.getRandom().nextInt(periodInt) will throw IllegalArgumentException.
 I use Java HotSpot (build 1.7.0_05-b05).
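 A small illustration of the overflow described above (the number is the one from 
 this report; the variable names are made up):
 {code}
 int blocks = 33554432;               // blockMap.size()
 int bad = blocks * 600;              // int overflow: -1342177280
 long period = blocks * 600 * 1000L;  // still negative: the int math happens first
 long ok = blocks * 600L * 1000L;     // forcing long arithmetic keeps it positive
 {code}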



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7067) ClassCastException while using a key created by keytool to create encryption zone.

2015-01-14 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277619#comment-14277619
 ] 

Charles Lamb commented on HDFS-7067:


[~cmccabe],

bq. Charles, is the TestKeyProviderFactory failure due to this patch?

Correct. test-patch.sh doesn't apply the hdfs7067.keystore file to 
hadoop-common/hadoop-common/src/test/resources and so the new test (which 
depends on it) will fail. The test passes when I apply the patch and the 
.keystore file in a fresh clone.


 ClassCastException while using a key created by keytool to create encryption 
 zone. 
 ---

 Key: HDFS-7067
 URL: https://issues.apache.org/jira/browse/HDFS-7067
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption
Affects Versions: 2.6.0
Reporter: Yi Yao
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7067.001.patch, HDFS-7067.002.patch, 
 hdfs7067.keystore


 I'm using transparent encryption. If I create a key for the KMS keystore via 
 keytool and use the key to create an encryption zone, I get a 
 ClassCastException rather than an exception with a decent error message. I know 
 we should use 'hadoop key create' to create a key. It's better to provide a 
 decent error message to remind users of the right way to create a KMS key.
 [LOG]
 ERROR[user=hdfs] Method:'GET' Exception:'java.lang.ClassCastException: 
 javax.crypto.spec.SecretKeySpec cannot be cast to 
 org.apache.hadoop.crypto.key.JavaKeyStoreProvider$KeyMetadata'
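 One way to turn that ClassCastException into a clearer error (illustrative only; 
 the actual patch may differ, and the variable names are made up):
 {code}
 Key key = keyStore.getKey(alias, password);
 if (!(key instanceof KeyMetadata)) {
   // The alias exists but was not written by the Hadoop JavaKeyStoreProvider,
   // e.g. it was created with keytool instead of 'hadoop key create'.
   throw new IOException("Key " + alias + " was not created by the Hadoop key "
       + "provider; use 'hadoop key create' to create keys for KMS.");
 }
 KeyMetadata km = (KeyMetadata) key;
 {code}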



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

