[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2022-12-14 Thread Yanlei Yu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17647866#comment-17647866
 ] 

Yanlei Yu commented on HDFS-7923:
-

hi,[~cmccabe] ,in my 1000+ cluster test, I found that during namenode restart, 
if datanode does not connect to namenode because namenode is busy, it will 
print IOException in offerService, However, forceFullBr is already false in the 
next loop, which causes the datanode to fail to send FBR during the namenode 
restart. As a result, the namenode restarts slowly. The block reporting phase 
takes about 20 hours. I can set forceFullBr to true after the exception is 
received so that the datanode can send the FBR in a timely manner. How do you 
feel?

> The DataNodes should rate-limit their full block reports by asking the NN on 
> heartbeat messages
> ---
>
> Key: HDFS-7923
> URL: https://issues.apache.org/jira/browse/HDFS-7923
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
> HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
> HDFS-7923.006.patch, HDFS-7923.007.patch
>
>
> The DataNodes should rate-limit their full block reports.  They can do this 
> by first sending a heartbeat message to the NN with an optional boolean set 
> which requests permission to send a full block report.  If the NN responds 
> with another optional boolean set, the DN will send an FBR... if not, it will 
> wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-12-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056403#comment-15056403
 ] 

Colin Patrick McCabe commented on HDFS-7923:


We have tested this on large (300 node) clusters.  If you want to disable this, 
then you can simply set dfs.namenode.max.full.block.report.leases to a very 
large value.  Then you can have the old behavior where there is no 
rate-limiting on the block reports that come into the NameNode.  I'm not sure 
why you would want that behavior, though.

> The DataNodes should rate-limit their full block reports by asking the NN on 
> heartbeat messages
> ---
>
> Key: HDFS-7923
> URL: https://issues.apache.org/jira/browse/HDFS-7923
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.8.0
>
> Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
> HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
> HDFS-7923.006.patch, HDFS-7923.007.patch
>
>
> The DataNodes should rate-limit their full block reports.  They can do this 
> by first sending a heartbeat message to the NN with an optional boolean set 
> which requests permission to send a full block report.  If the NN responds 
> with another optional boolean set, the DN will send an FBR... if not, it will 
> wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-12-11 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053839#comment-15053839
 ] 

Jitendra Nath Pandey commented on HDFS-7923:


[~cmccabe], has this feature been tested at scale? Could you please give some 
details about testing done on this?
Also, is there a way we can disable this feature? 

> The DataNodes should rate-limit their full block reports by asking the NN on 
> heartbeat messages
> ---
>
> Key: HDFS-7923
> URL: https://issues.apache.org/jira/browse/HDFS-7923
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.8.0
>
> Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
> HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
> HDFS-7923.006.patch, HDFS-7923.007.patch
>
>
> The DataNodes should rate-limit their full block reports.  They can do this 
> by first sending a heartbeat message to the NN with an optional boolean set 
> which requests permission to send a full block report.  If the NN responds 
> with another optional boolean set, the DN will send an FBR... if not, it will 
> wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588819#comment-14588819
 ] 

Colin Patrick McCabe commented on HDFS-7923:


This change is important for avoiding cascading failures (aka congestion 
collapse.)  Currently when the NN gets too many full block reports at once, the 
extra block reports slow down the processing of the existing ones (because 
storing the large RPCs generates GC activity up to and including full GCs).  So 
you get into a negative spiral-- can't process FBRs fast enough?  Then have 
some more FBRs which will slow you down even more.  And so on.  Keep in mind 
with the previous code, the DN would send its full block report all over again 
if the NN didn't respond within some timeout, which could lead to the NN having 
multiple (large) copies of the same full block report queued up.  It's true 
that you could usually avoid these scenarios by careful configuration and 
tuning, but this kind of fragile congestion collapse behavior should not be in 
the system.  This change is also important for maintaining any sort of 
reasonable quality of service on the NN, since otherwise we can get completely 
flooded with FBRs and can't do any other work.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-16 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588763#comment-14588763
 ] 

Andrew Wang commented on HDFS-7923:
---

If we're really worried about starvation, we could add a failsafe to the DN 
side that sets a lease ID of 0 after say, three FBR intervals. This will skip 
the rate limiting.

Overall though I agree with Colin, we get starvation when a NN remains 
continually fully loaded with FBR work from other nodes. Such a cluster would 
be basically unusable for real work. The default # concurrent FBRs is also 1, 
so we should be gated on NN CPU rather than the much slower heartbeat interval.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-16 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588681#comment-14588681
 ] 

Sanjay Radia commented on HDFS-7923:


bq. This change is really helpful during startup on big clusters. In the past 
we have seen restarting all the DNs at once on a several hundred node cluster 
bring the NN to its knees. 

There is already a random backoff for the initial block report. You can 
configure the  initial BR backoff time. When that jira was done there was a 
proposal to give each DN a different backoff time depending on the number of 
outstanding BRs; this enhancement was not done at that time because this 
backoff worked very well. For a several hundred node cluster the initial BR 
backoff time should be approx 60sec.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588686#comment-14588686
 ] 

Colin Patrick McCabe commented on HDFS-7923:


BTW it would be relatively easy to enforce fairness on the NN side in this 
patch.  In fact I had an earlier version that did do that.  It would give out 
FBR leases to datanodes based on a round-robin order.  But I decided against it 
since it was extra complexity without a good reason.  In particular, I was 
worried that I'd create starvation if the set of DNs the NN selected to report 
in didn't want to report in (perhaps because they were using initialDelay...).  
Since full block report periods can be very long this is a real concern and 
would necessitate a hack like a timeout.  It was much easier to keep block 
report scheduling as a datanode-side thing.

We can revisit this if it ever seems like a good idea, but I bet that you will 
not be able to create starvation in any real cluster without creating 
conditions that would make the cluster unusable no matter what.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588663#comment-14588663
 ] 

Colin Patrick McCabe commented on HDFS-7923:


Starvation is not a real concern here.  Imagine a 1000 node cluster where full 
block reports are 6 hours apart.  Then the NN needs to be able to handle 1.6 
full block reports a minute.  If each one takes 500 ms (we'll be pessimistic), 
then 0.5 out of every 90 seconds is FBR time, or 0.5% of the time.  If you want 
to be even more pessimistic and assume each block report is 1 hour apart rather 
than 6, just multiple that number by 6 to get 3% of the time.

For starvation to happen, you'd have to be spending close to 100% of the time 
on full block reports.  That's just not going to happen.  And if it does 
happen, you have bigger problems, like not being able to actually do anything 
on the NameNode (since you're spending all your time on FBRs, which hold the 
FSN write lock).

Even if you were spending close to 100% of the time on full block reports, the 
existing code doesn't enforce fairness... I can configure one DN to send full 
block reports every 30 minutes, and configure everyone else to send every 10 
hours.  The FBR period is a datanode-side configuration, not a NN-side one.

This change is really helpful during startup on big clusters.  In the past we 
have seen restarting all the DNs at once on a several hundred node cluster 
bring the NN to its knees.  All of the RPC handlers get flooded with FBRs, but 
only one can make progress at once.  The flood of FBRs also triggers full GCs, 
since we can't handle them in a timely fashion and they enter the oldgen.  I 
realize that {{dfs.blockreport.initialDelay}} was designed as a workaround, but 
it is difficult to know what value to set it to, results in slower startup, and 
is often overlooked in real-world deployments.

If we want to work on enforcing fairness on the NN-side, we can do that, but it 
seems unrelated to this change to me.  It's also not something we currently do, 
so it would be nice to see data showing that it was helpful.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-16 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588931#comment-14588931
 ] 

Sanjay Radia commented on HDFS-7923:


Starvation  I didn't literally mean starvation but was more concerned about 
fairness and about safety. Can a DN's block report be delayed for some 
significant period of time or due to subtle bug even long times. Our current 
implementation is very resilient - DNs just sent the BRs at a specific period 
irrespective of the NN. Does your design have a safety net  - say a DN will 
wait a max of 2 periods to get permission (or something like that).

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589157#comment-14589157
 ] 

Colin Patrick McCabe commented on HDFS-7923:


bq. Can a DN's block report be delayed for some significant period of time or 
due to subtle bug even long times

So it's important to distinguish between sending block reports and processing 
block reports.  This patch delays sending block reports, but it should not 
delay processing block reports by any significant amount.  The idea is that in 
general sending a bunch of block reports that can't be processed until much 
later is bad (for the reasons discussed above like GC problems, lack of RPC 
handler threads, memory consumption, etc.)  But the patch should keep the FBRs 
flowing pretty regularly... we will still queue up 6 of them on the NN even 
though we can only process 1 at once.

bq. Does your design have a safety net - say a DN will wait a max of 2 periods 
to get permission (or something like that).

This is kind of like a traffic light, right?  If the traffic light is red for a 
long time, there must be a problem somewhere else in the system.  But the 
solution can't be to slam on the accelerator when the red light lasts too long. 
 You'll just crash, especially in a traffic jam.

Maybe car analogies are taking it too far, but hopefully you can see what I'm 
saying.  I think sending FBRs when the system is not ready for them is a really 
bad behavior.  It leads to congestion collapse, which is much worse than 
starving a few DNs for a while.

Hmm. What if we had a metric which was the average length of time the DN had to 
wait before sending a full block report that it wanted to send?  Management 
systems could follow this metric and raise an alert when the time got too high. 
 Then the admin can do something to solve the problem.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587054#comment-14587054
 ] 

Sanjay Radia commented on HDFS-7923:


How will you ensure that a particular DN does not get staved? i.e. How do you 
guarantee that BRs will get through. HDFS depends on periodic BRs for 
correctness. I recall in  discussions with Facebook where they changed their 
HDFS for incremental BRs but still kept full BRs at a lower frequency just for 
safety.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-15 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587090#comment-14587090
 ] 

Suresh Srinivas commented on HDFS-7923:
---

I just notice this. [~daryn] and I have talked about improving block report 
processing efficiency. The fact that it can take close to 300 ms is a concern. 
This change work around that and I see cases where Datanodes can be starved. 
[~daryn], any comments?

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587163#comment-14587163
 ] 

Sanjay Radia commented on HDFS-7923:


Have you considered a pull model (NN pulls) which does not risk starvation?

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584620#comment-14584620
 ] 

Hudson commented on HDFS-7923:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2155 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2155/])
HDFS-7923. The DataNodes should rate-limit their full block reports by asking 
the NN on heartbeat messages (cmccabe) (cmccabe: rev 
12b5b06c063d93e6c683c9b6fac9a96912f59e59)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
Add HDFS-7923 to CHANGES.txt (cmccabe: rev 
46b0b4179c1ef1a1510eb04e40b11968a24df485)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584628#comment-14584628
 ] 

Hudson commented on HDFS-7923:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/216/])
HDFS-7923. The DataNodes should rate-limit their full block reports by asking 
the NN on heartbeat messages (cmccabe) (cmccabe: rev 
12b5b06c063d93e6c683c9b6fac9a96912f59e59)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
Add HDFS-7923 to CHANGES.txt (cmccabe: rev 
46b0b4179c1ef1a1510eb04e40b11968a24df485)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 

[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584661#comment-14584661
 ] 

Hudson commented on HDFS-7923:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2173 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2173/])
HDFS-7923. The DataNodes should rate-limit their full block reports by asking 
the NN on heartbeat messages (cmccabe) (cmccabe: rev 
12b5b06c063d93e6c683c9b6fac9a96912f59e59)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
Add HDFS-7923 to CHANGES.txt (cmccabe: rev 
46b0b4179c1ef1a1510eb04e40b11968a24df485)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 

[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584651#comment-14584651
 ] 

Hudson commented on HDFS-7923:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #225 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/225/])
HDFS-7923. The DataNodes should rate-limit their full block reports by asking 
the NN on heartbeat messages (cmccabe) (cmccabe: rev 
12b5b06c063d93e6c683c9b6fac9a96912f59e59)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
Add HDFS-7923 to CHANGES.txt (cmccabe: rev 
46b0b4179c1ef1a1510eb04e40b11968a24df485)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix 

[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584558#comment-14584558
 ] 

Hudson commented on HDFS-7923:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #227 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/227/])
HDFS-7923. The DataNodes should rate-limit their full block reports by asking 
the NN on heartbeat messages (cmccabe) (cmccabe: rev 
12b5b06c063d93e6c683c9b6fac9a96912f59e59)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
Add HDFS-7923 to CHANGES.txt (cmccabe: rev 
46b0b4179c1ef1a1510eb04e40b11968a24df485)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 

[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584563#comment-14584563
 ] 

Hudson commented on HDFS-7923:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #957 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/957/])
HDFS-7923. The DataNodes should rate-limit their full block reports by asking 
the NN on heartbeat messages (cmccabe) (cmccabe: rev 
12b5b06c063d93e6c683c9b6fac9a96912f59e59)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
Add HDFS-7923 to CHANGES.txt (cmccabe: rev 
46b0b4179c1ef1a1510eb04e40b11968a24df485)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 

[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14583855#comment-14583855
 ] 

Hudson commented on HDFS-7923:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8011 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8011/])
HDFS-7923. The DataNodes should rate-limit their full block reports by asking 
the NN on heartbeat messages (cmccabe) (cmccabe: rev 
12b5b06c063d93e6c683c9b6fac9a96912f59e59)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
Add HDFS-7923 to CHANGES.txt (cmccabe: rev 
46b0b4179c1ef1a1510eb04e40b11968a24df485)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: 

[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-12 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14583812#comment-14583812
 ] 

Andrew Wang commented on HDFS-7923:
---

+1 pending, some of the checkstyles look related though so please give them a 
look. Whitespace we can fix up at commit time.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577986#comment-14577986
 ] 

Colin Patrick McCabe commented on HDFS-7923:


New patch fixes unit test failures.  TestBpServiceActorScheduler needed to be 
reverted to the old version (since the BR timing change was also removed from 
the latest version), TestDatanodeManager needed to be tweaked because of the 
way it uses mocks, to avoid a null pointer exception.  There was also a bug 
where we'd get an NPE when removing elements from the internal linked lists in 
the BlockReportLeaseManager which was causing unit test failures.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578204#comment-14578204
 ] 

Hadoop QA commented on HDFS-7923:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 16 new or modified test files. |
| {color:green}+1{color} | javac |   7m 36s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 41s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 11s | The applied patch generated  
23 new checkstyle issues (total was 1341, now 1355). |
| {color:red}-1{color} | whitespace |   0m  8s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 15s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 160m 55s | Tests passed in hadoop-hdfs. 
|
| | | 207m 19s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738456/HDFS-7923.007.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 84ba1a7 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/11280/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11280/artifact/patchprocess/whitespace.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11280/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11280/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11280/console |


This message was automatically generated.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575058#comment-14575058
 ] 

Colin Patrick McCabe commented on HDFS-7923:


bq. Should the checkLease logs be done to the blockLog? We log the startup 
error log there in processReport

I like having the ability to turn on and off TRACE logging for various 
subsystems.  Putting everything in the blockLog would make that harder, right?

bq. Update javadoc in BlockReportContext with what leaseID is for.

added

bq. Add something to the log message about overwriting the old leaseID in 
offerService. Agree that this shouldn't really trigger, but good defensive 
coding practice 

ok

bq. DatanodeManager, there's still a register/unregister in registerDatanode I 
think we could skip. This is the node restart case where it's registered 
previously.

Good catch.  The calls in {{removeDatanode}} and {{addDatanode}} should take 
care of this, so there's no need to have it here.

bq. BRLManager requestLease, we auto-register the node on requestLease. This 
shouldn't happen since DNs need to register before doing anything else. We can 
keep this here

I added a warn message since this shouldn't happen.

bq. Still need documentation of new config keys in hdfs-default.xml

added

bq. Extra import in TestBPSAScheduler and BPSA

removed

bq. If you want to pursue \[the block report timing\] logic change more, let's 
split it out into a follow-on JIRA. The rest LGTM, +1 pending above comments.

OK, I will restore the old behavior for now, and we can do this in a follow-on 
change.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-05 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575183#comment-14575183
 ] 

Andrew Wang commented on HDFS-7923:
---

Thanks Colin, +1 pending Jenkins. Great work on this one.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575434#comment-14575434
 ] 

Hadoop QA commented on HDFS-7923:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 35s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 15 new or modified test files. |
| {color:green}+1{color} | javac |   7m 29s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 14s | The applied patch generated  
23 new checkstyle issues (total was 1342, now 1356). |
| {color:red}-1{color} | whitespace |   0m  8s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 15s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 14s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 162m 11s | Tests failed in hadoop-hdfs. |
| | | 208m 16s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.datanode.TestBpServiceActorScheduler |
|   | hadoop.hdfs.server.blockmanagement.TestDatanodeManager |
|   | hadoop.hdfs.server.blockmanagement.TestBlockReportRateLimiting |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738044/HDFS-7923.006.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 2dbc40e |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/11248/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11248/artifact/patchprocess/whitespace.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11248/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11248/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11248/console |


This message was automatically generated.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575423#comment-14575423
 ] 

Hadoop QA commented on HDFS-7923:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m  2s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 15 new or modified test files. |
| {color:green}+1{color} | javac |   7m 31s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 14s | The applied patch generated  
23 new checkstyle issues (total was 1341, now 1355). |
| {color:red}-1{color} | whitespace |   0m  8s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 19s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 19s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 167m 15s | Tests failed in hadoop-hdfs. |
| | | 213m 57s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestSafeMode |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.server.blockmanagement.TestDatanodeManager |
|   | hadoop.hdfs.web.TestWebHDFS |
|   | hadoop.hdfs.server.blockmanagement.TestBlockReportRateLimiting |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
|   | hadoop.hdfs.TestSetrepDecreasing |
|   | hadoop.hdfs.server.datanode.TestBpServiceActorScheduler |
|   | hadoop.hdfs.server.namenode.TestNameNodeRecovery |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12738040/HDFS-7923.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 2dbc40e |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/11247/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11247/artifact/patchprocess/whitespace.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11247/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11247/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11247/console |


This message was automatically generated.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-05-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565561#comment-14565561
 ] 

Colin Patrick McCabe commented on HDFS-7923:


bq. Missing config key documentation in hdfs-defaults.xml

added

bq. requestBlockReportLeaseId: empty catch for unregistered node, we could add 
some more informative logging rather than relying on the warn below

added

bq. I discussed the NodeData structure with Colin offline, wondering why we 
didn't use a standard Collection. Colin brought up the reason of reducing 
garbage, which seems valid. I think we should consider implementing 
IntrusiveCollection though rather than writing another.

yes, there will be quite a few of these requests coming in at any given point.  
IntrusiveCollection is an interface rather than an implementation, so I don't 
think it would help here (it's most useful when an element needs to be in 
multiple lists at once, and when you need fancy operations like finding the 
list from the element)

bq. I also asked about putting NodeData into DatanodeDescriptor. Not sure what 
the conclusion was on this, it might reduce garbage since we don't need a 
separate NodeData object.

The locking is easier to understand if all the lease data is inside 
{{BlockReportLeaseManager}}.

bq. I prefer Precondition checks for invalid configuration values at startup, 
so there aren't any surprises for the user. Not everyone reads the messages on 
startup.

ok

bq. requestLease has a check for isTraceEnabled, then logs at debug level

fixed

bq. In offerService, we ignore the new leaseID if we already have one. On the 
NN though, a new request wipes out the old leaseID, and processReport checks 
based on leaseID rather than node. This kind of bug makes me wonder why we 
really need the leaseID at all, why not just attach a boolean to the node? Or 
if it's in the deferred vs. pending list?

It's safer for the NameNode to wipe the old lease ID every time there is a new 
request.  It avoids problems where the DN went down while holding a lease, and 
then came back up.  We could potentially also avoid those problems by being 
very careful with node (un)registration, but why make things more complicated 
than they need to be?  I do think that the DN should overwrite its old lease ID 
if the NN gives it a new one, for the same reason.  Let me change it to do 
that... Of course this code path should never happen since the NN should never 
give a new lease ID when none was requested.  So calling this a bug seems 
like a bit of a stretch.

I prefer IDs to simply checking against the datanode UUID, because lease IDs 
allow us to match up the NN granting a lease with the DN accepting and using 
it, which is very useful for debugging or understanding what is happening in 
production.  It also makes it very obvious whether a DN is cheating by 
sending a block report with leaseID = 0 to disable rate-limiting.  This is a 
use-case we want to support but we also want to know when it is going on.

bq. Can we fix the javadoc for scheduleBlockReport to mention randomness, and 
not send...at the next heartbeat? Incorrect right now.

I looked pretty far back into the history of this code.  It seems to go back to 
at least 2009.  The underlying ideas seem to be:
1. the first full block report can have a configurable delay in seconds 
expressed by {{dfs.blockreport.initialDelay}}
2. the second full block report gets a random delay between 0 and 
{{dfs.blockreport.intervalMsec}}
3. all other block reports get an interval of {{dfs.blockreport.intervalMsec}} 
*unless* the previous block report had a longer interval than expected... if 
the previous one had a longer interval than expected, the next one gets a 
shorter interval.

We can keep behavior #1... it's simple to implement and may be useful for 
testing (although I think this patch makes it no longer necessary).

Behavior #2 seems like a workaround for the lack of congestion control in the 
past.  In a world where the NN rate-limits full block reports, we don't need 
this behavior to prevent FBRs from clumping.  They will just naturally not 
overly clump because we are rate-limiting them.

Behavior #3 just seems incorrect, even without this patch.  By definition, a 
full block report contains all the information the NN needs to understand the 
DN state.  Just because block report interval N was longer than expected, seems 
no reason to shorten block report interval N+1.  In fact, this behavior seems 
like it could lead to congestion collapse... if the NN gets overloaded and 
can't handle block reports for some time, a bunch of DNs will shorten the time 
in between the current block report and the next one, further increasing total 
NN load.  Not good.  Not good at all.

I replaced this with a simple randomize first block report time within 0 and 
{{dfs.blockreport.initialDelay}}, then try to do all other 

[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-05-29 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565672#comment-14565672
 ] 

Andrew Wang commented on HDFS-7923:
---

Nits:

* Should the checkLease logs be done to the blockLog? We log the startup error 
log there in processReport
* Update javadoc in BlockReportContext with what leaseID is for.
* Add something to the log message about overwriting the old leaseID in 
offerService. Agree that this shouldn't really trigger, but good defensive 
coding practice :)
* DatanodeManager, there's still a register/unregister in registerDatanode I 
think we could skip. This is the node restart case where it's registered 
previously.
* BRLManager requestLease, we auto-register the node on requestLease. This 
shouldn't happen since DNs need to register before doing anything else. We can 
keep this here
* Still need documentation of new config keys in hdfs-default.xml

Block report scheduling:
* We removed TestBPSAScheduler#testScheduleBlockReportImmediate, should this 
swap over to testing forceFullBlockReport?
* Extra import in TestBPSAScheduler and BPSA
* I'm worried about convoy effects if we don't stick to the stride system of 
the old code. I think of the old code as follows:

# Choose a random time within the initialDelay interval to jitter
# Attempt to block report at that same time every hour.

This keeps the BRs from all the DNs spread out, even if the NN gets temporarily 
backed up. Once the NN catches up and flushes its backlog of FBRs, future BRs 
will still be nicely spread out.

My understanding of your new scheme is that after a DN successfully BRs, it'll 
BR again an hour afterwards. So, if all the BRs piled up and then are processed 
in quick succession, all the DNs will BR at about the same time next hour. 
Since we want to spread the BRs out across the hour, this is not good.

Other ideas are to round up to the next stride. Or, wait an interval plus a 
random delay. We might consider some congestion control too, where the DNs 
backoff linearly or exponentially. All these schemes delay the FBRs, but maybe 
we trust IBRs enough now.

If you want to pursue this logic change more, let's split it out into a 
follow-on JIRA. The rest LGTM, +1 pending above comments.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-05-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565784#comment-14565784
 ] 

Hadoop QA commented on HDFS-7923:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 13s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 15 new or modified test files. |
| {color:green}+1{color} | javac |   9m  5s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 19s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 36s | The applied patch generated  
25 new checkstyle issues (total was 1365, now 1380). |
| {color:red}-1{color} | whitespace |   0m  9s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 53s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 43s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 35s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 100m 55s | Tests failed in hadoop-hdfs. |
| | | 152m 39s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.TestSetrepDecreasing |
| Timed out tests | 
org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12736276/HDFS-7923.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6aec13c |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/11169/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11169/artifact/patchprocess/whitespace.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11169/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11169/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11169/console |


This message was automatically generated.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-05-28 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563456#comment-14563456
 ] 

Colin Patrick McCabe commented on HDFS-7923:


I posted a new patch with some changes from the previous approach.  Since 
Datanodes can go away at any time after the NN gives them the green light, this 
patch adds the concept of leases for block reports.  Leases have a fixed time 
length... if the DN can't send its block report within that time, it loses the 
lease.  I also added a new fault injection framework to monitor what is going 
on in the BlockManager.  There was some milliseconds / seconds confusion in the 
existing initial block report delay code that I fixed (might want to split this 
off into a separate JIRA...)

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-05-28 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563895#comment-14563895
 ] 

Andrew Wang commented on HDFS-7923:
---

Neato patch Colin, this is a high-ish level review, I probably need to do 
another pass.

Small stuff:
* Missing config key documentation in hdfs-defaults.xml
* requestBlockReportLeaseId: empty catch for unregistered node, we could add 
some more informative logging rather than relying on the warn below

BlockReportLeaseManager
* I discussed the NodeData structure with Colin offline, wondering why we 
didn't use a standard Collection. Colin brought up the reason of reducing 
garbage, which seems valid. I think we should consider implementing 
IntrusiveCollection though rather than writing another.
* I also asked about putting NodeData into DatanodeDescriptor. Not sure what 
the conclusion was on this, it might reduce garbage since we don't need a 
separate NodeData object.
* I prefer Precondition checks for invalid configuration values at startup, so 
there aren't any surprises for the user. Not everyone reads the messages on 
startup.
* requestLease has a check for isTraceEnabled, then logs at debug level

BPServiceActor:
* In offerService, we ignore the new leaseID if we already have one. On the NN 
though, a new request wipes out the old leaseID, and processReport checks based 
on leaseID rather than node. This kind of bug makes me wonder why we really 
need the leaseID at all, why not just attach a boolean to the node? Or if it's 
in the deferred vs. pending list?
* Can we fix the javadoc for scheduleBlockReport to mention randomness, and not 
send...at the next heartbeat? Incorrect right now.
* Have you thought about moving the BR scheduler to the NN side? We still rely 
on the DNs to jitter themselves and do the initial delay, but we could have the 
NN handle all this. This would also let the NN trigger FBRs whenever it wants. 
We could also do better than random scheduling, i.e. stride it rather than 
jitter. Incompatible, so we probably won't, but fun to think about :)
* scheduleBlockReport(long) do we want to add a checkArgument that delayMs is 
geq 0? You nixed the else case.

DatanodeManager:
* Could we do the BRLManager register/unregister in addDatanode and 
removeDatanode? I think this is safe, since on a DN restart it'll provide a 
lease ID of 0 and FBR, even without a reg/unreg.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-04-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516053#comment-14516053
 ] 

Colin Patrick McCabe commented on HDFS-7923:


Thanks, [~clamb].  I like this approach.  It avoids sending the block report 
until the NN requests it.  So we don't have to throw away a whole block report 
to achieve backpressure.

{code}
  public static final String  DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_KEY = 
dfs.namenode.max.concurrent.block.reports;
  public static final int DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_DEFAULT 
= Integer.MAX_VALUE;
{code}
It seems like this should default to something less than the default number of 
RPC handler threads, not to MAX_INT.  Given that dfs.namenode.handler.count = 
10, it seems like this should be no more than 5 or 6, right?  The main point 
here to avoid having the NN handler threads completely choked with block 
reports, and that is defeated if the value is MAX_INT.  I realize that you 
probably intended this to be configured.  But it seems like we should have a 
reasonable default that works for most people.

{code}
--- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
+++ hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
@@ -195,6 +195,7 @@ message HeartbeatRequestProto {
   optional uint64 cacheCapacity = 6 [ default = 0 ];
   optional uint64 cacheUsed = 7 [default = 0 ];
   optional VolumeFailureSummaryProto volumeFailureSummary = 8;
+  optional bool requestSendFullBlockReport = 9;
 }
{code}
Let's have a {{\[default = false\]}} here so that we don't have to add a bunch 
of clunky {{HasFoo}} checks.  Unless there is something we'd like to do 
differently in the false and not present cases, but I can't think of what 
that would be.

{code}
+  /* Number of block reports currently being processed. */
+  private final AtomicInteger blockReportProcessingCount = new 
AtomicInteger(0);
{code}
I'm not sure an {{AtomicInteger}} makes sense here.  We only modify this 
variable (write to it) when holding the FSN lock in write mode, right?  And we 
only read from it when holding the FSN in read mode.  So, there isn't any need 
to add atomic ops.

{code}
+  boolean okToSendFullBlockReport = true;
+  if (requestSendFullBlockReport 
+  blockManager.getBlockReportProcessingCount() =
+  maxConcurrentBlockReports) {
+/* See if we should tell DN to back off for a bit. */
+final long lastBlockReportTime = blockManager.getDatanodeManager().
+getDatanode(nodeReg).getLastBlockReportTime();
+if (lastBlockReportTime  0) {
+  /* We've received at least one block report. */
+  final long msSinceLastBlockReport = now() - lastBlockReportTime;
+  if (msSinceLastBlockReport  maxBlockReportDeferralMsec) {
+/* It hasn't been long enough to allow a BR to pass through. */
+okToSendFullBlockReport = false;
+  }
+}
+  }
+  return new HeartbeatResponse(cmds, haState, rollingUpgradeInfo,
+  okToSendFullBlockReport);
{code}
There is a TOCTOU (time of check, time of use) race condition here, right?  
1000 datanodes come in and ask me whether it's ok to send an FBR.  In each 
case, I check the number of ongoing FBRs, which is 0, and say yes.  Then 1000 
FBRs arrive all at once and the NN melts down.

I think we need to track which datanodes we gave the green light to, and not 
decrement the counter until they either send that report, or some timeout 
expires.  (We need the timeout in case datanodes go away after requesting 
permission-to-send.)  The timeout can probably be as short as a few minutes.  
If you can't manage to send an FBR in a few minutes, there's more problems 
going on.

{code}
  public static final String  DFS_BLOCKREPORT_MAX_DEFER_MSEC_KEY = 
dfs.blockreport.max.deferMsec;
  public static final longDFS_BLOCKREPORT_MAX_DEFER_MSEC_DEFAULT = 
Long.MAX_VALUE;
{code}
Do we really need this config key?  It seems like we added it because we wanted 
to avoid starvation (i.e. the case where a given DN never gets given the green 
light).  But we are maintaining the last FBR time for each DN anyway.  Surely 
we can just have a TreeMap or something and just tell the guys with the oldest 
{{lastSentTime}} to go.  There aren't an infinite number of datanodes in the 
cluster, so eventually everyone will get the green light.

I really would prefer not to have this tunable at all, since I think it's 
unnecessary.  In any case, it's certainly doing us no good as MAX_U64.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: 

[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-04-06 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481715#comment-14481715
 ] 

Charles Lamb commented on HDFS-7923:


Here is a description of the heuristic that my patch has implemented for the NN 
to determine what to send back in response to the should I send a BR? 
question. In the vein of keeping it relatively simple, let's consider 3 
parameters:


*   The max # of FBR requests that the NN is willing to process at any given 
time (to be called 'dfs.namenode.max.concurrent.block.reports', with a default 
of Integer.MAX_INTEGER)
*   The DN's configured block report interval (dfs.blockreport.intervalMsec). 
This parameter already exists.
*   The max time we ever want the NN to go without receiving an FBR from a 
given DN ('dfs.blockreport.max.deferMsec').

If the time since the last FBR received from the DN is less than 
dfs.blockreport.intervalMsec, then it returns false (No, don't send an FBR). 
In theory, this should never happen if the DN is obeying 
dfs.blockreport.intervalMsec.

If the number of block reports currently being processed by an NN is less than 
dfs.namenode.max.concurrent.block.reports, and the time since it last received 
an FBR from the DN sending the heartbeat is greater than 
dfs.blockreport.intervalMsec, then the NN automatically answers true (Yes, 
send along an FBR).

If the number of BRs being processed by an NN is  than 
dfs.namenode.max.concurrent.block.reports when it receives the heartbeat, then 
it checks the last time that it received an FBR from the DN sending the 
heartbeat and if it's greater than dfs.blockreport.max.deferMsec, then it 
returns true (Yes, send along an FBR). If the time-since-last-FBR is less 
than dfs.blockreport.max.deferMsec, then it returns false.


 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Attachments: HDFS-7923.000.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)