[
https://issues.apache.org/jira/browse/HADOOP-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raghu Angadi updated HADOOP-4971:
---------------------------------
Attachment: HADOOP-4971.patch
A simple patch is attached.
> If there are many datanodes doing block reports at 8:20:xx,
Not expected to, since the start time is already randomized. This patch does
not attempt to fix the case where most datanodes happen to randomly assign
themselves an offset within a small range of the block report interval.
> then these datanodes will keep doing block report at the same time for the
> proposed fix.
The patch does not change this behaviour; it is the same as before. The start
time is already randomized. I might be misunderstanding. Could you give a
concrete example (preferably with numbers) that shows the problem you are
talking about?
> Block report times from datanodes could converge to same time.
> -----------------------------------------------------------------
>
> Key: HADOOP-4971
> URL: https://issues.apache.org/jira/browse/HADOOP-4971
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.18.0
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Priority: Blocker
> Fix For: 0.18.3
>
> Attachments: HADOOP-4971.patch
>
>
> Datanode block reports take quite a bit of memory to process at the namenode.
> After the initial report, DNs pick a random time to spread this load over
> time at the NN. This normally works fine.
> Block reports are sent inside the "offerService()" thread in the DN. If for
> some reason this thread is stuck for a long time (comparable to the block
> report interval), and the same thing happens on many DNs, all of them get
> back to the loop at the same time and start sending block reports then, and
> every hour after that, at the same time.
> The RPC server and clients in 0.18 can handle this situation fine. But since
> this is a memory-intensive RPC, it leads to large GC delays at the NN. We
> don't know yet why the offerService threads seemed to be stuck, but the DN
> should re-randomize its block report time in such cases.
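
For illustration only, the re-randomization described above could be sketched
as follows. This is not the attached patch: the class name
BlockReportScheduler, its methods, and the choice of "late by one full
interval or more" as the threshold for "stuck" are all assumptions made for
this example.

```java
import java.util.Random;

// Hypothetical helper, not part of Hadoop: tracks when the next block
// report is due and re-randomizes the schedule after a long stall.
class BlockReportScheduler {
    private final long intervalMs;
    private final Random random;
    private long nextReportMs;

    BlockReportScheduler(long intervalMs, long nowMs, Random random) {
        this.intervalMs = intervalMs;
        this.random = random;
        // Initial schedule: a random point within the first interval.
        this.nextReportMs = nowMs + randomOffset();
    }

    // Random offset in [0, intervalMs).
    private long randomOffset() {
        return (long) (random.nextDouble() * intervalMs);
    }

    boolean isDue(long nowMs) {
        return nowMs >= nextReportMs;
    }

    long getNextReportMs() {
        return nextReportMs;
    }

    // Call right after a report was sent at nowMs; returns the next due time.
    long reportSent(long nowMs) {
        long lateBy = nowMs - nextReportMs;
        if (lateBy >= intervalMs) {
            // Assumed threshold: the loop was stuck for a whole interval or
            // more, so other DNs may have resumed at the same moment. Pick a
            // fresh random slot instead of "now + interval".
            nextReportMs = nowMs + randomOffset();
        } else {
            // Normal case: keep the original randomized cadence.
            nextReportMs += intervalMs;
        }
        return nextReportMs;
    }
}
```

In the normal case the datanode keeps the slot it picked at startup. Only
after a long stall does it draw a new slot, so datanodes that all resumed at
the same moment spread out again over the following interval.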