[ 
https://issues.apache.org/jira/browse/HDFS-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibin Huang updated HDFS-14783:
--------------------------------
    Description: 
SlowPeersReport is generated from the SampleStat between two DataNodes (dn), 
so it can appear in the NameNode's (nn) JMX like this:
{code:java}
"SlowPeersReport" :[{"SlowNode":"dn2","ReportingNodes":["dn1"]}]
{code}
In each period, MutableRollingAverages calls rollOverAvgs(), which generates a 
SumAndCount object based on the SampleStat and stores it in a 
LinkedBlockingDeque<SumAndCount>. The deque is then used to generate the 
SlowPeersReport. An old member of the deque is not removed until the queue is 
full. However, if dn1 doesn't send any packets to dn2 during the last 
36*300_000 ms, the deque will be filled with an old member, because the values 
of the last SampleStat never change. I think this old SampleStat should be 
considered expired and ignored when generating a new SlowPeersReport.

The SampleStat is stored in a LinkedBlockingDeque<SumAndCount>, and it won't 
be removed until the queue is full and a newer one is generated. Therefore, if 
dn1 doesn't send any packets to dn2 for a long time, the old SampleStat stays 
in the queue and is used to calculate slow peers. I think these old 
SampleStats should be considered expired and ignored when generating a new 
SlowPeersReport.

  was:
SlowPeersReport is generated from the SampleStat between two DataNodes (dn), 
so it can appear in the NameNode's (nn) JMX like this:
{code:java}
"SlowPeersReport" :[{"SlowNode":"dn2","ReportingNodes":["dn1"]}]
{code}
In each period, MutableRollingAverages calls rollOverAvgs(), which generates a 
SumAndCount object based on the SampleStat and stores it in a 
LinkedBlockingDeque<SumAndCount>. The deque is then used to generate the 
SlowPeersReport. An old member of the deque is not removed until the queue is 
full. However, if dn1 doesn't send any packets to dn2 during the last 
36*300_000 ms, the deque will be filled with an old member, because 

The SampleStat is stored in a LinkedBlockingDeque<SumAndCount>, and it won't 
be removed until the queue is full and a newer one is generated. Therefore, if 
dn1 doesn't send any packets to dn2 for a long time, the old SampleStat stays 
in the queue and is used to calculate slow peers. I think these old 
SampleStats should be considered expired and ignored when generating a new 
SlowPeersReport.


> Expired SampleStat should ignore when generating SlowPeersReport
> ----------------------------------------------------------------
>
>                 Key: HDFS-14783
>                 URL: https://issues.apache.org/jira/browse/HDFS-14783
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Haibin Huang
>            Assignee: Haibin Huang
>            Priority: Major
>         Attachments: HDFS-14783, HDFS-14783-001.patch, HDFS-14783-002.patch, 
> HDFS-14783-003.patch, HDFS-14783-004.patch, HDFS-14783-005.patch
>
>
> SlowPeersReport is generated from the SampleStat between two DataNodes (dn), 
> so it can appear in the NameNode's (nn) JMX like this:
> {code:java}
> "SlowPeersReport" :[{"SlowNode":"dn2","ReportingNodes":["dn1"]}]
> {code}
> In each period, MutableRollingAverages calls rollOverAvgs(), which generates 
> a SumAndCount object based on the SampleStat and stores it in a 
> LinkedBlockingDeque<SumAndCount>. The deque is then used to generate the 
> SlowPeersReport. An old member of the deque is not removed until the queue 
> is full. However, if dn1 doesn't send any packets to dn2 during the last 
> 36*300_000 ms, the deque will be filled with an old member, because the 
> values of the last SampleStat never change. I think this old SampleStat 
> should be considered expired and ignored when generating a new 
> SlowPeersReport.
> The SampleStat is stored in a LinkedBlockingDeque<SumAndCount>, and it won't 
> be removed until the queue is full and a newer one is generated. Therefore, 
> if dn1 doesn't send any packets to dn2 for a long time, the old SampleStat 
> stays in the queue and is used to calculate slow peers. I think these old 
> SampleStats should be considered expired and ignored when generating a new 
> SlowPeersReport.
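
The behaviour described above can be illustrated with a small, self-contained 
sketch. It is not the attached patch and does not use Hadoop classes: Sample, 
roll(), average() and the snapshotTimeMs field are hypothetical names 
introduced only for illustration, and the 36 and 300_000 ms constants are 
taken from the description. It assumes each stored entry carries the time it 
was taken, so that entries older than the validity window can be skipped when 
averaging.

```java
import java.util.concurrent.LinkedBlockingDeque;

public class ExpiredSampleSketch {
    // Values from the description: 36 windows of 300_000 ms each (3 hours).
    static final int NUM_WINDOWS = 36;
    static final long WINDOW_MS = 300_000L;
    static final long VALIDITY_MS = NUM_WINDOWS * WINDOW_MS;

    /** Hypothetical stand-in for SumAndCount, extended with a timestamp. */
    static final class Sample {
        final double sum;
        final long count;
        final long snapshotTimeMs; // assumption: when the sample was taken

        Sample(double sum, long count, long snapshotTimeMs) {
            this.sum = sum;
            this.count = count;
            this.snapshotTimeMs = snapshotTimeMs;
        }
    }

    /** Appends a sample; the oldest one is evicted only when the deque is full. */
    static void roll(LinkedBlockingDeque<Sample> deque, Sample sample) {
        if (!deque.offerLast(sample)) {
            deque.pollFirst();
            deque.offerLast(sample);
        }
    }

    /** Averages only samples younger than validityMs; NaN if none qualify. */
    static double average(Iterable<Sample> samples, long nowMs, long validityMs) {
        double sum = 0.0;
        long count = 0;
        for (Sample s : samples) {
            if (nowMs - s.snapshotTimeMs < validityMs) {
                sum += s.sum;
                count += s.count;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        LinkedBlockingDeque<Sample> deque = new LinkedBlockingDeque<>(NUM_WINDOWS);
        // dn1 last sent packets to dn2 at t = 0, and that transfer was slow.
        roll(deque, new Sample(5000.0, 10, 0L));

        long oneWindowLater = WINDOW_MS;
        long afterExpiry = VALIDITY_MS + WINDOW_MS;

        // Still inside the validity window: the sample counts.
        System.out.println(average(deque, oneWindowLater, VALIDITY_MS)); // 500.0
        // Past the validity window: the stale sample is ignored.
        System.out.println(Double.isNaN(average(deque, afterExpiry, VALIDITY_MS))); // true
    }
}
```

With such a check, a sample that is still sitting in the deque because nothing 
newer has pushed it out no longer contributes to the report once it is older 
than the validity window.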



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

