[ 
https://issues.apache.org/jira/browse/HADOOP-17408?focusedWorklogId=533098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533098
 ]

ASF GitHub Bot logged work on HADOOP-17408:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Jan/21 16:07
            Start Date: 08/Jan/21 16:07
    Worklog Time Spent: 10m 
      Work Description: amahussein commented on pull request #2601:
URL: https://github.com/apache/hadoop/pull/2601#issuecomment-756839310


   @jbrennan333 The failing unit test is unrelated. I will take note and 
watch for it, and file a jira if it shows up in the qbt-report.
   My intuition is that equal-weighted nodes should be shuffled to 
maintain load balancing. To avoid the redundancy of shuffling twice, I 
changed the default `secondSorter` to null.
   I ran a quick single-threaded evaluation of the performance. The new 
changes show less variance and a slightly lower average.
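
   For illustration only, here is a minimal sketch of shuffling within 
equal-weight groups. It is not the code in the pull request: the class 
`TierShuffleSketch`, the method `sortByWeight`, and its parameters are 
hypothetical names, and the weights are assumed to be precomputed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Consumer;

// Hypothetical sketch, not the Hadoop implementation.
final class TierShuffleSketch {

  // Orders nodes by ascending weight. Nodes that share a weight are
  // shuffled once; an optional secondary sorter then breaks ties.
  static <T> List<T> sortByWeight(List<T> nodes, int[] weights,
      Consumer<List<T>> secondarySort) {
    // Group nodes by weight; TreeMap iterates keys in ascending order.
    TreeMap<Integer, List<T>> tiers = new TreeMap<>();
    for (int i = 0; i < nodes.size(); i++) {
      tiers.computeIfAbsent(weights[i], w -> new ArrayList<>())
          .add(nodes.get(i));
    }

    List<T> result = new ArrayList<>(nodes.size());
    for (List<T> tier : tiers.values()) {
      // Shuffle only within a tier, so closer nodes always come first.
      Collections.shuffle(tier, ThreadLocalRandom.current());
      if (secondarySort != null) {
        // A null default means no second pass over the tier.
        secondarySort.accept(tier);
      }
      result.addAll(tier);
    }
    return result;
  }
}
```

   With a null secondary sorter each group is shuffled exactly once, and 
the relative order of groups with different weights is never disturbed.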


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 533098)
    Time Spent: 1h 40m  (was: 1.5h)

> Optimize NetworkTopology while sorting of block locations
> ---------------------------------------------------------
>
>                 Key: HADOOP-17408
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17408
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common, net
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In {{NetworkTopology}}, I noticed some low-hanging fruit for improving 
> performance.
> Inside {{sortByDistance}}, {{Collections.shuffle}} is performed on the list 
> before calling {{secondarySort}}.
> {code:java}
> Collections.shuffle(list, r);
> if (secondarySort != null) {
>   secondarySort.accept(list);
> }
> {code}
> However, at various call sites, {{Collections.shuffle}} is passed as the 
> {{secondarySort}} to {{sortByDistance}}. This means that the shuffle is 
> executed twice on each list.
> Also, logically, it is pointless to shuffle before applying a tie breaker 
> that may discard the shuffled order.
> In addition, [~daryn] reported that:
> * topology is unnecessarily locking/unlocking to calculate the distance for 
> every node
> * shuffling uses a seeded Random, which is heavily synchronized, instead of 
> ThreadLocalRandom
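
The two points reported above can be sketched roughly as follows. This is 
an illustration under stated assumptions, not the Hadoop code: the class 
`LockOnceSketch`, its `weights` and `shuffle` methods, and the `distance` 
callback are hypothetical, and a plain `ReentrantReadWriteLock` stands in 
for whatever lock the topology uses.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.ToIntFunction;

// Hypothetical sketch, not the Hadoop implementation.
final class LockOnceSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  // Computes the weight of every node under one read-lock acquisition,
  // instead of locking and unlocking once per node.
  <T> int[] weights(List<T> nodes, ToIntFunction<? super T> distance) {
    int[] result = new int[nodes.size()];
    lock.readLock().lock();
    try {
      for (int i = 0; i < result.length; i++) {
        result[i] = distance.applyAsInt(nodes.get(i));
      }
    } finally {
      lock.readLock().unlock();
    }
    return result;
  }

  // Shuffles with the calling thread's own generator, so concurrent
  // callers do not contend on a shared Random instance.
  static <T> void shuffle(List<T> list) {
    Collections.shuffle(list, ThreadLocalRandom.current());
  }
}
```

The trade-off is that `ThreadLocalRandom` cannot be seeded, so any caller 
that relies on a reproducible order would still need to supply its own 
`Random`.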



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
