[ 
https://issues.apache.org/jira/browse/HBASE-19917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351784#comment-16351784
 ] 

Xiang Li edited comment on HBASE-19917 at 2/4/18 1:41 PM:
----------------------------------------------------------

Thanks for your comment [~yuzhih...@gmail.com]!
filterServers() is only called in RSGroupBasedLoadBalancer#filterServers(), as 
follow:
{code}
return filterServers(RSGroupInfo.getServers(), onlineServers);
{code}
RSGroupInfo#getServers() returns servers, a SortedSet. It is a TreeSet 
actually, built by its constructor.

Given a TreeSet, there are 2 ways: (Let's say when calling filterServers(), 
size of servers is n and size of onlineServers is m)
# Keep using TreeSet. Time complexity is O(m * logn). Because 
TreeSet#contains() is logn and we loop for m.
# Turn TreeSet into HashSet, to pursue O(1) for contains(). Time complexity is 
O(m + n), as the following 2 steps are included:
## Construct a HashSet from a TreeSet. It is O( n ) for time complexity (if I 
get it correctly) as it needs to iterate the TreeSet
## Calculate the union of severs and onlineServers. The time complexity is m * 
O(1).

I think #1 is good enough, although it is worse than #2 which is linear. What 
is your opinion?

Regarding
bq. If possible, we should change those to using HashSet.
In RSGroupInfo, servers as well as tables is TreeSet. According to the 
comments, 
{code}
// Keep servers in a sorted set so has an expected ordering when displayed.
private final SortedSet<Address> servers;
// Keep tables sorted too.
private final SortedSet<TableName> tables;
{code}
TreeSet is only used for display purpose. I am checking if HashSet could be 
used to replace TreeSet throughout the calling chain.



was (Author: water):
Thanks for your comment [~yuzhih...@gmail.com]!
{{filterServers()}} is only called in 
{{RSGroupBasedLoadBalancer#filterServers()}}, as follow:
{code}
return filterServers(RSGroupInfo.getServers(), onlineServers);
{code}
{{RSGroupInfo#getServers()}} returns servers, a SortedSet. It is a TreeSet 
actually, built by its constructor.

Given a TreeSet, there are 2 ways: (Let's say when calling {{filterServers()}}, 
size of servers is n and size of onlineServers is m)
# Keep using TreeSet. Time complexity is O(m * logn). Because 
TreeSet#contains() is logn and we loop for m.
# Turn TreeSet into HashSet, to pursue O(1) for contains(). Time complexity is 
O(m + n), as the following 2 steps are included:
## Construct a HashSet from a TreeSet. It is O( n ) for time complexity (if I 
get it correctly) as it needs to iterate the TreeSet
## Calculate the union of severs and onlineServers. The time complexity is m * 
O(1).

I think #1 is good enough, although it is worse than #2 which is linear. What 
is your opinion?

Regarding
bq. If possible, we should change those to using HashSet.
In RSGroupInfo, servers as well as tables is TreeSet. According to the 
comments, 
{code}
// Keep servers in a sorted set so has an expected ordering when displayed.
private final SortedSet<Address> servers;
// Keep tables sorted too.
private final SortedSet<TableName> tables;
{code}
TreeSet is only used for display purpose. I am checking if HashSet could be 
used to replace TreeSet throughout the calling chain.



> Improve RSGroupBasedLoadBalancer#filterServers() to be more efficient
> ---------------------------------------------------------------------
>
>                 Key: HBASE-19917
>                 URL: https://issues.apache.org/jira/browse/HBASE-19917
>             Project: HBase
>          Issue Type: Improvement
>          Components: rsgroup
>            Reporter: Xiang Li
>            Assignee: Xiang Li
>            Priority: Minor
>         Attachments: HBASE-19917.master.000.patch
>
>
> {code:title=hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java|borderStyle=solid}
> private List<ServerName> filterServers(Collection<Address> servers,
>     Collection<ServerName> onlineServers) {
>   ArrayList<ServerName> finalList = new ArrayList<ServerName>();
>   for (Address server : servers) {
>     for(ServerName curr: onlineServers) {
>       if(curr.getAddress().equals(server)) {
>         finalList.add(curr);
>       }
>     }
>   }
>   return finalList;
> }
> {code}
> filterServers is to return the union of servers and onlineServers. The 
> current implementation has time complexity as O(m * n) (2 loops), could be in 
> O(m + n) if HashSet is used. The trade-off is space complexity is increased.
> Another point which could be improved: filterServers() is only called in 
> filterOfflineServers(). filterOfflineServers calls filterServers(Set, List). 
> The current filterServers(Collection, Collection) seems could be improved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to