[ 
https://issues.apache.org/jira/browse/HADOOP-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272430#comment-17272430
 ] 

Xiaoyu Yao commented on HADOOP-17079:
-------------------------------------

Thanks [~daryn] for the comments. Here are my thoughts on adding a new method 
for GroupCacheLoader#getGroupsSet. 

Many GroupMappingServiceProvider implementations already use a Set
internally (e.g., LdapGroupsMapping#lookupGroup) or use an additional step to
dedup the list (e.g., ShellBasedUnixGroupsMapping). With the existing
list-based getGroups() method in the GroupMappingServiceProvider interface,
converting back and forth between Set and List is expensive.
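This minimal Java sketch shows the round trip described above. ListGroupProvider, DedupingListProvider and ListBasedLoader are hypothetical simplifications, not Hadoop's actual GroupMappingServiceProvider or Groups code. The provider dedups with a Set, must still return a List, and a Set-oriented cache loader then copies the List back into a Set.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified list-based provider interface
// (not Hadoop's actual GroupMappingServiceProvider).
interface ListGroupProvider {
  List<String> getGroups(String user);
}

// Dedups with a Set internally, like the implementations mentioned above,
// but the interface forces it to hand back a List.
class DedupingListProvider implements ListGroupProvider {
  private final List<String> rawGroups;

  DedupingListProvider(List<String> rawGroups) {
    this.rawGroups = rawGroups;
  }

  @Override
  public List<String> getGroups(String user) {
    Set<String> deduped = new LinkedHashSet<>(rawGroups); // dedup, keep order
    return new ArrayList<>(deduped);                       // conversion 1: Set -> List (O(n) copy)
  }
}

// Hypothetical loader that wants Set semantics for fast contains() checks.
class ListBasedLoader {
  static Set<String> load(ListGroupProvider provider, String user) {
    return new LinkedHashSet<>(provider.getGroups(user)); // conversion 2: List -> Set (O(n) copy)
  }
}
```

Each copy is linear in the number of groups. For a user with thousands of groups, both copies are repeated on every cache load.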

Can you elaborate on the proposal to change GroupCacheLoader#load? Can we avoid
the two conversions, Set -> List (in the GroupMappingServiceProvider impl)
and List -> Set (in GroupCacheLoader)?
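The following sketch shows one possible shape that avoids both conversions. It assumes the provider interface can gain a Set-returning method. The names are hypothetical and are not the committed Hadoop API. Here the Set is the primary result, and a List is derived only for callers that still ask for one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical provider shape: Set is the primary result.
interface SetGroupProvider {
  Set<String> getGroupsSet(String user);

  // List view derived on demand; only list-based callers pay for this copy.
  default List<String> getGroups(String user) {
    return new ArrayList<>(getGroupsSet(user));
  }
}

class DedupingSetProvider implements SetGroupProvider {
  private final List<String> rawGroups;

  DedupingSetProvider(List<String> rawGroups) {
    this.rawGroups = rawGroups;
  }

  @Override
  public Set<String> getGroupsSet(String user) {
    return new LinkedHashSet<>(rawGroups); // dedup once; result is already a Set
  }
}

// Hypothetical loader: no List round trip, just a read-only view (no copy).
class SetBasedLoader {
  static Set<String> load(SetGroupProvider provider, String user) {
    return Collections.unmodifiableSet(provider.getGroupsSet(user));
  }
}
```

The loader caches the provider's Set behind a read-only view, so nothing is copied on the load path. The List copy moves into the default getGroups() and is paid only by list-based callers.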

> Optimize UGI#getGroups by adding UGI#getGroupsSet
> -------------------------------------------------
>
>                 Key: HADOOP-17079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17079
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: HADOOP-17079.002.patch, HADOOP-17079.003.patch, 
> HADOOP-17079.004.patch, HADOOP-17079.005.patch, HADOOP-17079.006.patch, 
> HADOOP-17079.007.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> UGI#getGroups has been optimized with HADOOP-13442 by avoiding the
> List->Set->List conversion. However, the returned list is not optimized for
> contains() lookups, especially when the user's group membership list is huge
> (thousands+). This ticket is opened to add a UGI#getGroupsSet and use
> Set#contains() instead of List#contains() to speed up large group lookups
> while minimizing List->Set conversions in the Groups#getGroups() call.
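This sketch illustrates the lookup difference that motivates the ticket. GroupMembership and its methods are hypothetical helpers. List#contains() scans linearly, while HashSet#contains() is a constant-time hash lookup on average. The gap matters once a user belongs to thousands of groups.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GroupMembership {
  // Scans the list element by element: O(n) per lookup.
  static boolean isMemberViaList(List<String> groups, String group) {
    return groups.contains(group);
  }

  // For a HashSet, contains() is a hash lookup: O(1) expected per lookup.
  static boolean isMemberViaSet(Set<String> groups, String group) {
    return groups.contains(group);
  }

  // Many checks against one large membership list: pay the List -> Set
  // copy once (O(n)), after which each check is a hash lookup.
  static int countMatches(List<String> userGroups, List<String> wanted) {
    Set<String> lookup = new HashSet<>(userGroups);
    int hits = 0;
    for (String g : wanted) {
      if (lookup.contains(g)) {
        hits++;
      }
    }
    return hits;
  }
}
```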



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
