[ https://issues.apache.org/jira/browse/HADOOP-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272366#comment-17272366 ]

Daryn Sharp commented on HADOOP-17079:
--------------------------------------

I seriously question the design decision to break an interface and then redundantly modify every single group provider impl to return a Set, with the additional _implicit_ expectation that providers _must_ return a LinkedHashSet (so that the primary group works). Why was it not sufficient to just modify GroupCacheLoader#load to convert the list to a set for the cache?

As evidenced by the JNI netgroup bug, I would not be surprised if this large and seemingly unnecessary change has added more subtle bugs. I suggest reverting all the group provider impl changes.

> Optimize UGI#getGroups by adding UGI#getGroupsSet
> -------------------------------------------------
>
>                 Key: HADOOP-17079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17079
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: HADOOP-17079.002.patch, HADOOP-17079.003.patch,
> HADOOP-17079.004.patch, HADOOP-17079.005.patch, HADOOP-17079.006.patch,
> HADOOP-17079.007.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> UGI#getGroups was optimized in HADOOP-13442 by avoiding the
> List->Set->List conversion. However, the returned list is not optimized for
> contains() lookups, especially when the user's group membership list is huge
> (thousands or more). This ticket adds UGI#getGroupsSet and uses
> Set#contains() instead of List#contains() to speed up lookups in large group
> lists while minimizing List->Set conversions in the Groups#getGroups() call.
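
The alternative raised in the comment, leaving the group providers returning a List and converting to a Set only when the cache entry is loaded, might look roughly like the sketch below. It is not the actual Hadoop GroupCacheLoader code: the names ListGroupProvider, load and primaryGroup are hypothetical stand-ins chosen for illustration. The only assumptions are that a provider returns groups in order with the primary group first, and that the cache wants fast contains() while keeping that order.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class GroupSetCacheSketch {

    /** Hypothetical stand-in for a group provider that still returns a List. */
    interface ListGroupProvider {
        List<String> getGroups(String user);
    }

    /**
     * Hypothetical stand-in for the cache load step: the provider contract is
     * untouched, and the List -> Set conversion happens once, here.
     * LinkedHashSet keeps insertion order, so the first element of the
     * provider's list (assumed to be the primary group) stays first.
     */
    static Set<String> load(ListGroupProvider provider, String user) {
        List<String> groups = provider.getGroups(user);
        return Collections.unmodifiableSet(new LinkedHashSet<>(groups));
    }

    /** Returns the first group in iteration order, or null if there is none. */
    static String primaryGroup(Set<String> groups) {
        return groups.isEmpty() ? null : groups.iterator().next();
    }

    public static void main(String[] args) {
        // A fake provider that returns a duplicate entry, as real sources may.
        ListGroupProvider provider =
            user -> Arrays.asList("staff", "dev", "staff", "ops");

        Set<String> cached = load(provider, "alice");
        System.out.println(cached);                  // duplicates removed, order kept
        System.out.println(primaryGroup(cached));    // first group from the provider
        System.out.println(cached.contains("ops"));  // hash lookup, not a list scan
    }
}
```

With this shape, the ordering requirement lives in one place (the load step) instead of being an implicit expectation on every provider implementation.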