[ 
https://issues.apache.org/jira/browse/HDFS-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847602#action_12847602
 ] 

Konstantin Shvachko commented on HDFS-1043:
-------------------------------------------

I ran {{NNThroughputBenchmark -op open}}, which opens a lot of files (100,000 - 
500,000) on the name-node. The name-node performs server-side user group (UG) 
resolution. In version 0.20.1 we used to pass the user group(s) along with the 
user name. The security branch (and trunk) use server-side UG resolution 
instead. In the regular case for 0.20.100, most resolutions will be served from 
the server-side cache. The actual unix shell group resolution will happen only 
if the entry is not cached or the cached entry has expired. 
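
The lookup path described above (serve from the cache, fall back to the real 
resolution only on a miss or an expired entry) can be sketched roughly as 
follows. This is a minimal illustration, not Hadoop's actual code: the class 
{{GroupCacheSketch}}, its methods, and the injected resolver and clock are all 
hypothetical names chosen for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch of a server-side user-group cache with expiry.
final class GroupCacheSketch {
    // A cached result plus the time at which it was resolved.
    private static final class Entry {
        final List<String> groups;
        final long fetchedAtMillis;

        Entry(List<String> groups, long fetchedAtMillis) {
            this.groups = groups;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, List<String>> resolver; // stands in for the shell lookup
    private final long ttlMillis;
    private final LongSupplier clock;

    GroupCacheSketch(Function<String, List<String>> resolver,
                     long ttlMillis, LongSupplier clock) {
        this.resolver = resolver;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns cached groups; resolves directly only on a miss or an expired entry.
    List<String> getGroups(String user) {
        long now = clock.getAsLong();
        Entry e = cache.get(user);
        if (e == null || now - e.fetchedAtMillis >= ttlMillis) {
            e = new Entry(resolver.apply(user), now);
            cache.put(user, e);
        }
        return e.groups;
    }

    // Drops all entries, so the next lookup for each user resolves directly.
    void refresh() {
        cache.clear();
    }

    public static void main(String[] args) {
        AtomicInteger directResolutions = new AtomicInteger();
        GroupCacheSketch groups = new GroupCacheSketch(
                user -> {
                    directResolutions.incrementAndGet();
                    return List.of("users"); // placeholder result
                },
                60_000L, System::currentTimeMillis);

        groups.getGroups("alice"); // miss: resolved directly
        groups.getGroups("alice"); // hit: served from the cache
        groups.refresh();
        groups.getGroups("alice"); // resolved directly again after refresh
        System.out.println("direct resolutions: " + directResolutions.get());
    }
}
```

In terms of this sketch, the second benchmark variant corresponds to calling 
{{refresh()}} so often that most lookups take the direct-resolution branch.
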
I ran the benchmark in two variants. In the first, the cache is never refreshed, 
so user groups always come from the cache. In the second, clients frequently 
send requests to refresh the cache, so the server actually resolves groups most 
of the time.
I also ran the benchmark with different numbers of threads (server handlers). 
The one-threaded (sequential) variant measures the actual overhead of 
server-side UG resolution. The 100-thread variant is closer to what is used in 
real clusters.
The table below summarizes the results. All numbers are in operations per 
second.
- UG cache resolution adds about 8% overhead per operation (1 thread).
- Direct UG resolution adds 34% (1 thread). This should not happen often, and 
in the (real) concurrent world it results in only 8% overhead.
- An unexpected result is that the cache turns out to be inefficient when 
accessed concurrently. I verified this many times; the numbers vary, but 
getting cached values is always slower than direct resolution. This is not 
expected and should be addressed in future optimizations.

||Version||1 thread (ops/sec)||100 threads (ops/sec)||
|0.20.1 no server-side UG resolution |48638|67676|
|0.20.100 use UG cache|44581 (-8%)|53418 (-18%)|
|0.20.100 direct UG resolution|31869 (-34%)|62500 (-8%)|
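
For reference, the percentages in the table read as the relative drop in 
throughput against the 0.20.1 baseline in the same column. The sketch below 
recomputes them from the ops/sec figures; the class and method names are 
illustrative only.

```java
// Recomputes the relative overhead from the ops/sec figures in the table.
final class OverheadCheck {
    // Relative throughput loss versus the baseline, in percent.
    static double overheadPercent(double baselineOps, double measuredOps) {
        return 100.0 * (baselineOps - measuredOps) / baselineOps;
    }

    public static void main(String[] args) {
        String[] columns = {"1 thread", "100 threads"};
        double[] baseline = {48638, 67676}; // 0.20.1, no server-side UG resolution
        double[] cached = {44581, 53418};   // 0.20.100, UG cache
        double[] direct = {31869, 62500};   // 0.20.100, direct UG resolution

        for (int i = 0; i < columns.length; i++) {
            System.out.printf("%-11s  cache: -%.1f%%  direct: -%.1f%%%n",
                    columns[i],
                    overheadPercent(baseline[i], cached[i]),
                    overheadPercent(baseline[i], direct[i]));
        }
    }
}
```

Computed this way, three of the four cells match the table after rounding. The 
100-thread UG cache cell works out closer to 21% than the 18% shown, so one of 
the two numbers in that cell may have been mistyped.
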


> Benchmark overhead of server-side group resolution of users
> -----------------------------------------------------------
>
>                 Key: HDFS-1043
>                 URL: https://issues.apache.org/jira/browse/HDFS-1043
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: benchmarks
>    Affects Versions: 0.22.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.22.0
>
>         Attachments: UGCRefresh.patch
>
>
> Server-side user group resolution was introduced in HADOOP-4656. 
> The benchmark should repeatedly request the name-node for user group 
> resolution, and reset NN's user group cache periodically.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
