[ 
https://issues.apache.org/jira/browse/HDFS-14297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772780#comment-16772780
 ] 

Tao Jie commented on HDFS-14297:
--------------------------------

Thank you [~xkrogen]. {{getContentSummary}} is invoked by several peripheral 
systems, not only for monitoring quotas in our environment. We can replace 
{{getContentSummary}} with {{getQuotaUsage}} in some places, but I still think 
we should make some improvements on the server side. If a new user calls 
{{getContentSummary}} very frequently, it will put a lot of load on the 
NameNode RPC server.

> Add cache for getContentSummary() result
> ----------------------------------------
>
>                 Key: HDFS-14297
>                 URL: https://issues.apache.org/jira/browse/HDFS-14297
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Tao Jie
>            Priority: Major
>
> In a large HDFS cluster, calling {{getContentSummary}} on a directory with a 
> large number of files is very expensive. In one cluster with more than 
> 100 million files, a {{getContentSummary}} call may take more than 10s, and 
> it holds the FSNamesystem lock for that entire time.
> In our cluster, several peripheral systems call {{getContentSummary}} 
> periodically to monitor the status of directories. In most cases we don't 
> need a very accurate result. We could keep a cache of those contentSummary 
> results in the NameNode, which would let us avoid repeated heavy requests 
> within a time span. We should also add some restrictions to this cache: 
> 1. its size should be limited and it should use LRU eviction; 2. only the 
> results of heavy requests should be added to it, e.g., RPC time over 1000ms.
> We may create a new RPC method or add a flag to the current method, so that 
> the current behavior is not modified and callers can choose between an 
> accurate but expensive method and a fast but inaccurate one. 
> Any thoughts?
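
The cache proposed in the description could look roughly like the sketch 
below. This is only an illustration and not existing HDFS code: the class 
{{SummaryCacheSketch}}, its methods, and the {{ttlMs}} expiry are hypothetical 
names and assumptions. The LRU bound and the "only cache heavy requests" 
rule (cost over 1000ms) come from the description. The sketch uses plain 
{{java.util.LinkedHashMap}} in access order and does not address 
invalidation on namespace changes or where it would sit relative to the 
FSNamesystem lock.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a bounded LRU cache that only keeps expensive results. */
final class SummaryCacheSketch<V> {
    /** A cached value plus the time it was stored, used for expiry. */
    private static final class Cached<V> {
        final V value;
        final long storedAtMs;
        Cached(V value, long storedAtMs) {
            this.value = value;
            this.storedAtMs = storedAtMs;
        }
    }

    private final long minCostMs; // only results that took longer than this are cached
    private final long ttlMs;     // assumed "span" during which a stale result is acceptable
    private final Map<String, Cached<V>> lru;

    SummaryCacheSketch(final int maxEntries, long minCostMs, long ttlMs) {
        this.minCostMs = minCostMs;
        this.ttlMs = ttlMs;
        // accessOrder=true keeps the least recently accessed entry first,
        // and removeEldestEntry drops it once the size limit is exceeded.
        this.lru = new LinkedHashMap<String, Cached<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Cached<V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, or null if it is absent or older than ttlMs. */
    synchronized V get(String path, long nowMs) {
        Cached<V> c = lru.get(path); // counts as an access for LRU ordering
        if (c == null) {
            return null;
        }
        if (nowMs - c.storedAtMs > ttlMs) {
            lru.remove(path);
            return null;
        }
        return c.value;
    }

    /** Stores the result only if computing it was heavy; returns whether it was cached. */
    synchronized boolean offer(String path, V value, long costMs, long nowMs) {
        if (costMs <= minCostMs) {
            return false;
        }
        lru.put(path, new Cached<>(value, nowMs));
        return true;
    }

    synchronized int size() {
        return lru.size();
    }

    public static void main(String[] args) {
        // At most 2 entries, cache only calls slower than 1000ms, keep for 60s.
        SummaryCacheSketch<Long> cache = new SummaryCacheSketch<>(2, 1000, 60_000);
        cache.offer("/warehouse", 123L, 10_500, 0);
        System.out.println(cache.get("/warehouse", 30_000)); // still within the 60s span
        System.out.println(cache.get("/warehouse", 90_000)); // expired, so a miss
    }
}
```

A caller would check {{get}} first, fall back to the real computation on a 
miss, and then {{offer}} the result together with the measured cost. Passing 
the clock value in explicitly is just a choice to keep the sketch easy to 
test.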



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

