[ https://issues.apache.org/jira/browse/HDFS-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715471#comment-13715471 ]

Suresh Srinivas commented on HDFS-4979:
---------------------------------------

I have been discussing how to size the retry cache with [~jingzhao]. Here are 
our early thoughts. Please provide feedback.

From 
https://issues.apache.org/jira/browse/HDFS-4974?focusedCommentId=13715427&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13715427, 
we need to focus on the following operations that frequently go into the retry 
cache:
# ClientProtocol#create()
# ClientProtocol#delete()
# ClientProtocol#rename()

One way to model the usage is to take the maximum throughput the namenode can 
handle and assume that the above operations are done at that rate. This is the 
worst case scenario that we need to handle. However, I think the more realistic 
approach is to assume that, say, 1% of the file system namespace (at its full 
capacity) is receiving the above operations, and to size the retry cache 
accordingly. Let's look at some example numbers, assuming a 10 minute cache 
interval, that is 600 seconds (a small sketch of the arithmetic follows the list):
* A large namespace, say with 100 million files, would have the retry cache 
configured with 1 million entries. This is equivalent to 1 million/600 = ~1666 
operations per second sustained for a period of 10 minutes.
* A small namespace, say with 1 million files, would have the retry cache 
configured with 10K entries. This is equivalent to ~16 operations per second 
sustained for a period of 10 minutes.
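
Here is a rough sketch of that sizing arithmetic, purely for illustration; the 
class and method names are hypothetical and not part of the patch:

{code:java}
/**
 * Illustrative sizing rule: retry cache sized at 1% of namespace capacity,
 * plus the sustained op rate such a cache can absorb over its lifetime.
 */
public class RetryCacheSizing {
  static final double CACHE_FRACTION = 0.01;  // assume 1% of namespace capacity
  static final long CACHE_LIFETIME_SEC = 600; // 10 minute cache interval

  static long cacheEntries(long namespaceFiles) {
    return (long) (namespaceFiles * CACHE_FRACTION);
  }

  static long sustainedOpsPerSec(long entries) {
    return entries / CACHE_LIFETIME_SEC;
  }

  public static void main(String[] args) {
    // Large namespace: 100 million files -> 1 million entries -> 1666 ops/sec
    System.out.println(sustainedOpsPerSec(cacheEntries(100000000L)));
    // Small namespace: 1 million files -> 10K entries -> 16 ops/sec
    System.out.println(sustainedOpsPerSec(cacheEntries(1000000L)));
  }
}
{code}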

Does this sound reasonable? Do we want to have some non-linear calculation, to 
allow more operations for smaller namespaces?

The second question is: assume that the number of operations spikes and we 
run out of retry cache space. How should we deal with it:
# Continue to add to the retry cache (that is, do not set a max capacity for 
the retry cache). This way the client always sees retried requests handled correctly.
# Limit the number of retry cache entries to make sure the namenode continues 
to function. In this case the client might see its operations fail.

Jing and I are leaning towards the second option.
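
To make the second option concrete, here is a minimal, purely illustrative 
sketch of a capacity-bounded cache built on java.util.LinkedHashMap; the actual 
patch may use a different data structure and expiry policy, so none of these 
names should be read as the real implementation:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative bounded cache: once the cap is reached, the least recently
 * used entry is evicted. A retry of an evicted request is no longer
 * recognized, so the client might see that operation fail.
 */
public class BoundedRetryCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  public BoundedRetryCache(int maxEntries) {
    super(16, 0.75f, true); // access order, LRU-style eviction
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxEntries;
  }
}
{code}

This bounds the namenode's memory use at the cost of occasionally dropping an 
entry that a slow client still needs.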

I will post the memory cost of the retry cache in a subsequent comment.
                
> Implement retry cache on the namenode
> -------------------------------------
>
>                 Key: HDFS-4979
>                 URL: https://issues.apache.org/jira/browse/HDFS-4979
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: HDFS-4979.10.patch, HDFS-4979.1.patch, 
> HDFS-4979.2.patch, HDFS-4979.3.patch, HDFS-4979.4.patch, HDFS-4979.5.patch, 
> HDFS-4979.6.patch, HDFS-4979.7.patch, HDFS-4979.8.patch, HDFS-4979.9.patch, 
> HDFS-4979.patch
>
>


