[ 
https://issues.apache.org/jira/browse/HDDS-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192703#comment-17192703
 ] 

Rakesh Radhakrishnan commented on HDDS-4222:
--------------------------------------------

Hi [~linyiqun], I have moved the lookup discussion to this Jira task as it 
requires a more detailed and focused discussion. Thanks a lot for the help!

Please refer to [~linyiqun]'s original comment
{quote}Here the directory cache is used for avoid the additional look up 
overheads. Latest design of directory cache hasn't been attached but just some 
thoughts from me:
 Two type mapping cache will be useful I think:

<KeyName, KeyInfo>, like </vol1/buck1/a/b/c/d/file1, KeyInfo>, so that we can 
skip the traverse search from dir table to key table.
 <DirName, List<KeyInfo>>, this is used for the listStatus scenario, list files 
call can be a very expensive call under Ozone fs namespace.
 Cache introduced here can speed up the metadata access but also there are two 
aspects we need to consider.
{quote}
Yes, the directory cache is most useful during path component traversal. IMHO, 
this would be the first candidate to target and would greatly help to get the 
maximum performance benefit during path lookups. That doesn't mean that other 
entities like files, lists etc. are not important. I believe it depends on many 
factors such as workloads, hardware (RAM, NVMe) capabilities, the proportion of 
metadata (dirs, files) in the FS namespace, the directory hierarchy etc. To 
begin with, I am planning to implement a cache framework where OM will provide 
a facility to plug in different cache entities based on user requirements. 
Here, based on the tradeoffs, users can add more built-in cache policies and 
configure and tune them accordingly.
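
To make the plug-in idea a bit more concrete, here is a rough sketch of what 
such a framework could look like. Nothing here is existing OM code: the names 
EntityCache, LruEntityCache and CacheRegistry are made up for illustration, and 
LRU is just one example of a built-in policy.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical plug-in point: one cache per metadata entity type. */
interface EntityCache<K, V> {
  Optional<V> get(K key);
  void put(K key, V value);
  void invalidate(K key);
}

/** A bounded LRU policy, shown as one possible built-in (assumption). */
class LruEntityCache<K, V> implements EntityCache<K, V> {
  private final Map<K, V> map;

  LruEntityCache(final int capacity) {
    // accessOrder=true keeps the least recently used entry first,
    // so removeEldestEntry evicts it once capacity is exceeded.
    this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
      }
    };
  }

  @Override
  public synchronized Optional<V> get(K key) {
    return Optional.ofNullable(map.get(key));
  }

  @Override
  public synchronized void put(K key, V value) {
    map.put(key, value);
  }

  @Override
  public synchronized void invalidate(K key) {
    map.remove(key);
  }
}

/** Hypothetical registry where caches are plugged in per entity name. */
class CacheRegistry {
  private final Map<String, EntityCache<?, ?>> caches = new HashMap<>();

  void register(String entity, EntityCache<?, ?> cache) {
    caches.put(entity, cache);
  }

  @SuppressWarnings("unchecked")
  <K, V> Optional<EntityCache<K, V>> lookup(String entity) {
    return Optional.ofNullable((EntityCache<K, V>) caches.get(entity));
  }
}
```

With this shape, a deployment that only cares about path traversal would 
register a "dir" cache, and one with a heavy listing workload could add another 
entity cache with a different capacity or policy.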
{quote}Cache entry eviction policy for this, we cannot cache all the dir/file 
entries.
 Consistency between dir cache and underlying store. Cache entry will become 
stale when db store updated but not synced in corresponding cache entry. The 
cache refresh interval time can be introduced here. Only when the cache entry 
not updated more than given refresh interval, then we trigger update cache 
entry from querying the db table. Users can set different refresh interval time 
to ensure the cache freshness based on their scenarios. Also they can disable 
this cache by set interval to 0 that means each query will directly access to 
db.
 Current OM table cache seems not very helpful for dir cache so I came up with 
above thoughts.
{quote}
Yes, the OM table cache is not helpful. I completely agree with you that the 
cache eviction policy is very important for keeping the useful entries within 
the cache capacity. In the attached document, I proposed an optimized directory 
cache (Approach-3) that stores minimal data per entry so that it can hold more 
entries, which benefits the path component traversal.
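
As a rough illustration of why a small per-entry footprint helps traversal 
(this is not the exact Approach-3 layout from the document, only a sketch under 
my own assumptions), each cache entry below maps "<parentId>/<name>" to the 
child's object id, and the DB is consulted only for components that miss. 
DirPathResolver and the dbLookup function are hypothetical names, and eviction 
is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Sketch: resolve a path top-down, checking a small directory cache first. */
class DirPathResolver {
  // Minimal entry: "<parentId>/<name>" -> child object id (unbounded here).
  private final Map<String, Long> dirCache = new HashMap<>();
  // Stand-in for a directory table read; returns null if the child is absent.
  private final BiFunction<Long, String, Long> dbLookup;
  private int dbReads = 0;

  DirPathResolver(BiFunction<Long, String, Long> dbLookup) {
    this.dbLookup = dbLookup;
  }

  /** Returns the object id of the last component, or null if one is missing. */
  Long resolve(long rootId, String path) {
    long current = rootId;
    for (String name : path.split("/")) {
      if (name.isEmpty()) {
        continue; // skip the empty token produced by a leading "/"
      }
      String key = current + "/" + name;
      Long child = dirCache.get(key);
      if (child == null) {
        dbReads++;
        child = dbLookup.apply(current, name);
        if (child == null) {
          return null; // missing component, nothing is cached for it
        }
        dirCache.put(key, child);
      }
      current = child;
    }
    return current;
  }

  /** Number of lookups that had to go to the underlying store. */
  int dbReads() {
    return dbReads;
  }
}
```

Once the parents of a deep path are cached, resolving a sibling file costs a 
single store read instead of one read per component.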

For the consistency part, this is a very good point and I will take care of it 
during the implementation phase. I was thinking of updating the cache during 
the write and read paths to avoid an additional cache refresh cycle. But I 
don't have concrete thoughts on this yet and need to look into the OM code to 
do a deeper analysis.
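
Just to capture the two ideas side by side (the refresh interval from the 
quoted comment, and updating the cache on the write and read paths), here is a 
minimal sketch. It is only an assumption of how they might combine, not a 
design decision: RefreshingCache, dbRead, onWrite and the injected clock are 
hypothetical names, and an interval of 0 bypasses the cache as suggested above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Sketch: cache entries are reloaded once older than a refresh interval. */
class RefreshingCache<K, V> {
  private static final class Entry<V> {
    final V value;
    final long loadedAtMs;

    Entry(V value, long loadedAtMs) {
      this.value = value;
      this.loadedAtMs = loadedAtMs;
    }
  }

  private final Map<K, Entry<V>> entries = new HashMap<>();
  private final Function<K, V> dbRead;   // stand-in for a DB table read
  private final long refreshIntervalMs;  // 0 disables caching entirely
  private final LongSupplier clockMs;    // injected so tests control time

  RefreshingCache(Function<K, V> dbRead, long refreshIntervalMs,
                  LongSupplier clockMs) {
    this.dbRead = dbRead;
    this.refreshIntervalMs = refreshIntervalMs;
    this.clockMs = clockMs;
  }

  /** Read path: serve from cache unless the entry is missing or too old. */
  synchronized V get(K key) {
    if (refreshIntervalMs <= 0) {
      return dbRead.apply(key); // cache disabled, every query hits the DB
    }
    long now = clockMs.getAsLong();
    Entry<V> e = entries.get(key);
    if (e == null || now - e.loadedAtMs > refreshIntervalMs) {
      V fresh = dbRead.apply(key);
      if (fresh == null) {
        entries.remove(key);
        return null;
      }
      e = new Entry<>(fresh, now);
      entries.put(key, e);
    }
    return e.value;
  }

  /** Write path: call after the store update so readers see the new value. */
  synchronized void onWrite(K key, V value) {
    if (refreshIntervalMs > 0) {
      entries.put(key, new Entry<>(value, clockMs.getAsLong()));
    }
  }
}
```

Updating on the write path covers changes that go through the same OM 
instance, while the refresh interval bounds staleness for anything that 
bypasses it.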

> [OzoneFS optimization] Provide a mechanism for efficient path lookup
> --------------------------------------------------------------------
>
>                 Key: HDDS-4222
>                 URL: https://issues.apache.org/jira/browse/HDDS-4222
>             Project: Hadoop Distributed Data Store
>          Issue Type: New Feature
>            Reporter: Rakesh Radhakrishnan
>            Assignee: Rakesh Radhakrishnan
>            Priority: Major
>         Attachments: Ozone FS Optimizations - Efficient Lookup using cache.pdf
>
>
> With the new file-system-like semantics design (HDDS-2939), multiple DB 
> lookups are required to traverse the path components in top-down fashion. 
> This task is to discuss use cases and proposals to reduce the performance 
> penalties during path lookups.



