[
https://issues.apache.org/jira/browse/HADOOP-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616326#action_12616326
]
Pete Wyckoff commented on HADOOP-3797:
--------------------------------------
I propose using a very simple GHashTable (GLib) to cache stat structs keyed by
path, with a timeout of a few seconds.
So, add:
1. GHashTable hash_table
2. struct cached_stat { struct stat elem; time_t valid_until; };
3. on getattr, look up the cached_stat and, if now < valid_until, use it;
otherwise fetch a fresh stat and replace it
4. on readdir, add each entry to the hash table
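
The steps above can be sketched roughly as follows. This is only an
illustration, not the fuse-dfs code: to keep it free of external dependencies
it swaps the proposed GHashTable for a small direct-mapped array in which a
colliding path simply overwrites the slot. The names cache_put, cache_get,
slot_for and the CACHE_* constants are all hypothetical, and the 3 second TTL
is just one reading of "a few seconds".

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

#define CACHE_SLOTS    1024  /* hypothetical fixed table size */
#define CACHE_PATH_MAX 256   /* hypothetical path length limit for this sketch */
#define CACHE_TTL_SECS 3     /* assumed value for "a few seconds" */

struct cached_stat {
    char path[CACHE_PATH_MAX]; /* empty string marks an unused slot */
    struct stat elem;          /* the cached attributes */
    time_t valid_until;        /* entry is fresh while now < valid_until */
};

static struct cached_stat cache[CACHE_SLOTS];

/* djb2 string hash reduced to a slot index. */
size_t slot_for(const char *path) {
    unsigned long h = 5381;
    for (const unsigned char *p = (const unsigned char *)path; *p; p++)
        h = h * 33 + *p;
    return (size_t)(h % CACHE_SLOTS);
}

/* Store or overwrite the entry for path. Intended to be called for every
 * entry produced by readdir, and after a getattr miss. */
void cache_put(const char *path, const struct stat *st, time_t now) {
    assert(path != NULL && st != NULL);
    if (strlen(path) >= CACHE_PATH_MAX)
        return; /* too long for this sketch: skip caching */
    struct cached_stat *e = &cache[slot_for(path)];
    strcpy(e->path, path);
    e->elem = *st;
    e->valid_until = now + CACHE_TTL_SECS;
}

/* Return 1 and fill *out on a fresh hit, 0 on a miss or an expired entry.
 * On 0 the caller would ask HDFS and then call cache_put. */
int cache_get(const char *path, time_t now, struct stat *out) {
    assert(path != NULL && out != NULL);
    const struct cached_stat *e = &cache[slot_for(path)];
    if (e->path[0] == '\0' || strcmp(e->path, path) != 0)
        return 0; /* unused slot, or slot holds a different path */
    if (now >= e->valid_until)
        return 0; /* expired */
    *out = e->elem;
    return 1;
}
```

With a real GHashTable the same two operations would map onto
g_hash_table_insert and g_hash_table_lookup, with the path string as key and
a heap-allocated cached_stat as value.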
The only question is whether to periodically clean up the table. A GHashTable
grows and shrinks as needed, but expired entries stay in it until they are
removed. It may pay, in readdir, to check whether the table has grown past
some threshold and, if so, to iterate and delete old entries. With a threshold
of 256 MB, you'd be able to cache something on the order of hundreds of
thousands of file stat structures.
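
As a rough check on that estimate, the arithmetic can be written out. This is
a back-of-the-envelope sketch only: estimate_entries is a hypothetical helper,
and the average path length and per-entry hash table overhead are assumed
inputs, not measured values.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

struct cached_stat {
    struct stat elem;
    time_t valid_until;
};

/* Estimate how many cache entries fit into budget_bytes.
 * avg_path_bytes: assumed average size of the path string used as key.
 * overhead_bytes: assumed per-entry cost of hash table bookkeeping and
 *                 allocator headers. */
size_t estimate_entries(size_t budget_bytes, size_t avg_path_bytes,
                        size_t overhead_bytes) {
    size_t per_entry = sizeof(struct cached_stat) + avg_path_bytes + overhead_bytes;
    assert(per_entry > 0);
    return budget_bytes / per_entry;
}
```

For example, estimate_entries((size_t)256 * 1024 * 1024, 128, 64) assumes
128-byte paths and 64 bytes of overhead per entry. The exact result depends on
sizeof(struct stat) on the platform, but on typical 64-bit Linux it lands in
the hundreds of thousands, in line with the figure above.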
> FUSE module chokes on directories with lots (10,000+ or so) files
> -----------------------------------------------------------------
>
> Key: HADOOP-3797
> URL: https://issues.apache.org/jira/browse/HADOOP-3797
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/fuse-dfs
> Reporter: Pete Wyckoff
>
> For some reason, fuse is calling getattr for every file after doing a
> readdir. The readdir supplies the same info so there's no reason for the
> getattr calls (that I can see) and it does not do this for subdirectories.
> I don't know why it's doing this, so I sent an email to the fuse development
> list.