[ 
https://issues.apache.org/jira/browse/HADOOP-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616326#action_12616326
 ] 

Pete Wyckoff commented on HADOOP-3797:
--------------------------------------

I propose using a very simple GHashTable to store stat structs keyed by path, 
with a timeout of a few seconds. 

So, add:

1. GHashTable hash_table
2. struct cached_stat { struct stat elem; time_t valid_until; };
3. on getattr, look up the cached_stat and, if now < valid_until, use it; 
otherwise replace it
4. on readdir, add each entry to the hash table
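
The steps above could look roughly like the following. This is only a sketch 
under stated assumptions, not the fuse-dfs code: a small direct-mapped array 
stands in for the GHashTable so that it compiles without GLib, and the names 
cache_put, cache_get, CACHE_SLOTS and CACHE_TTL_SECS are all hypothetical. 
The 3 second TTL is just one reading of "a few seconds".

```c
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

#define CACHE_SLOTS    1024  /* hypothetical table size */
#define CACHE_TTL_SECS 3     /* assumed value for "a few seconds" */

struct cached_stat {
    char       *path;         /* owned copy of the key; NULL if slot is empty */
    struct stat elem;         /* the cached stat result */
    time_t      valid_until;  /* entry is usable while now < valid_until */
};

static struct cached_stat cache[CACHE_SLOTS];

/* djb2 string hash, used to pick a slot for a path. */
static unsigned long hash_path(const char *s) {
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Store (or overwrite) the stat for a path. Would be called from readdir
 * for each entry, and from getattr after a miss. A colliding path simply
 * evicts the previous occupant of the slot. */
static void cache_put(const char *path, const struct stat *st, time_t now) {
    struct cached_stat *slot = &cache[hash_path(path) % CACHE_SLOTS];
    size_t len = strlen(path) + 1;
    char *copy = malloc(len);
    if (!copy)
        return;               /* out of memory: just skip caching */
    memcpy(copy, path, len);
    free(slot->path);
    slot->path = copy;
    slot->elem = *st;
    slot->valid_until = now + CACHE_TTL_SECS;
}

/* Returns 1 and fills *out on a fresh hit; returns 0 on a miss or an
 * expired entry, in which case the caller fetches and calls cache_put. */
static int cache_get(const char *path, struct stat *out, time_t now) {
    struct cached_stat *slot = &cache[hash_path(path) % CACHE_SLOTS];
    if (!slot->path || strcmp(slot->path, path) != 0)
        return 0;
    if (now >= slot->valid_until)
        return 0;
    *out = slot->elem;
    return 1;
}
```

With a real GHashTable the lookup and insert would go through 
g_hash_table_lookup and g_hash_table_insert instead of the array indexing, 
but the validity check stays the same.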

The only open question is whether to clean up the table periodically. A 
GHashTable grows and shrinks as needed, but expired entries stay in it until 
they are replaced or removed. It may pay for readdir to check whether the table 
has grown past some threshold and, if so, to iterate over it and delete expired 
entries. With a threshold of 256 MB, you'd be able to cache something on the 
order of hundreds of thousands of file stat structures.
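
A threshold-triggered sweep of that kind might be sketched as below. To keep 
it self-contained, a plain array stands in for the hash table and the 
threshold is an entry count rather than a byte size; struct entry, 
sweep_expired, maybe_sweep and SWEEP_THRESHOLD are hypothetical names, and 
the threshold is tiny purely for illustration.

```c
#include <stddef.h>
#include <time.h>

#define SWEEP_THRESHOLD 4  /* hypothetical; deliberately small */

struct entry {
    int    id;           /* stands in for the path / stat payload */
    time_t valid_until;  /* entry is live while now < valid_until */
};

/* Compact the array in place, keeping only entries that have not yet
 * expired. Returns the new number of entries. */
static size_t sweep_expired(struct entry *e, size_t n, time_t now) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (now < e[i].valid_until)
            e[kept++] = e[i];
    return kept;
}

/* Would be called from readdir: only pay for the sweep once the table
 * has grown past the threshold. */
static size_t maybe_sweep(struct entry *e, size_t n, time_t now) {
    if (n <= SWEEP_THRESHOLD)
        return n;
    return sweep_expired(e, n, now);
}
```

With GLib, g_hash_table_size would supply the count for the threshold check 
and g_hash_table_foreach_remove could do the sweep, with a callback that 
returns TRUE for expired entries.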




> FUSE module chokes on directories with lots (10,000+ or so) files
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3797
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3797
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fuse-dfs
>            Reporter: Pete Wyckoff
>
> For some reason, fuse is calling getattr for every file after doing a 
> readdir. The readdir supplies the same info so there's no reason for the 
> getattr calls (that I can see) and it does not do this for subdirectories.
> I don't know why it's doing this, so I sent an email to the fuse development 
> list.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
