[ https://issues.apache.org/jira/browse/HADOOP-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marc-Olivier Fleury updated HADOOP-4682:
----------------------------------------
Summary: Improve dfs_getattr running time (was: Improve ddfs_getattr running time)
> Improve dfs_getattr running time
> --------------------------------
>
> Key: HADOOP-4682
> URL: https://issues.apache.org/jira/browse/HADOOP-4682
> Project: Hadoop Core
> Issue Type: Improvement
> Affects Versions: 0.20.0
> Reporter: Marc-Olivier Fleury
>
> As explained in issue HADOOP-3797, stat takes a long time to execute.
> I got a clearer idea of the time needed when testing a C program that crawls
> a directory tree containing tens of directories and 100K files.
> The original version used stat() to distinguish files from folders, and it
> took about 1 hour to complete. I changed it to use dirent.d_type, which
> provides the same information and comes at no extra cost when using
> readdir(). The execution time dropped to 2-3 minutes.
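A minimal C sketch of the d_type approach, assuming a Linux/glibc system where struct dirent carries d_type (a common extension that POSIX does not require). The names walk, entry_is_dir and struct walk_counts are hypothetical. Because some file systems report DT_UNKNOWN, the sketch falls back to lstat() only for those entries; on a mount that fills d_type, the walker itself makes no lstat() calls at all.

```c
#define _DEFAULT_SOURCE 1 /* exposes d_type and DT_* constants on glibc */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical counters filled in by walk(). */
struct walk_counts {
    long files;
    long dirs;
    long stat_calls; /* how often the lstat() fallback was needed */
};

/* Returns 1 if the entry is a directory, 0 if not, -1 on error. */
static int entry_is_dir(const char *path, const struct dirent *de,
                        struct walk_counts *c)
{
    if (de->d_type != DT_UNKNOWN)
        return de->d_type == DT_DIR; /* free: came with readdir() */
    /* The file system did not fill d_type; only now pay for a stat. */
    struct stat st;
    c->stat_calls++;
    if (lstat(path, &st) != 0)
        return -1;
    return S_ISDIR(st.st_mode);
}

/* Recursively walks `root`, counting directories and non-directories.
 * Returns -1 if `root` cannot be opened; errors below it are skipped. */
static int walk(const char *root, struct walk_counts *c)
{
    DIR *d = opendir(root);
    if (d == NULL)
        return -1;
    struct dirent *de;
    char path[4096];
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        int n = snprintf(path, sizeof path, "%s/%s", root, de->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue; /* path too long: skip the entry */
        int is_dir = entry_is_dir(path, de, c);
        if (is_dir == 1) {
            c->dirs++;
            walk(path, c);
        } else if (is_dir == 0) {
            c->files++; /* regular files, symlinks, devices, ... */
        }
    }
    closedir(d);
    return 0;
}
```

Symbolic links are counted as files and never followed, which also keeps the walk safe from link cycles.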
> I also benchmarked ls with and without color (colored output typically makes
> ls stat every entry). Disabling color gave a speedup of 1.3 on the local file
> system, but 5.7 on HDFS. Very roughly, this means that calling stat through
> FUSE is 5.7/1.3 ≈ 4.4 times slower than on the local file system.
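The 4.4x figure above is inferred from ls timings; one could instead time lstat() directly, once on a local directory and once on the same directory under the fuse-dfs mount. This is a hedged sketch assuming Linux/glibc: time_stats is a hypothetical, non-recursive helper that measures wall-clock time with CLOCK_MONOTONIC around each lstat() call.

```c
#define _DEFAULT_SOURCE 1 /* d_type and clock_gettime on glibc */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Seconds between two CLOCK_MONOTONIC readings. */
static double elapsed_s(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) +
           (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Hypothetical helper: calls lstat() once on every entry of `dir`
 * (non-recursive). Returns the number of successful calls, or -1 if the
 * directory cannot be opened; *secs receives the total time in lstat(). */
static long time_stats(const char *dir, double *secs)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    struct dirent *de;
    struct stat st;
    char path[4096];
    long n = 0;
    *secs = 0.0;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        int len = snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue; /* path too long: skip the entry */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        int rc = lstat(path, &st);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (rc == 0) {
            *secs += elapsed_s(t0, t1);
            n++;
        }
    }
    closedir(d);
    return n;
}
```

Dividing the total time by the returned count gives an average per-call latency for each mount. Repeated runs may be served from the kernel's attribute caches, so a first, cold run is likely the more representative measurement.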
> For applications that rely on stat to work correctly (sometimes there is no
> other way to tell a file from a folder), this can be a major source of delay.
> The application I am working on needs to stat about 30,000 files; a faster
> stat() would save me hours per task.
> I am sure I am not the only one who would appreciate a speedup, so I think
> this issue deserves consideration.
> I do not know whether the bottleneck is the call to hdfsGetPathInfo or to
> doConnectAsUser, but if it is doConnectAsUser, there is surely room for
> improvement.
> In the worst case, caching might help, as suggested in HADOOP-3797.
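The caching idea could look roughly like the following: a small direct-mapped table from path to struct stat whose entries expire after a fixed TTL. This is a hypothetical sketch, not fuse-dfs code; attr_cache, cache_get, cache_put, cache_invalidate and the 1024-slot size are illustrative assumptions, and the current time is passed in explicitly so the expiry logic is easy to test.

```c
#define _DEFAULT_SOURCE 1 /* strdup on glibc */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

#define CACHE_SLOTS 1024 /* hypothetical table size */

struct attr_entry {
    char *path;     /* owned copy of the key; NULL if the slot is empty */
    struct stat st; /* cached attributes */
    time_t expires; /* entry is valid while now < expires */
};

struct attr_cache {
    struct attr_entry slots[CACHE_SLOTS];
    int ttl; /* seconds an entry stays valid */
};

/* 64-bit FNV-1a hash of the path, used to pick a slot. */
static uint64_t hash_path(const char *s)
{
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL; /* FNV prime */
    }
    return h;
}

static struct attr_entry *slot_for(struct attr_cache *c, const char *path)
{
    return &c->slots[hash_path(path) % CACHE_SLOTS];
}

static void cache_init(struct attr_cache *c, int ttl)
{
    memset(c, 0, sizeof *c);
    c->ttl = ttl;
}

/* Returns 1 and fills *out on a fresh hit; 0 on a miss or expired entry. */
static int cache_get(struct attr_cache *c, const char *path, time_t now,
                     struct stat *out)
{
    struct attr_entry *e = slot_for(c, path);
    if (e->path == NULL || strcmp(e->path, path) != 0 || now >= e->expires)
        return 0;
    *out = e->st;
    return 1;
}

/* Stores attributes, evicting whatever occupied the slot. -1 on OOM. */
static int cache_put(struct attr_cache *c, const char *path,
                     const struct stat *st, time_t now)
{
    struct attr_entry *e = slot_for(c, path);
    char *copy = strdup(path);
    if (copy == NULL)
        return -1;
    free(e->path);
    e->path = copy;
    e->st = *st;
    e->expires = now + c->ttl;
    return 0;
}

/* Drops the entry for `path`, e.g. after a write, unlink or rename. */
static void cache_invalidate(struct attr_cache *c, const char *path)
{
    struct attr_entry *e = slot_for(c, path);
    if (e->path != NULL && strcmp(e->path, path) == 0) {
        free(e->path);
        e->path = NULL;
    }
}

static void cache_destroy(struct attr_cache *c)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        free(c->slots[i].path);
        c->slots[i].path = NULL;
    }
}
```

In a getattr handler, the idea would be to try cache_get(&cache, path, time(NULL), &st) first and take the slow path (per this issue, hdfsGetPathInfo and possibly doConnectAsUser) only on a miss, then cache_put the result. The trade-off is that attributes may be up to ttl seconds stale unless the write, unlink and rename paths call cache_invalidate; and since FUSE dispatches requests from multiple threads by default, a real version would need a mutex around the table. FUSE's own attr_timeout mount option provides a similar, kernel-side cache.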
--
This message is automatically generated by JIRA.