[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423129#comment-13423129 ]
Alejandro Abdelnur commented on HDFS-3513:
------------------------------------------

I've done some testing, accessing HttpFS via 'hadoop fs', and the difference with caching is significant when doing multiple short operations. Below is an example doing a recursive LS (I ran the command multiple times in each configuration before taking a sample).

For distcp use cases I don't think it will make much difference once data is being streamed (which should account for most of the distcp time). But for integration from other systems that may do regular lookups, I think there is merit in having FS caching in HttpFS.

Also, the numbers below are without security enabled. I would presume that with security ON the lag will be greater.

With caching disabled:

{code}
$ time bin/hadoop fs -fs webhdfs://localhost:14000 -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp
drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn
drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging
drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history
drwxrwx--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done
drwxrwxrwt - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done_intermediate

real 0m1.717s
user 0m1.599s
sys 0m0.105s
{code}

With caching enabled:

{code}
$ time bin/hadoop fs -fs webhdfs://localhost:14000 -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp
drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn
drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging
drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history
drwxrwx--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done
drwxrwxrwt - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done_intermediate

real 0m0.879s
user 0m1.471s
sys 0m0.101s
{code}

> HttpFS should cache filesystems
> -------------------------------
>
> Key: HDFS-3513
> URL: https://issues.apache.org/jira/browse/HDFS-3513
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.0.0-alpha
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch
>
>
> HttpFS opens and closes a FileSystem instance against the backend filesystem
> (typically HDFS) on every request. The FileSystem caching is not used because
> it has no expiration/timeout, so filesystem instances in it live forever. For
> long-running services like HttpFS this is not a good thing, as it would keep
> connections open to the NN.
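
The issue description says the stock FileSystem cache is avoided because it has no expiration. As a rough illustration of the alternative, here is a minimal sketch of a cache that closes handles after an idle timeout. It is not the code in the attached patches, and it is written in Python for brevity although HttpFS itself is Java. The names ExpiringCache, factory, closer and idle_timeout_s are hypothetical, and the sketch ignores thread safety and whether a handle is still in use.

```python
import time


class ExpiringCache:
    """Caches expensive-to-create handles and closes them once idle too long.

    Illustrative only: not thread-safe, and it does not track whether a
    handle is still in use when it expires.
    """

    def __init__(self, factory, closer, idle_timeout_s, clock=time.monotonic):
        self._factory = factory            # key -> new handle (hypothetical)
        self._closer = closer              # called with a handle on eviction
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock                # injectable so tests need no sleep
        self._entries = {}                 # key -> (handle, last_used)

    def get(self, key):
        """Return the cached handle for key, creating one if needed."""
        self.purge()
        entry = self._entries.get(key)
        handle = entry[0] if entry is not None else self._factory(key)
        # Every access refreshes the idle timer for this key.
        self._entries[key] = (handle, self._clock())
        return handle

    def purge(self):
        """Close and drop every handle idle for at least the timeout."""
        now = self._clock()
        for key, (handle, last_used) in list(self._entries.items()):
            if now - last_used >= self._idle_timeout_s:
                del self._entries[key]
                self._closer(handle)
```

With a cache along these lines, repeated short operations for the same user would reuse one handle, while a handle left idle past the timeout is closed rather than holding a connection open indefinitely.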