[ 
https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423129#comment-13423129
 ] 

Alejandro Abdelnur commented on HDFS-3513:
------------------------------------------

I've done some testing, accessing HttpFS via 'hadoop fs', and the difference 
with caching is significant when doing multiple short operations. Below is an 
example doing a recursive ls (I ran the command multiple times in each 
configuration before taking a sample).

For distcp use cases I don't think it will make much difference once data is 
being streamed (which should account for most of the distcp time). But for 
integration with other systems that may do regular lookups, I think there is 
merit in having FS caching in HttpFS.

Also, the numbers below are without security enabled. I would presume that with 
security ON the lag will be greater.

With caching disabled:

{code}
$ time bin/hadoop fs -fs webhdfs://localhost:14000 -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp
drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn
drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging
drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history
drwxrwx---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done
drwxrwxrwt   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done_intermediate

real    0m1.717s
user    0m1.599s
sys     0m0.105s
{code}

With caching enabled:

{code}
$ time bin/hadoop fs -fs webhdfs://localhost:14000 -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp
drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn
drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging
drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history
drwxrwx---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done
drwxrwxrwt   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done_intermediate

real    0m0.879s
user    0m1.471s
sys     0m0.101s
{code}
                
> HttpFS should cache filesystems
> -------------------------------
>
>                 Key: HDFS-3513
>                 URL: https://issues.apache.org/jira/browse/HDFS-3513
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch
>
>
> HttpFS opens and closes a FileSystem instance against the backend filesystem 
> (typically HDFS) on every request. The FileSystem caching is not used as it 
> does not have expiration/timeout and filesystem instances in there live 
> forever. For long-running services like HttpFS this is not a good thing, as it 
> would keep connections open to the NN.
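
To illustrate the expiration/timeout behaviour that the description says the 
built-in FileSystem cache lacks, here is a minimal sketch in plain Java. It is 
not the code in the attached patches. ExpiringCache, purgeExpired(), the 
timeout and the injected clock are all hypothetical names chosen for this 
example, and the cached value is a generic type standing in for a FileSystem 
instance.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch: a cache whose idle entries are closed after a timeout. */
class ExpiringCache<K, V> {

    /** A cached value plus the time it was last handed out. */
    private static final class Entry<V> {
        final V value;
        long lastUsedMillis;

        Entry(V value, long now) {
            this.value = value;
            this.lastUsedMillis = now;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long timeoutMillis;
    private final Function<K, V> factory; // creates a value on a cache miss
    private final Consumer<V> closer;     // releases a value when it expires
    private final LongSupplier clock;     // injected so tests can control time

    ExpiringCache(long timeoutMillis, Function<K, V> factory,
                  Consumer<V> closer, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.factory = factory;
        this.closer = closer;
        this.clock = clock;
    }

    /** Returns the cached value for the key, creating it on a miss. */
    synchronized V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            entry = new Entry<>(factory.apply(key), now);
            entries.put(key, entry);
        }
        entry.lastUsedMillis = now; // every access resets the idle timer
        return entry.value;
    }

    /** Closes and removes entries idle for at least the timeout. */
    synchronized int purgeExpired() {
        long now = clock.getAsLong();
        int purged = 0;
        Iterator<Entry<V>> it = entries.values().iterator();
        while (it.hasNext()) {
            Entry<V> entry = it.next();
            if (now - entry.lastUsedMillis >= timeoutMillis) {
                it.remove();
                closer.accept(entry.value);
                purged++;
            }
        }
        return purged;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

A long-running service would presumably call purgeExpired() from a scheduled 
background task. The sketch also leaves out one thing a real implementation 
would need: making sure an instance is not closed while a request is still 
using it, for example by counting in-flight uses.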

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
