Lukas Waldmann created HADOOP-14444:
---------------------------------------

             Summary: New implementation of ftp and sftp filesystems
                 Key: HADOOP-14444
                 URL: https://issues.apache.org/jira/browse/HADOOP-14444
             Project: Hadoop Common
          Issue Type: New Feature
          Components: fs
            Reporter: Lukas Waldmann


The current implementations of the FTP and SFTP filesystems have severe
limitations and performance issues when dealing with a large number of files.
My patch solves these issues and integrates both filesystems in such a way that
most of the core functionality is shared, which simplifies maintenance.

The core features:
* Support for HTTP/SOCKS proxies
* Support for passive FTP
* Support for connection pooling: a new connection is not created for every
single command but is reused from the pool.
For a huge number of files this shows an order-of-magnitude performance
improvement over non-pooled connections.
* Caching of directory trees: with FTP you always need to list the whole
directory whenever you request information about a particular file.
Again, for a huge number of files this shows an order-of-magnitude performance
improvement over non-cached connections.
* Support for keep-alive (NOOP) messages to avoid dropped connections
* Support for Unix-style or regexp wildcard globs: useful for listing
particular files across a whole directory tree
* Support for re-establishing broken FTP data transfers, which happen
surprisingly often
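In passive mode the client, not the server, opens the data connection. This is what lets transfers work from behind client-side firewalls and NAT. The server announces the address to connect to in its reply to PASV.

The following is a minimal sketch of decoding that reply. It assumes the standard RFC 959 form "227 ... (h1,h2,h3,h4,p1,p2)", where the port is p1*256+p2. The class name PassiveReply is hypothetical and is not taken from the patch or from any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decodes a server's reply to the FTP PASV command. */
public final class PassiveReply {
    // Six comma-separated numbers; the surrounding text varies between servers.
    private static final Pattern ADDRESS =
        Pattern.compile("(\\d+),(\\d+),(\\d+),(\\d+),(\\d+),(\\d+)");

    public final String host;
    public final int port;

    private PassiveReply(String host, int port) {
        this.host = host;
        this.port = port;
    }

    public static PassiveReply parse(String reply) {
        if (!reply.startsWith("227")) {
            throw new IllegalArgumentException("not a PASV reply: " + reply);
        }
        Matcher m = ADDRESS.matcher(reply);
        if (!m.find()) {
            throw new IllegalArgumentException("no address in reply: " + reply);
        }
        // h1..h4 form the IPv4 address; the port is encoded as p1 * 256 + p2.
        String host = m.group(1) + "." + m.group(2) + "." + m.group(3) + "." + m.group(4);
        int port = Integer.parseInt(m.group(5)) * 256 + Integer.parseInt(m.group(6));
        return new PassiveReply(host, port);
    }

    public static void main(String[] args) {
        PassiveReply r = parse("227 Entering Passive Mode (192,168,1,20,195,80)");
        System.out.println(r.host + ":" + r.port); // prints 192.168.1.20:50000
    }
}
```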
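The pooling item in the list comes down to borrowing an idle, already logged-in connection instead of opening a new one for every command.

This is a minimal, hypothetical sketch: ConnectionPool is illustrative and is not the patch's code. A generic type C stands in for a real FTP or SFTP client so that the example has no dependencies.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/**
 * Hypothetical bounded pool: hands out idle connections and only asks the
 * factory for a new one when none is available.
 */
public class ConnectionPool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory; // opens (and logs in) a brand-new connection
    private final int maxIdle;         // upper bound on idle connections kept for reuse
    private int created = 0;           // number of connections the factory has opened

    public ConnectionPool(Supplier<C> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    /** Reuse an idle connection if one exists, otherwise open a new one. */
    public synchronized C borrow() {
        C c = idle.pollFirst();
        if (c == null) {
            c = factory.get();
            created++;
        }
        return c;
    }

    /** Return a connection to the pool; surplus connections are simply dropped here. */
    public synchronized void release(C c) {
        if (idle.size() < maxIdle) {
            idle.offerFirst(c);
        }
        // A real pool would close the surplus connection instead of dropping it.
    }

    public synchronized int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        ConnectionPool<Object> pool = new ConnectionPool<>(Object::new, 4);
        for (int i = 0; i < 1000; i++) {
            Object conn = pool.borrow(); // e.g. issue one LIST or RETR with it
            pool.release(conn);
        }
        System.out.println("connections opened: " + pool.createdCount()); // prints 1
    }
}
```

In this sketch, 1000 sequential borrow/release cycles open a single connection. A production pool would also check that a connection is still alive before handing it out.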
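The caching item follows from the point made in the list: answering a question about one file typically means listing its whole parent directory, so keeping that listing saves a round trip for every sibling lookup.

This is a hypothetical sketch. It assumes a lister function that performs one remote LIST and returns name-to-size entries. DirectoryCache and its methods are illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache of directory listings, keyed by directory path. */
public class DirectoryCache {
    // directory path -> (file name -> file size) from a single remote listing
    private final Map<String, Map<String, Long>> listings = new HashMap<>();
    private final Function<String, Map<String, Long>> lister; // performs the remote LIST
    private int remoteListings = 0;

    public DirectoryCache(Function<String, Map<String, Long>> lister) {
        this.lister = lister;
    }

    /** Size of the file at path, or null if it is not in its parent's listing. */
    public Long sizeOf(String path) {
        int slash = path.lastIndexOf('/');
        String parent = slash <= 0 ? "/" : path.substring(0, slash);
        String name = path.substring(slash + 1);
        // Only the first lookup in a directory triggers a remote listing.
        Map<String, Long> entries = listings.computeIfAbsent(parent, dir -> {
            remoteListings++;
            return lister.apply(dir);
        });
        return entries.get(name);
    }

    /** Forget a directory's listing, e.g. after a write, rename or delete in it. */
    public void invalidate(String dir) {
        listings.remove(dir);
    }

    public int remoteListingCount() {
        return remoteListings;
    }

    public static void main(String[] args) {
        Map<String, Long> fakeDir = Map.of("a.csv", 10L, "b.csv", 20L, "c.csv", 30L);
        DirectoryCache cache = new DirectoryCache(dir -> fakeDir);
        cache.sizeOf("/data/a.csv");
        cache.sizeOf("/data/b.csv");
        cache.sizeOf("/data/c.csv");
        System.out.println("remote listings: " + cache.remoteListingCount()); // prints 1
    }
}
```

Cached listings go stale when another client changes the directory. A real implementation therefore also needs an expiry or invalidation policy beyond the explicit invalidate call shown here.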
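The wildcard item can be illustrated by translating a Unix-style glob into a regular expression and filtering an already-listed directory tree.

This is a simplified sketch under assumed semantics:
* `*` matches within one path component
* `**` spans components
* `?` matches one non-separator character

Braces and character classes are not handled. GlobMatcher is a hypothetical name, not the patch's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical glob-to-regex matcher for listing files across a directory tree. */
public class GlobMatcher {
    /** Translate a simple Unix-style glob into an anchored regex pattern. */
    public static Pattern toRegex(String glob) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*') {
                if (i + 1 < glob.length() && glob.charAt(i + 1) == '*') {
                    re.append(".*");    // "**" may cross directory boundaries
                    i++;
                } else {
                    re.append("[^/]*"); // "*" stays within one path component
                }
            } else if (c == '?') {
                re.append("[^/]");      // exactly one character, never a separator
            } else {
                re.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(re.toString());
    }

    /** Keep only the paths that match the glob in full. */
    public static List<String> filter(List<String> paths, String glob) {
        Pattern p = toRegex(glob);
        List<String> out = new ArrayList<>();
        for (String path : paths) {
            if (p.matcher(path).matches()) {
                out.add(path);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tree = List.of(
            "/in/a.csv", "/in/2017/b.csv", "/in/2017/05/c.csv", "/in/notes.txt");
        System.out.println(filter(tree, "/in/**.csv")); // [/in/a.csv, /in/2017/b.csv, /in/2017/05/c.csv]
        System.out.println(filter(tree, "/in/*.csv"));  // [/in/a.csv]
    }
}
```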
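For the last item, a broken transfer can be resumed rather than restarted. FTP servers that support the REST command can begin the next RETR at a given byte offset.

This sketch hides that step behind a hypothetical RangeOpener interface (offset in, stream out) and simulates a connection that drops after 40 bytes. ResumingCopy, RangeOpener and demo are illustrative names, not the patch's API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class ResumingCopy {
    /** Hypothetical: opens the remote file starting at a byte offset (e.g. REST + RETR). */
    public interface RangeOpener {
        InputStream open(long offset) throws IOException;
    }

    /** Copy the whole file, reopening at the last written offset after a failure. */
    public static long copy(RangeOpener opener, OutputStream out, int maxRetries)
            throws IOException {
        long written = 0;
        int failures = 0;
        byte[] buf = new byte[8192];
        while (true) {
            try (InputStream in = opener.open(written)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    written += n;
                }
                return written; // reached end of file
            } catch (IOException e) {
                if (++failures > maxRetries) {
                    throw e;
                }
                // Retry from 'written' instead of restarting the transfer at 0.
            }
        }
    }

    /** Simulated 100-byte file whose first transfer breaks at byte 40; returns {bytes, opens}. */
    public static long[] demo() {
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        int[] opens = {0};
        RangeOpener flaky = offset -> {
            opens[0]++;
            final boolean dropAt40 = opens[0] == 1; // only the first connection drops
            return new InputStream() {
                long pos = offset;
                @Override
                public int read() throws IOException {
                    if (dropAt40 && pos == 40) {
                        throw new IOException("connection reset");
                    }
                    return pos < data.length ? (data[(int) pos++] & 0xFF) : -1;
                }
            };
        };
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            long n = copy(flaky, out, 3);
            if (!Arrays.equals(out.toByteArray(), data)) {
                throw new IllegalStateException("resumed copy is corrupt");
            }
            return new long[] {n, opens[0]};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = demo();
        System.out.println(r[0] + " bytes, " + r[1] + " opens"); // prints 100 bytes, 2 opens
    }
}
```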



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
