[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427286#comment-13427286 ] Hudson commented on HDFS-3513: -- Integrated in Hadoop-Hdfs-trunk #1123 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1123/]) HDFS-3513. HttpFS should cache filesystems. (tucu) (Revision 1368304) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1368304 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.2.0-alpha Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427330#comment-13427330 ] Hudson commented on HDFS-3513: -- Integrated in Hadoop-Mapreduce-trunk #1155 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1155/]) HDFS-3513. HttpFS should cache filesystems. (tucu) (Revision 1368304) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1368304 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.2.0-alpha Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426985#comment-13426985 ] Hudson commented on HDFS-3513: -- Integrated in Hadoop-Hdfs-trunk-Commit #2615 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2615/]) HDFS-3513. HttpFS should cache filesystems. (tucu) (Revision 1368304) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1368304 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.2.0-alpha Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426988#comment-13426988 ] Hudson commented on HDFS-3513: -- Integrated in Hadoop-Common-trunk-Commit #2550 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2550/]) HDFS-3513. HttpFS should cache filesystems. (tucu) (Revision 1368304) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1368304 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.2.0-alpha Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427027#comment-13427027 ] Hudson commented on HDFS-3513: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2568 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2568/]) HDFS-3513. HttpFS should cache filesystems. (tucu) (Revision 1368304) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1368304 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.2.0-alpha Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13424997#comment-13424997 ] Alejandro Abdelnur commented on HDFS-3513: -- Daryn, thxs for reviewing the patch. #1 as explained in the comment in the purgeIfIdle() method, entries from the cache are never removed, only the FileSystem instance is closed. #2 the cache is exclusively on per user, not on configuration. As all configurations are created in the same way, for a given user they are identical. The duping is only done because we are injecting a property and the passed configuration is not owned by the createFileSystem method. #3 I've given it a try by it complicates the purgeIfIdle logic, I'd prefer to leave it like it is. HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425060#comment-13425060 ] Daryn Sharp commented on HDFS-3513: --- +1 I'm fine with it as-is. Thanks for the clarifications! HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13424063#comment-13424063 ] Daryn Sharp commented on HDFS-3513: --- Fair enough, # It looks like there's a synchronization issue in {{closeFileSystem}} that can cause an NPE if the fs is purged by another thread between the contains and get? # The cache handling in {{createFileSystem}} looks error prone. It's taking a conf, but ignoring when looking up a cache entry, then duping it before getting the fs which again is going to ignore the conf if fs is already set? # Not a big deal, but I'd suggest commonizing {{release}} and {{purgeIfIdle}} for the former to call the latter to avoid copy-n-paste code. Just a suggestion. HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423129#comment-13423129 ] Alejandro Abdelnur commented on HDFS-3513: -- I've done some testing, accesing HttpFS via 'hadoop fs' and the difference with caching is significant when doing multiple short operations. Below an example doing a recursive LS (I've run the command multiple times in each configuration before taking a sample). For distcp usecases I don't think will make much difference once data is being streamed (which should account for most of the distcp time). But for integration from other systems that may do regular lookups I think there is merit in having FS caching in HttpFS. Also, the numbers below are without security enabled. I would presume that with security ON the lag will be greater. Without caching disabled: {code} $ time bin/hadoop fs -fs webhdfs://localhost:14000 -lsr / lsr: DEPRECATED: Please use 'ls -R' instead. drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history drwxrwx--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done drwxrwxrwt - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done_intermediate real0m1.717s user0m1.599s sys 0m0.105s {code} With caching enabled: {code} $ time bin/hadoop fs -fs webhdfs://localhost:14000 -lsr / lsr: DEPRECATED: Please use 'ls -R' instead. drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history drwxrwx--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done drwxrwxrwt - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done_intermediate real0m0.879s user0m1.471s sys 0m0.101s {code} HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423181#comment-13423181 ] Daryn Sharp commented on HDFS-3513: --- Well, that is unexpectedly significant for just 4 listStatus operations... For the sake of discussion: Are perhaps http keep-alive connections not being used? A keep-alive connection should effectively achieve ugi caching too. To prevent starvation, you might have to do something like limit the number of requests/connection, with a low idle timeout for subsequent requests. HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423197#comment-13423197 ] Alejandro Abdelnur commented on HDFS-3513: -- JDK HttpURLConnection does keep-alive by default. Also note that even if keep-alive would be OFF it would be off in both tests, the only difference here is the FileSystem caching. Not sure I get the point you are trying to make after your question. Would you please clarify? Thanks. HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423227#comment-13423227 ] Daryn Sharp commented on HDFS-3513: --- Yes, {{HttpURLConnection}} on the client side will assume keep-alive (since that's default behavior for http/1.1) but I've seen namenode servlets always sending Connection: close in the response. If the server is disabling keep-alive then every request creates a new socket and a new ugi and a new filesystem, hence why the ugi caching would show such an improvement. If however a keep-alive connection is used, I would _expect_ the same ugi to be used for all the pipelined requests. However, maybe each request on a keep-alive gets a new ugi anyway... HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423295#comment-13423295 ] Alejandro Abdelnur commented on HDFS-3513: -- HttpFS uses tomcat which by default has keep-alive ON. but still, the only diff in my testing is enabling/disabling the FS caching. Regarding your suggestion of using the same UGi for pipelined request, this is not possible as in servlets you don't know you are being in a pipelined request and it would mean we would have to keep UGIs sitting around in HttpFS in a cache. I don't see the value added of that, plus if a the HttpFS client is doing different doAs, the same UGI won't work. Because of that, I think the current approach is more suited and simpler. HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423518#comment-13423518 ] Hadoop QA commented on HDFS-3513: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538017/HDFS-3513.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2913//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2913//console This message is automatically generated. HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13417319#comment-13417319 ] Eli Collins commented on HDFS-3513: --- Tucu, given the proxy use case, how important is caching fs handles in HttpFS? Eg what's the expected speedup on a real-world distcp? HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412824#comment-13412824 ] Daryn Sharp commented on HDFS-3513: --- The hive jira has hilighted complexities and possible issues with trying to cache ugis/filesystems. Out of curiosity, have you benchmarked whether the ugi cache provides a significant benefit? HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411852#comment-13411852 ] Alejandro Abdelnur commented on HDFS-3513: -- Given the discussion on HIVE-3098, can we take this JIRA out of the parking lot? HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403973#comment-13403973 ] Daryn Sharp commented on HDFS-3513: --- We should probably wait for the similar discussion in HIVE-3098 to be resolved? HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403605#comment-13403605 ] Alejandro Abdelnur commented on HDFS-3513: -- So it seem that for now we are stuck with doing FS caching. Given that, are we good with the latest patch? HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403625#comment-13403625 ] Hadoop QA commented on HDFS-3513: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12533919/HDFS-3513.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2720//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2720//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-httpfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2720//console This message is automatically generated. HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403631#comment-13403631 ] Alejandro Abdelnur commented on HDFS-3513: -- findbugs is wrong, the patch is checking for existence of the entry in te cache before attempting a release. I'll add this exclusion to findbugs as part of the commit. HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291089#comment-13291089 ] Daryn Sharp commented on HDFS-3513: --- Would a more general, and possibly simpler solution, be to not disable the fs cache but instead track the last-use per user? After the expiration period, the user's cached fs instances may be closed. Would this address the comment about worst case the server will have a cached fs per user? That scenario will lead to a lot of leaked fds because {{DFSClient}} will have sockets in the TIME_WAIT state (see MAPREDUCE-4323). HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291123#comment-13291123 ] Alejandro Abdelnur commented on HDFS-3513: -- It is a bit more complicated than that, for some reason (if I recall correctly this was for some JT requirements) the FileSystem cache is per UGI but UGI uses object equality instead or username equality. so it you have 2 UGIs for the same user you end up with 2 FileSystem instances the FileSystem cache. (run into this problem with Oozie when hadoop security came to be and I was told that is the way it has to be). But going back to this patch, you won't have leaks, the entries in the fsCache in HttpFS remain in the cache by their FileSystem instances are closed after the time out. the worse case scenario I'm referring to, it is about the number of CachedFileSystem entires in the fsCache in HttpFS, this map that serves as cached is not being purged of entries, but the FileSystem instances in the entries are certainly closed after time out, thus no sockets TIME_WAIT (specially because with this cache HttpFS is quite aggressive on closing the FileSystem instances). Hope this clarifies. HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291129#comment-13291129 ] Daryn Sharp commented on HDFS-3513: --- I should have thought of the UGI issue since it's biting us in other situations. On that tangent, do you recall why it's supposed to be like that, or is there a jira with a discussion? HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291138#comment-13291138 ] Devaraj Das commented on HDFS-3513: --- bq. the FileSystem cache is per UGI but UGI uses object equality instead or username equality https://issues.apache.org/jira/browse/HADOOP-6670 has the discussion on the equals implementation change. HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290336#comment-13290336 ] Hadoop QA commented on HDFS-3513: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531134/HDFS-3513.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2605//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2605//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-httpfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2605//console This message is automatically generated. HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3513) HttpFS should cache filesystems
[ https://issues.apache.org/jira/browse/HDFS-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290373#comment-13290373 ] Hadoop QA commented on HDFS-3513: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531139/HDFS-3513.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs: org.apache.hadoop.test.TestHTestCase +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2608//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2608//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-httpfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2608//console This message is automatically generated. HttpFS should cache filesystems --- Key: HDFS-3513 URL: https://issues.apache.org/jira/browse/HDFS-3513 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-3513.patch, HDFS-3513.patch HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira