[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 ) Change subject: IMPALA-8428: Add support for caching file handles on s3 .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 4 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 13 May 2019 22:12:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13221 )

Change subject: IMPALA-8428: Add support for caching file handles on s3
..

IMPALA-8428: Add support for caching file handles on s3

This patch is based on work done by Joe McDonnell.

This change adds support for caching file handles from S3. It adds a new configuration flag 'cache_s3_file_handles' (set to true by default) which controls whether or not caching of S3 file handles is enabled.

The S3 file handle cache is dependent on HADOOP-14747 (S3AInputStream to implement CanUnbuffer). HADOOP-14747 adds support for hdfsUnbufferFile to S3A streams. The call to unbuffer closes the underlying S3 object stream. Without HADOOP-14747, the S3 file handle cache would quickly cause an impalad to crash because all S3 file handles in the cache would have a dangling HTTP(S) connection open to S3.

Testing:
* Modified test_hdfs_fd_caching.py so it is enabled for S3 as well as remote HDFS
* Ran core tests
* Ran TPC-DS on a real cluster and validated that the S3 file handle cache works as expected
* Ran several test queries on a real cluster with S3Guard enabled and validated that the S3 file handle cache works as expected

Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19
Reviewed-on: http://gerrit.cloudera.org:8080/13221
Reviewed-by: Joe McDonnell
Tested-by: Impala Public Jenkins
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/scan-range.cc
M tests/custom_cluster/test_hdfs_fd_caching.py
3 files changed, 11 insertions(+), 8 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

-- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 5 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar
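
The dependency on unbuffer described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not Impala's implementation: FakeS3Handle, FileHandleCache, and every method on them are hypothetical names invented for this example, and an in-memory flag stands in for the HTTP(S) connection. It shows a handle being unbuffered before it goes back into an LRU cache, so that cached handles do not hold open connections.

```python
from collections import OrderedDict


class FakeS3Handle:
    """Hypothetical stand-in for an S3A input stream (not a real API)."""

    def __init__(self, path):
        self.path = path
        self.connection_open = False

    def read(self):
        # Reading lazily opens the (simulated) HTTP(S) object stream.
        self.connection_open = True
        return "data from " + self.path

    def unbuffer(self):
        # Models the effect attributed to unbuffer in the commit message:
        # the underlying object stream is closed, the handle stays usable.
        self.connection_open = False


class FileHandleCache:
    """Toy LRU cache of file handles, keyed by path."""

    def __init__(self, capacity, enabled=True):
        self.capacity = capacity
        self.enabled = enabled  # plays the role of a cache on/off flag
        self._handles = OrderedDict()

    def acquire(self, path):
        # Reuse a cached handle when possible, otherwise create a new one.
        if self.enabled and path in self._handles:
            return self._handles.pop(path)
        return FakeS3Handle(path)

    def release(self, handle):
        # Always drop the connection before the handle goes idle.
        handle.unbuffer()
        if not self.enabled:
            return
        self._handles[handle.path] = handle
        # Evict least recently used handles beyond capacity.
        while len(self._handles) > self.capacity:
            self._handles.popitem(last=False)

    def cached_paths(self):
        return list(self._handles)

    def open_connections(self):
        return sum(h.connection_open for h in self._handles.values())


if __name__ == "__main__":
    cache = FileHandleCache(capacity=2)
    handle = cache.acquire("s3a://bucket/a")
    handle.read()
    cache.release(handle)
    print(cache.cached_paths(), cache.open_connections())
```

If release() skipped the unbuffer() call, every cached handle in this model would keep connection_open set, which is the toy analogue of the dangling connections the commit message warns about.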
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 ) Change subject: IMPALA-8428: Add support for caching file handles on s3 .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4235/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 4 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 13 May 2019 16:51:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 ) Change subject: IMPALA-8428: Add support for caching file handles on s3 .. Patch Set 4: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 4 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 13 May 2019 16:50:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 ) Change subject: IMPALA-8428: Add support for caching file handles on s3 .. Patch Set 3: I saw a couple of hung jobs like that too. I think it was probably the maven download that timed out, but I'm not 100% sure. -- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 3 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 07 May 2019 16:32:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 )

Change subject: IMPALA-8428: Add support for caching file handles on s3
..

Patch Set 3:

Not really sure what happened to the Jenkins job. It failed because the following jobs timed out: https://jenkins.impala.io/job/clang-tidy-ub1604/6748/, https://jenkins.impala.io/job/all-build-options-ub1604/3794/

Unfortunately, judging by the Jenkins logs, the jobs were hung:

21:52:10 + bin/run_clang_tidy.sh
00:49:36 Build timed out (after 180 minutes). Marking the build as aborted.

and

22:10:12 [100%] Built target expr-benchmark
01:49:35 Build timed out (after 240 minutes). Marking the build as aborted.

Subsequent runs of both jobs succeeded, so I think this was due to some intermittent Jenkins issue.

-- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 3 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 07 May 2019 14:09:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 ) Change subject: IMPALA-8428: Add support for caching file handles on s3 .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4161/ -- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 3 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 07 May 2019 02:42:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 ) Change subject: IMPALA-8428: Add support for caching file handles on s3 .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 3 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 06 May 2019 21:42:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 ) Change subject: IMPALA-8428: Add support for caching file handles on s3 .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4161/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 3 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 06 May 2019 21:42:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 ) Change subject: IMPALA-8428: Add support for caching file handles on s3 .. Patch Set 2: Code-Review+2 Thanks for putting this together -- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 03 May 2019 18:37:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 ) Change subject: IMPALA-8428: Add support for caching file handles on s3 .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 03 May 2019 16:46:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 ) Change subject: IMPALA-8428: Add support for caching file handles on s3 .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/3057/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 03 May 2019 15:21:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 )

Change subject: IMPALA-8428: Add support for caching file handles on s3
..

Patch Set 1:

(1 comment)

Addressed comments. Updated the commit message to mention the dependency on HADOOP-14747.

http://gerrit.cloudera.org:8080/#/c/13221/1/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/13221/1/be/src/runtime/io/disk-io-mgr.cc@135
PS1, Line 135: This is enabled by
             : // default.
> I don't think this bit is necessary since the code has 'true' below
Done

-- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 03 May 2019 14:28:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Hello Todd Lipcon, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13221 to look at the new patch set (#2).

Change subject: IMPALA-8428: Add support for caching file handles on s3
..

IMPALA-8428: Add support for caching file handles on s3

This patch is based on work done by Joe McDonnell.

This change adds support for caching file handles from S3. It adds a new configuration flag 'cache_s3_file_handles' (set to true by default) which controls whether or not caching of S3 file handles is enabled.

The S3 file handle cache is dependent on HADOOP-14747 (S3AInputStream to implement CanUnbuffer). HADOOP-14747 adds support for hdfsUnbufferFile to S3A streams. The call to unbuffer closes the underlying S3 object stream. Without HADOOP-14747, the S3 file handle cache would quickly cause an impalad to crash because all S3 file handles in the cache would have a dangling HTTP(S) connection open to S3.

Testing:
* Modified test_hdfs_fd_caching.py so it is enabled for S3 as well as remote HDFS
* Ran core tests
* Ran TPC-DS on a real cluster and validated that the S3 file handle cache works as expected
* Ran several test queries on a real cluster with S3Guard enabled and validated that the S3 file handle cache works as expected

Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/scan-range.cc
M tests/custom_cluster/test_hdfs_fd_caching.py
3 files changed, 11 insertions(+), 8 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/13221/2

-- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 )

Change subject: IMPALA-8428: Add support for caching file handles on s3
..

Patch Set 1:

(1 comment)

Does this depend on a particular HADOOP patch being present so that unbuffer works properly? Might be good to note that in the commit message in case someone tries to backport this to a branch on an earlier Hadoop.

http://gerrit.cloudera.org:8080/#/c/13221/1/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/13221/1/be/src/runtime/io/disk-io-mgr.cc@135
PS1, Line 135: This is enabled by
             : // default.
I don't think this bit is necessary since the code has 'true' below

-- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 03 May 2019 05:39:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13221 ) Change subject: IMPALA-8428: Add support for caching file handles on s3 .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/3050/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Fri, 03 May 2019 03:22:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8428: Add support for caching file handles on s3
Sahil Takiar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13221 )

Change subject: IMPALA-8428: Add support for caching file handles on s3
..

IMPALA-8428: Add support for caching file handles on s3

This patch is based on work done by Joe McDonnell.

This change adds support for caching file handles from S3. It adds a new configuration flag 'cache_s3_file_handles' (set to true by default) which controls whether or not caching of S3 file handles is enabled.

Testing:
* Modified test_hdfs_fd_caching.py so it is enabled for S3 as well as remote HDFS
* Ran core tests
* Ran TPC-DS on a real cluster and validated that the S3 file handle cache works as expected
* Ran several test queries on a real cluster with S3Guard enabled and validated that the S3 file handle cache works as expected

Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/scan-range.cc
M tests/custom_cluster/test_hdfs_fd_caching.py
3 files changed, 11 insertions(+), 7 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/13221/1

-- To view, visit http://gerrit.cloudera.org:8080/13221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19 Gerrit-Change-Number: 13221 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar