[Impala-ASF-CR] WIP IMPALA-9434: Implement Robin Hood Hash Table.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15511 ) Change subject: WIP IMPALA-9434: Implement Robin Hood Hash Table. .. Patch Set 5: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5632/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15511 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986 Gerrit-Change-Number: 15511 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 28 Mar 2020 05:18:39 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-9434: Implement Robin Hood Hash Table.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15511 ) Change subject: WIP IMPALA-9434: Implement Robin Hood Hash Table. .. Patch Set 5: (1 comment) Patch Set 5 submitted. What is different from Patch Set 4 is that now we can do single pass insert both in HashTable::Insert and HashTable::Iterator::SetTuple. Right now, this patch is hijacking the FLAGS_enable_quadratic_probing to enable robin hood mode rather than the quadratic probing mode. I plan to create separate flag for robin hood mode in next iteration, so that we can do performance comparison between linear, quadratic, and robin hood linear probe. http://gerrit.cloudera.org:8080/#/c/15511/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15511/3//COMMIT_MSG@14 PS3, Line 14: If a hash table is configured as a robin hood hash table, the new > I'm still thinking how to do single pass insert. From what I understand, we This is now addressed in Patch Set 5. -- To view, visit http://gerrit.cloudera.org:8080/15511 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986 Gerrit-Change-Number: 15511 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 28 Mar 2020 04:48:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-9434: Implement Robin Hood Hash Table.
Hello David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15511 to look at the new patch set (#5). Change subject: WIP IMPALA-9434: Implement Robin Hood Hash Table. .. WIP IMPALA-9434: Implement Robin Hood Hash Table. Robin hood hashing reduces the variances of probe lengths by continually rebalancing elements. This patch is the first step towards full robin hood hash table implementation by doing bucket rebalancing after every insert. If a hash table is configured as a robin hood hash table, the new element insertion will be buffered in a temporary bucket. This temporary bucket will then matched against existing bucket elements, swapped with a "rich" bucket, and continue doing so until it swap with an empty bucket. The PSL (probe sequence length) invariant is maintained in robin hood hash table. This allow us to add short-circuit in the Probe function to immediately returns when it finds out that the PSL of currently visited bucket is smaller/richer than the key that is being looked up, indicating that the key does not exist in the table. Instead of continue probing until next empty bucket is found, Probe can immediately the return the index of recently visited richer bucket to caller along with the not found flag. In turn, the caller use this returned index to specify in which index the new element should be inserted to. Testing: - Pass core tests Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/hash-table-benchmark.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exprs/scalar-expr.h 9 files changed, 560 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/15511/5 -- To view, visit http://gerrit.cloudera.org:8080/15511 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986 Gerrit-Change-Number: 15511 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14197 ) Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables .. Patch Set 13: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5562/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/14197 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8 Gerrit-Change-Number: 14197 Gerrit-PatchSet: 13 Gerrit-Owner: Attila Bukor Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 28 Mar 2020 04:26:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14197 ) Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5631/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14197 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8 Gerrit-Change-Number: 14197 Gerrit-PatchSet: 13 Gerrit-Owner: Attila Bukor Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 28 Mar 2020 04:25:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9401: primitive include-what-you-use script and mappings
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15552 ) Change subject: IMPALA-9401: primitive include-what-you-use script and mappings .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5561/ -- To view, visit http://gerrit.cloudera.org:8080/15552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaf5f9ba865313afb0c581e6482514ef7f1c65367 Gerrit-Change-Number: 15552 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 28 Mar 2020 04:03:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/14197 ) Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables .. Patch Set 13: (2 comments) http://gerrit.cloudera.org:8080/#/c/14197/13/testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl File testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl: http://gerrit.cloudera.org:8080/#/c/14197/13/testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl@5 PS13, Line 5: # The flags below require unsafe flags to be unlocked. I realize I accidentally pushed this file. I will remove it in my next iteration based on reviews. http://gerrit.cloudera.org:8080/#/c/14197/13/testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl File testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl: http://gerrit.cloudera.org:8080/#/c/14197/13/testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl@7 PS13, Line 7: # The flags below require unsafe flags to be unlocked. I realize I accidentally pushed this file. I will remove it in my next iteration based on reviews. -- To view, visit http://gerrit.cloudera.org:8080/14197 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8 Gerrit-Change-Number: 14197 Gerrit-PatchSet: 13 Gerrit-Owner: Attila Bukor Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 28 Mar 2020 03:54:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables
Grant Henke has uploaded a new patch set (#13) to the change originally created by Attila Bukor. ( http://gerrit.cloudera.org:8080/14197 ) Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables .. IMPALA-5092 Add support for VARCHAR in Kudu tables KUDU-1938 added VARCHAR column type support to Kudu. This commit adds support for Kudu's VARCHAR type to Impala. The length of a Kudu varchar is applied as a character length as opposed to a byte length like Impala currently uses. When writing data to Kudu, the VARCHAR length is not an issue because Impala only officially supports ASCII characters and those characters are the same size in bytes and characters. Additionally, extra bytes would be truncated by the Kudu client if somehow a value was too long. When reading data from Kudu, it is possible that the value written by some other application is wider in bytes than Impala expects and can handle. This can happen due to multi-byte UTF-8 characters. In that case, we adjust the length in Impala to truncate the extra bytes of the value. This isn’t a great solution, but one other integrations have taken as well given Impala doesn’t support UTF-8 values. IMPALA-5675 tracks adding UTF-8 Character length support to VARCHAR columns and marked the truncation code with a TODO that references that Jira. Testing: * Performed manual testing of standard DDL and DML interaction * Manually reproduced a check failure due to multi-byte characters and tested that length truncation resolve that issue. * Added/adjusted the following automated tests: ** AnalyzeDDLTest: CTAS into Kudu with varchar type ** AnalyzeKuduDDLTest: CREATE TABLE in Kudu with VARCHAR type ** kudu_create.test: Create table with VARCHAR column and key ** kudu_describe.test: Describe table with VARCHAR column and key ** kudu_insert.test: Insert with VARCHAR columns including null and non-null defaults ** kudu_update.test: Updates with VARCHAR column ** kudu_upsert.test: Upserts with VARCHAR column ** kudu_delete.test Deletes with VARCHAR columns Follow on work: - Add min-max runtime filter tests/support - Add predicate pushdown and partition tests Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8 --- M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/KuduUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test M tests/query_test/test_kudu.py 16 files changed, 776 insertions(+), 642 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/14197/13 -- To view, visit http://gerrit.cloudera.org:8080/14197 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8 Gerrit-Change-Number: 14197 Gerrit-PatchSet: 13 Gerrit-Owner: Attila Bukor Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15454 ) Change subject: IMPALA-3766: optionally compress spilled data .. Patch Set 17: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5560/ -- To view, visit http://gerrit.cloudera.org:8080/15454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659 Gerrit-Change-Number: 15454 Gerrit-PatchSet: 17 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 28 Mar 2020 02:47:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5630/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 28 Mar 2020 02:21:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5629/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 28 Mar 2020 02:16:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9561: Change ozone client dependency to hadoop-ozone-filesystem-lib-current
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15573 ) Change subject: IMPALA-9561: Change ozone client dependency to hadoop-ozone-filesystem-lib-current .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5628/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15573 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I313b84919883f4c6838748635d999a2bc12a59d3 Gerrit-Change-Number: 15573 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 28 Mar 2020 01:48:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/15382/5/be/src/util/impalad-metrics.cc File be/src/util/impalad-metrics.cc: http://gerrit.cloudera.org:8080/#/c/15382/5/be/src/util/impalad-metrics.cc@360 PS5, Line 360: IO_MGR_REMOTE_DATA_CACHE_DROPPED_ENTRIES = IO_MGR_METRICS->AddCounter( Missed this indentation. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 28 Mar 2020 01:40:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15382 to look at the new patch set (#6). Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache This adds two sets of metrics. The first is per-partition metrics to track the performance of the underlying filesystem for the data cache. It keeps histograms of read, write, and eviction latency for each data cache partition along with another metric recording the path for the partition. These are exposed as the following metrics: impala-server.io-mgr.remote-data-cache-partition-$0.path impala-server.io-mgr.remote-data-cache-partition-$0.read-latency impala-server.io-mgr.remote-data-cache-partition-$0.write-latency impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency This also adds metrics to keep counts of hits, misses, and entries in the data cache. Since reducing the latency of IO is an important feature of the data cache, the absolute count of hits and misses is as important as the hit bytes and miss bytes. This adds the following metrics: impala-server.io-mgr.remote-data-cache-hit-count impala-server.io-mgr.remote-data-cache-miss-count impala-server.io-mgr.remote-data-cache-num-entries To track failed inserts, this also adds the following metrics: impala-server.io-mgr.remote-data-cache-dropped-entries impala-server.io-mgr.remote-data-cache-failed-inserts Testing: - Hand testing to verify the per-partition latency histograms - Modified custom_cluster/test_data_cache.py to also test the counts. Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 --- M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_data_cache.py 7 files changed, 257 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/6 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15306 ) Change subject: IMPALA-8690: Add LIRS cache eviction algorithm .. Patch Set 21: (1 comment) http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc File be/src/util/cache/lirs-cache.cc: http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@456 PS21, Line 456: mem_tracker_->Consume(deferred_consumption_); > ahh because deferred_consumption_ can be negative, right? Exactly, it is deferred in both directions. It gets incorporated into the underlying memtracker when the absolute value is greater than the max deferred consumption. -- To view, visit http://gerrit.cloudera.org:8080/15306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8 Gerrit-Change-Number: 15306 Gerrit-PatchSet: 21 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 28 Mar 2020 01:37:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15306 ) Change subject: IMPALA-8690: Add LIRS cache eviction algorithm .. Patch Set 23: Code-Review+1 Rebased, carrying +1 -- To view, visit http://gerrit.cloudera.org:8080/15306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8 Gerrit-Change-Number: 15306 Gerrit-PatchSet: 23 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 28 Mar 2020 01:35:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15382 to look at the new patch set (#5). Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache This adds two sets of metrics. The first is per-partition metrics to track the performance of the underlying filesystem for the data cache. It keeps histograms of read, write, and eviction latency for each data cache partition along with another metric recording the path for the partition. These are exposed as the following metrics: impala-server.io-mgr.remote-data-cache-partition-$0.path impala-server.io-mgr.remote-data-cache-partition-$0.read-latency impala-server.io-mgr.remote-data-cache-partition-$0.write-latency impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency This also adds metrics to keep counts of hits, misses, and entries in the data cache. Since reducing the latency of IO is an important feature of the data cache, the absolute count of hits and misses is as important as the hit bytes and miss bytes. This adds the following metrics: impala-server.io-mgr.remote-data-cache-hit-count impala-server.io-mgr.remote-data-cache-miss-count impala-server.io-mgr.remote-data-cache-num-entries To track failed inserts, this also adds the following metrics: impala-server.io-mgr.remote-data-cache-dropped-entries impala-server.io-mgr.remote-data-cache-failed-inserts Testing: - Hand testing to verify the per-partition latency histograms - Modified custom_cluster/test_data_cache.py to also test the counts. Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 --- M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_data_cache.py 7 files changed, 257 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/5 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG@25 PS3, Line 25: impala-server.io-mgr.remote-data-cache-hit-count : impala-server.io-mgr.remote-data-cache-miss-count : impala-server.io-mgr.remote-data-cache-num-entries > how about hit and miss rates? if we just emit the counts, I guess something Yes, I'm thinking that there will be a separate mechanism to consume the counters and be able to perform rate analysis. Prometheus seems to have this capability: https://www.innoq.com/en/blog/prometheus-counters/ The histograms for writes, reads, and evictions keep counts of the number of events. So, we have that information in the WebUI. I'm less certain about how this is accessible through prometheus. We embed the count in what we send to prometheus: https://github.com/apache/impala/blob/master/be/src/util/histogram-metric.cc#L131 It sounds like this is exposed as its own metric by prometheus: https://prometheus.io/docs/practices/histograms/ I think that should provide counts for writes per partition. Since it is interesting to know when we don't do the inserts, I added two metrics in this upload for the two most interesting cases where we don't do the insert: impala-server.io-mgr.remote-data-cache-dropped-entries - this is the counter version of impala-server.io-mgr.remote-data-cache-dropped-bytes impala-server.io-mgr.remote-data-cache-failed-inserts - this tracks where the Cache::Insert() call fails. http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h@257 PS3, Line 257: int32_t index_; > Maybe note this is used for differentiating metrics. I don't think that it Good point, these are strictly for naming metrics (and will be used in future for the access trace). The index makes no functional difference. Added a comment here. http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@549 PS3, Line 549: if (!TestInfo::is_test() > I find the control flow in this whole function kind of confusing. Maybe ins Good point, reworked the control flow. http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@876 PS3, Line 876: partition_idx++ > nit: ++partition_idx Done -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 28 Mar 2020 01:35:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15570 ) Change subject: IMPALA-2563: Support LDAP search bind operations .. Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/rpc/authentication.h File be/src/rpc/authentication.h: http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/rpc/authentication.h@30 PS3, Line 30: #include "util/ldap-util.h" nit: we probably only need a forward declaration, so long as the constructor and destructor are declared in the .cc file. http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.h File be/src/util/ldap-util.h: http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.h@57 PS3, Line 57: set unordered_set? I didn't see a reason why ordering matters. http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc File be/src/util/ldap-util.cc: http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc@121 PS3, Line 121: user_filter_ = Split(user_filter, ","); I guess this wouldn't allow us to specify a user or group with a comma in it, since no escaping, but it seems like the hive syntax is the same: https://github.com/apache/hive/blob/037eacea46371015a7f9894c5a9ccfb9708d5c56/common/src/java/org/apache/hive/common/util/HiveStringUtils.java http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc@133 PS3, Line 133: // DN patterns and replacing any '%s' with each group name. It looks like we're replacing only the first %s, is that intended? http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc@241 PS3, Line 241: int rc = ldap_search_ext_s(ld, group_dn.c_str(), LDAP_SCOPE_SUBTREE, filter.c_str(), Can you add a brief comment about what this invocation is doing? http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc@264 PS3, Line 264: LOG(WARNING) << "Was not able to get DN from search result."; Is there any more context that would be useful to debug this issue? E.g. the group_dn? http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc@268 PS3, Line 268: LOG(WARNING) << "Following of references is not currently supported, ignoring."; Same comment http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc@281 PS3, Line 281: vector attributes = Split(rdn, delimiter::Limit(",", 1)); Do we need to do any unescaping here? It looks like the DN syntax uses \ for escaping - https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ldap/distinguished-names FWIW it looks like other pre-existing string manipulation in this file doesn't do escaping, so it may be a general limitation, but maybe if there's an ldap library function to do the parsing that would be more robust http://gerrit.cloudera.org:8080/#/c/15570/3/fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java File fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java: http://gerrit.cloudera.org:8080/#/c/15570/3/fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java@205 PS3, Line 205: // Run with user that passes the group filter but not the user filter, should fail. Also check if it fails both filters? -- To view, visit http://gerrit.cloudera.org:8080/15570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c Gerrit-Change-Number: 15570 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 28 Mar 2020 01:11:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9561: Change ozone client dependency to hadoop-ozone-filesystem-lib-current
Sahil Takiar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15573 Change subject: IMPALA-9561: Change ozone client dependency to hadoop-ozone-filesystem-lib-current .. IMPALA-9561: Change ozone client dependency to hadoop-ozone-filesystem-lib-current The current dependency on Ozone pulls in a bunch of transitive dependencies that can pollute the Impala classpath. Ozone publishes a fat-jar that shades all client dependencies. Impala should use the fat-jar instead of depending on the hadoop-ozone-filesystem jar. Change-Id: I313b84919883f4c6838748635d999a2bc12a59d3 --- M fe/pom.xml 1 file changed, 1 insertion(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/15573/1 -- To view, visit http://gerrit.cloudera.org:8080/15573 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I313b84919883f4c6838748635d999a2bc12a59d3 Gerrit-Change-Number: 15573 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar
[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15378 ) Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol .. Patch Set 17: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5 Gerrit-Change-Number: 15378 Gerrit-PatchSet: 17 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Sat, 28 Mar 2020 00:32:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15378 ) Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol .. IMPALA-9466: impala-shell client retry for hs2-http protocol Added retries for idempotent rpcs: OpenSession, PingImpalaHS2Service, GetResultSetMetadata, CloseImpalaOperation (non dmls), CancelOperation, GetOperationStatus, GetRuntimeProfile, GetExecSummary, GetLog Retries were also added to the 'set all' query execution and subsequent result fetch in the ImpalaHS2Client._open_session() The retries are only supported for hs2-http protocol and enabled by default. At most there are 3 retries for a failed rpc. There is a sleep duration of 'n' seconds after nth retry. Only failed rpcs due to an error in the http transport are retried and if an rpc failed because the server returned an error in the rpc response then such scenarios are not retriable. Improved error diagnostics by dumping stack trace when ImpalaShell. _execute_stmt() gets an 'Unknown Exception'. Testing: - Added a custom_cluster test which injects fault into the http transport and checks expected behavior from the various rpcs. Some of these tests leave the session in an open state and so these tests are not suitable for the e2e test framework which have metric verifiers expecting related metrics to be 0 at the end of the test. - Manually tested real world scenarios with impala-shell client communicating with an impala coordinator via a fault injecting istio mesh. - Manually tested dropping connections on an nginx ingress gateway by sending SIGTERM to all worker processes. Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5 Reviewed-on: http://gerrit.cloudera.org:8080/15378 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M shell/impala_client.py M shell/impala_shell.py M shell/shell_exceptions.py A tests/custom_cluster/test_hs2_fault_injection.py 4 files changed, 505 insertions(+), 51 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/15378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5 Gerrit-Change-Number: 15378 Gerrit-PatchSet: 18 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar
[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15570 ) Change subject: IMPALA-2563: Support LDAP search bind operations .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5627/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c Gerrit-Change-Number: 15570 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 23:57:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15570 ) Change subject: IMPALA-2563: Support LDAP search bind operations .. Patch Set 3: > (1 comment) > > QQ before I review in depths so I understand the context. > > This change applies to LDAP auth with any endpoint right? I.e. if a > group is excluded by the filter, they can't connect to HS2 or the > webserver or any endpoint where ldap is enabled. > > For the webserver auth, does this mean that we'll need an > additional flag to control which groups or users can connect to the > webserver? Right so the intention here is that https://gerrit.cloudera.org/#/c/15538/ will add flags like --webserver_ldap_group_filter that will allow for doing things like restricting webserver authentication to an admin group. -- To view, visit http://gerrit.cloudera.org:8080/15570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c Gerrit-Change-Number: 15570 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 23:35:38 + Gerrit-HasComments: No
[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15568 ) Change subject: [test] Use `system_unsync` time for Kudu test clusters .. Patch Set 2: That is great news if we can remove the ntp requirement for our dev cluster. -- To view, visit http://gerrit.cloudera.org:8080/15568 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Gerrit-Change-Number: 15568 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 23:35:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9401: primitive include-what-you-use script and mappings
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15552 ) Change subject: IMPALA-9401: primitive include-what-you-use script and mappings .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaf5f9ba865313afb0c581e6482514ef7f1c65367 Gerrit-Change-Number: 15552 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 23:34:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9401: primitive include-what-you-use script and mappings
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15552 ) Change subject: IMPALA-9401: primitive include-what-you-use script and mappings .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5561/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaf5f9ba865313afb0c581e6482514ef7f1c65367 Gerrit-Change-Number: 15552 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 23:34:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15570 ) Change subject: IMPALA-2563: Support LDAP search bind operations .. Patch Set 3: (1 comment) QQ before I review in depths so I understand the context. This change applies to LDAP auth with any endpoint right? I.e. if a group is excluded by the filter, they can't connect to HS2 or the webserver or any endpoint where ldap is enabled. For the webserver auth, does this mean that we'll need an additional flag to control which groups or users can connect to the webserver? http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/rpc/authentication.cc File be/src/rpc/authentication.cc: http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/rpc/authentication.cc@97 PS3, Line 97: athentication typo -- To view, visit http://gerrit.cloudera.org:8080/15570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c Gerrit-Change-Number: 15570 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 23:33:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9560: Fix TestStatsExtrapolation for release versions
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15569 ) Change subject: IMPALA-9560: Fix TestStatsExtrapolation for release versions .. IMPALA-9560: Fix TestStatsExtrapolation for release versions When changing the Impala version from 3.4.0-SNAPSHOT to 3.4.0-RELEASE, TestStatsExtrapolation::test_stats_extrapolation started failing due to a difference in the expected cardinality (expected: 17.91K, actual 17.90K). This is because the Impala version gets embedded into parquet files, and this causes a slight difference in file size, which translates into a slight difference in expected cardinality. This modifies TestStatsExtrapolation::test_stats_extrapolation to allow any 17.9*K cardinality. Testing: - Tested on master and on branch-3.4.0 Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94 Reviewed-on: http://gerrit.cloudera.org:8080/15569 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/15569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94 Gerrit-Change-Number: 15569 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9560: Fix TestStatsExtrapolation for release versions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15569 ) Change subject: IMPALA-9560: Fix TestStatsExtrapolation for release versions .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94 Gerrit-Change-Number: 15569 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 23:27:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15570 to look at the new patch set (#3). Change subject: IMPALA-2563: Support LDAP search bind operations .. IMPALA-2563: Support LDAP search bind operations This patch adds a number of new options for controlling LDAP by restricting authentication to particular users and/or members of particular groups: --ldap_group_filter: comma separated list of authorized groups --ldap_user_filter: comma separated list of authorized users There are also options to control how LDAP is searched when applying these filters: --ldap_group_dn_pattern --ldap_group_membership_key --ldap_group_membership_class These options were modelled on equivalent options in Hive, see: https://cwiki.apache.org/confluence/display/Hive/User+and+Group+Filter+Support+with+LDAP+Atn+Provider+in+HiveServer2 https://github.com/apache/hive/tree/master/service/src/java/org/apache/hive/service/auth/ldap This patch also refactors LDAP related functionality into a utility class, both to make authentication.cc more manageable and to facilitate follow up work that will add LDAP authentication options for the webserver. Testing: - Added a FE custom cluster test that sets --ldap_group_filter and --ldap_user_filter and verifies expected behavior. Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c --- M be/src/common/global-flags.cc M be/src/rpc/authentication.cc M be/src/rpc/authentication.h M be/src/util/CMakeLists.txt A be/src/util/ldap-util.cc A be/src/util/ldap-util.h M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java M fe/src/test/resources/users.ldif 8 files changed, 446 insertions(+), 166 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15570/3 -- To view, visit http://gerrit.cloudera.org:8080/15570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c Gerrit-Change-Number: 15570 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15454 ) Change subject: IMPALA-3766: optionally compress spilled data .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5626/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659 Gerrit-Change-Number: 15454 Gerrit-PatchSet: 16 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 22:59:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15570 ) Change subject: IMPALA-2563: Support LDAP search bind operations .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5625/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c Gerrit-Change-Number: 15570 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 22:39:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15454 ) Change subject: IMPALA-3766: optionally compress spilled data .. Patch Set 17: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5560/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659 Gerrit-Change-Number: 15454 Gerrit-PatchSet: 17 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 22:18:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15454 ) Change subject: IMPALA-3766: optionally compress spilled data .. Patch Set 16: Code-Review+2 (1 comment) carry http://gerrit.cloudera.org:8080/#/c/15454/12/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.org:8080/#/c/15454/12/be/src/runtime/tmp-file-mgr-internal.h@94 PS12, Line 94: > nit: extra space Done -- To view, visit http://gerrit.cloudera.org:8080/15454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659 Gerrit-Change-Number: 15454 Gerrit-PatchSet: 16 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 22:17:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data
Hello Sahil Takiar, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15454 to look at the new patch set (#16). Change subject: IMPALA-3766: optionally compress spilled data .. IMPALA-3766: optionally compress spilled data Enabled via --disk_spill_compression_codec, which uses the same syntax as the compression_codec query option. Recommended codecs are LZ4 and ZSTD. ZSTD supports specifying a compression level. The compression is done in TmpFileMgr using a temporary buffer. Allocation of disk space is reworked slightly so that the allocation can happen after compression. The default power-of-two disk block sizes would lead to a lot of internal fragmentation, so a new strategy for free space management, similar to that used in the data cache, can be used with --disk_spill_punch_holes=true. TmpFileMgr will allocate a range of the actual compressed size and punch holes in the file for each range that is no longer needed. UncompressedWriteIoBytes is added to the buffer pool profiles, so that you can see what degree of compression is achieved. Typically I saw ratios of 2-3x for LZ4 and ZSTD (with LZ4 toward the lower end and ZSTD toward the higher end). Limitations: The management of the compression buffer memory could be improved. Ideally it would be integrated with the buffer pool and use the buffer pool allocator instead of being done "on the side". We would probably want to do this before making this the default, for resource management and performance reasons (doing a malloc() directly does not use the caching supported by the buffer pool). Testing: * Run buffer pool spilling tests with different combinations of the new options. * Extend existing TmpFileMgr tests for file space allocation to run with hole punching enabled. * Switch a couple of spilling tests to use the new option. * Add a metrics test to check for scratch leaks. * Enable the new options by default for end-to-end dockerized tests to get additional coverage. * Add a unit test where allocating compression memory fails, both on the read and write path. * Ran a single-node stress test on TPC-DS SF 1 and TPC-H SF 10 The peak compression buffer usage was ~40MB. Perf: I ran this spilling query using an SSD as the scratch disk: set mem_limit=200m; select count(distinct l_partkey) from tpch30_parquet.lineitem; The time taken for the second run of each query was: No compression: 19.59s LZ4: 18.56s ZSTD: 20.59s Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659 --- M be/src/runtime/bufferpool/buffer-pool-counters.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/service/query-options.cc M be/src/util/parse-util.cc M be/src/util/parse-util.h M bin/jenkins/dockerized-impala-run-tests.sh M tests/custom_cluster/test_scratch_disk.py M tests/verifiers/metric_verifier.py 15 files changed, 849 insertions(+), 246 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/15454/16 -- To view, visit http://gerrit.cloudera.org:8080/15454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659 Gerrit-Change-Number: 15454 Gerrit-PatchSet: 16 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15454 ) Change subject: IMPALA-3766: optionally compress spilled data .. Patch Set 17: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659 Gerrit-Change-Number: 15454 Gerrit-PatchSet: 17 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 22:18:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15570 to look at the new patch set (#2). Change subject: IMPALA-2563: Support LDAP search bind operations .. IMPALA-2563: Support LDAP search bind operations This patch adds a number of new options for controlling LDAP by restricting authentication to particular users and/or members of particular groups: --ldap_group_filter: comma separated list of authorized groups --ldap_user_filter: comma separated list of authorized users There are also options to control how LDAP is searched when applying these filters: --ldap_group_dn_pattern --ldap_group_membership_key --ldap_group_membership_class These options were modelled on equivalent options in Hive, see: https://cwiki.apache.org/confluence/display/Hive/User+and+Group+Filter+Support+with+LDAP+Atn+Provider+in+HiveServer2 https://github.com/apache/hive/tree/master/service/src/java/org/apache/hive/service/auth/ldap This patch also refactors LDAP related functionality into a utility class, both to make authentication.cc more manageable and to facilitate follow up work that will add LDAP authentication options for the webserver. Testing: - Added a FE custom cluster test that sets --ldap_group_filter and --ldap_user_filter and verifies expected behavior. Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c --- M be/src/common/global-flags.cc M be/src/rpc/authentication.cc M be/src/rpc/authentication.h M be/src/util/CMakeLists.txt A be/src/util/ldap-util.cc A be/src/util/ldap-util.h M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java M fe/src/test/resources/users.ldif 8 files changed, 446 insertions(+), 166 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15570/2 -- To view, visit http://gerrit.cloudera.org:8080/15570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c Gerrit-Change-Number: 15570 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9401: primitive include-what-you-use script and mappings
Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/15552 ) Change subject: IMPALA-9401: primitive include-what-you-use script and mappings .. Patch Set 2: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/15552/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15552/1//COMMIT_MSG@12 PS1, Line 12: built > Done Done http://gerrit.cloudera.org:8080/#/c/15552/1/bin/iwyu/iwyu.sh File bin/iwyu/iwyu.sh: http://gerrit.cloudera.org:8080/#/c/15552/1/bin/iwyu/iwyu.sh@31 PS1, Line 31: echo "" > Done Done -- To view, visit http://gerrit.cloudera.org:8080/15552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaf5f9ba865313afb0c581e6482514ef7f1c65367 Gerrit-Change-Number: 15552 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 21:48:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/15568 ) Change subject: [test] Use `system_unsync` time for Kudu test clusters .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/15568 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Gerrit-Change-Number: 15568 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 20:33:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/15454 ) Change subject: IMPALA-3766: optionally compress spilled data .. Patch Set 15: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/bufferpool/buffer-pool.cc File be/src/runtime/bufferpool/buffer-pool.cc: http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/bufferpool/buffer-pool.cc@736 PS11, Line 736: [this, page]( > It wasn't an intentional change, but I think it is an edge case that doesn' wfm -- To view, visit http://gerrit.cloudera.org:8080/15454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659 Gerrit-Change-Number: 15454 Gerrit-PatchSet: 15 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 21:34:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15378 ) Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol .. Patch Set 17: One of the custom cluster tests failed because an impalad didn't startup in time. Seems unrelated to this patch. Re-running the pre-commit job to see if we can get a clean run to merge this. If it fails again, I'll just merge it myself. -- To view, visit http://gerrit.cloudera.org:8080/15378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5 Gerrit-Change-Number: 15378 Gerrit-PatchSet: 17 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 27 Mar 2020 20:27:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15306 ) Change subject: IMPALA-8690: Add LIRS cache eviction algorithm .. Patch Set 22: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5624/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8 Gerrit-Change-Number: 15306 Gerrit-PatchSet: 22 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 27 Mar 2020 21:14:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15306 to look at the new patch set (#22). Change subject: IMPALA-8690: Add LIRS cache eviction algorithm .. IMPALA-8690: Add LIRS cache eviction algorithm One concern for the data cache is that the LRU eviction algorithm is suceptible to being flushed by large scans of low priority data. This implements the LIRS algorithm described in "LIRS: An Efficient Low Inter-reference Recency Set Replacement Policy to Improve Buffer Cache Performance" by Song Jiang / Xiaodon Xhang 2002. LIRS is a scan-resistent eviction algorithm with low performance penalty to LRU. This introduces the startup flag data_cache_eviction_policy to control which eviction policy to use. The only two options are LRU and LIRS, with the default continuing to be LRU. To accomodate the new algorithm and associated tests, some code moved around: 1. The RLCacheShard implementation moved from util/cache/cache.cc to util/cache/rl-cache.cc. 2. The backend cache tests were split into multiple files. util/cache/cache-test.h contains shared cache testing code. util/cache/cache-test.cc contains generic tests that should work for any algorithm. util/cache/rl-cache-test.cc are RLCacheShard specific tests util/cache/lirs-cache-test.cc are LIRS specific tests 3. To make it easy for clients of the cache code to customize the cache eviction algorithm, the public interface changed from using a template to taking the policy as an argument. 4. Cache::MemoryType is removed. 5. Cache adds an Init() method to verify the validity of startup flags Testing: - Added LIRS specific backend cache tests (lirs-cache-test) - Ran TPC-DS with a very small cache and concurrency to test corner cases with the LIRS eviction policy - Parameterized data-cache-test to run for both LRU and LIRS - Added LIRS equivalents for tests in custom_cluster/test_data_cache.py - Ran cache-bench with LRU and LIRS. The results are: Test case | Algorithm | Lookups / sec | Hit rate ZIPFIAN ratio=1.00x | LRU | 11.31M| 99.9% ZIPFIAN ratio=1.00x | LIRS | 10.09M| 99.8% ZIPFIAN ratio=3.00x | LRU | 11.36M| 95.9% ZIPFIAN ratio=3.00x | LIRS | 9.27M| 96.4% UNIFORM ratio=1.00x | LRU | 7.46M| 99.8% UNIFORM ratio=1.00x | LIRS | 6.93M| 99.8% UNIFORM ratio=3.00x | LRU | 5.63M| 33.3% UNIFORM ratio=3.00x | LIRS | 3.24M| 33.3% The takeaway is that LIRS is a bit slower on lookups and quite a bit slower on inserts. However, they both are still doing millions of operations per second, so it should not be a bottleneck for the data cache. Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8 --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/util/cache/CMakeLists.txt M be/src/util/cache/cache-bench.cc M be/src/util/cache/cache-internal.h M be/src/util/cache/cache-test.cc A be/src/util/cache/cache-test.h M be/src/util/cache/cache.cc M be/src/util/cache/cache.h A be/src/util/cache/lirs-cache-test.cc A be/src/util/cache/lirs-cache.cc A be/src/util/cache/rl-cache-test.cc A be/src/util/cache/rl-cache.cc M bin/rat_exclude_files.txt M tests/custom_cluster/test_data_cache.py 16 files changed, 2,678 insertions(+), 844 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/15306/22 -- To view, visit http://gerrit.cloudera.org:8080/15306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8 Gerrit-Change-Number: 15306 Gerrit-PatchSet: 22 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15454 ) Change subject: IMPALA-3766: optionally compress spilled data .. Patch Set 15: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/15454/12/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.org:8080/#/c/15454/12/be/src/runtime/tmp-file-mgr-internal.h@94 PS12, Line 94: nit: extra space http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@835 PS11, Line 835: LOG(WARNING) << "Failed to compress, couldn't create compressor: " > That's a good point. I just switched to calling .Release() to be symmetrica It's probably fine not to test this since it is an edge case unlikely to happen. -- To view, visit http://gerrit.cloudera.org:8080/15454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659 Gerrit-Change-Number: 15454 Gerrit-PatchSet: 15 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 20:23:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15378 ) Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol .. Patch Set 17: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5559/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5 Gerrit-Change-Number: 15378 Gerrit-PatchSet: 17 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 27 Mar 2020 20:26:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15378 ) Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol .. Patch Set 17: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5 Gerrit-Change-Number: 15378 Gerrit-PatchSet: 17 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 27 Mar 2020 20:26:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15306 ) Change subject: IMPALA-8690: Add LIRS cache eviction algorithm .. Patch Set 21: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc File be/src/util/cache/lirs-cache.cc: http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@456 PS21, Line 456: mem_tracker_->Consume(deferred_consumption_); > I will add a comment here. ahh because deferred_consumption_ can be negative, right? -- To view, visit http://gerrit.cloudera.org:8080/15306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8 Gerrit-Change-Number: 15306 Gerrit-PatchSet: 21 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 27 Mar 2020 20:02:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15378 ) Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol .. Patch Set 16: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5556/ -- To view, visit http://gerrit.cloudera.org:8080/15378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5 Gerrit-Change-Number: 15378 Gerrit-PatchSet: 16 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 27 Mar 2020 20:07:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 3: (1 comment) Mostly a high level comment about what metrics to include, code itself looks good. http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG@25 PS3, Line 25: impala-server.io-mgr.remote-data-cache-hit-count : impala-server.io-mgr.remote-data-cache-miss-count : impala-server.io-mgr.remote-data-cache-num-entries how about hit and miss rates? if we just emit the counts, I guess something like Grafana would be able to derive the hit / miss rates, right? is that the plan? not sure what the LLAP-Grafana dashboard looks like, but might be worth taking a look at what they have? for the miss-count, should there be an additional metric that helps differentiate between cache misses and cache writes? e.g. for LIRS you can have a cache miss, but the data won't necessarily be written to the cache. so maybe we can add a 'remote-data-cache-write-count' to track how many times a miss actually ends up writing data to the cache? I stole some ideas from the Guava cache metrics FYI: https://guava.dev/releases/22.0/api/docs/com/google/common/cache/CacheStats.html -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 27 Mar 2020 19:59:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9513: Fix TestKuduOperations.test column storage attributes
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15567 ) Change subject: IMPALA-9513: Fix TestKuduOperations.test_column_storage_attributes .. IMPALA-9513: Fix TestKuduOperations.test_column_storage_attributes When introducing Kudu date support, test_column_storage_attributes was modified to add the date datatype. The test expects the date to be represented in python as datetime.date. Instead, the date is a string in python, so the test consistently fails. This changes the test so that it expects a python string, and the test now passes. Change-Id: Ic198b72041fbe8fe7376c45356e484b304c6f16c Reviewed-on: http://gerrit.cloudera.org:8080/15567 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M tests/query_test/test_kudu.py 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/15567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ic198b72041fbe8fe7376c45356e484b304c6f16c Gerrit-Change-Number: 15567 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Volodymyr Verovkin
[Impala-ASF-CR] IMPALA-9513: Fix TestKuduOperations.test column storage attributes
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15567 ) Change subject: IMPALA-9513: Fix TestKuduOperations.test_column_storage_attributes .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic198b72041fbe8fe7376c45356e484b304c6f16c Gerrit-Change-Number: 15567 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Volodymyr Verovkin Gerrit-Comment-Date: Fri, 27 Mar 2020 20:00:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h@257 PS3, Line 257: int32_t index_; Maybe note this is used for differentiating metrics. I don't think that it has any other relationship to a meaningful value or property of the cache? http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@549 PS3, Line 549: if (!TestInfo::is_test() I find the control flow in this whole function kind of confusing. Maybe instead do something like if (is_test() && FindMetricForTesting(PATH_METRIC) != nullptr) { // Metrics already inited. return; } and then below you can just have the metrics created like normal without any more is_test or == nullptr checks. http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@876 PS3, Line 876: partition_idx++ nit: ++partition_idx -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 27 Mar 2020 19:52:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15306 ) Change subject: IMPALA-8690: Add LIRS cache eviction algorithm .. Patch Set 21: (5 comments) Working on a new upload (which will add comments). Going ahead and posting replies. http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc File be/src/util/cache/lirs-cache.cc: http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@131 PS21, Line 131: entires > nit: typo Will fix http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@456 PS21, Line 456: mem_tracker_->Consume(deferred_consumption_); > shouldn't all the memory be release when the shard is destroyed? I will add a comment here. The eviction/free code in CleanupThreadState() is calling UpdateMemTracker() as it goes, which decrements the underlying memtracker each time it exceeds max_deferred_consumption. The most that can be left over at this point is the deferred_consumption. http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@861 PS21, Line 861: e->set_resident(); > are we suppose to set that data as resident even if 'success' is false? The failure condition is like being successful and then evicting it immediately. So, we do all the same preparation up front. The contract for a caller is that the data needs to resident before we call this. The caller ran Allocate() and got an UNINITIALIZED handle with the key filled in. Caller put the data somewhere (like the data cache file) and then filled in the value on the handle. Now it is calling Insert(), and the data is already resident. If the Insert fails, it calls the eviction callback, which will make this not resident (and also decrement the memtracker below). I will add a comment here to clarify the success/failure case and how it works. http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@871 PS21, Line 871: // 3. There is a non-tombstone entry (rare) > is this possible? i guess this could happen if two Inserts of the same data Yes, this is possible in a race condition on insert. It is also possible for the data cache when we are inserting a larger version of an existing entry (i.e. file=X offset=0 length=5 is getting replaced by file=X offset=0 length=10). http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@876 PS21, Line 876: UpdateMemTracker(e->charge()); > not sure I actually understand what this is doing, so perhaps a misplaced c See other comment on residency. This is the same between success/failure, because eviction will decrement this. -- To view, visit http://gerrit.cloudera.org:8080/15306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8 Gerrit-Change-Number: 15306 Gerrit-PatchSet: 21 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 27 Mar 2020 19:39:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15568 ) Change subject: [test] Use `system_unsync` time for Kudu test clusters .. Patch Set 2: (1 comment) > I can make those other clean up changes. Mind if I do it in a > follow on patch? Sure. Doesn't seem like a lot of work so putting it in this patch would be nice, but up to you. http://gerrit.cloudera.org:8080/#/c/15568/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15568/2//COMMIT_MSG@7 PS2, Line 7: [test] Please file an IMPALA jira for this. If you split it up into two patches, you can use the same one, just mark the second review as (part 2) -- To view, visit http://gerrit.cloudera.org:8080/15568 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Gerrit-Change-Number: 15568 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 19:36:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15568 ) Change subject: [test] Use `system_unsync` time for Kudu test clusters .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15568 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Gerrit-Change-Number: 15568 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 19:30:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9560: Fix TestStatsExtrapolation for release versions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15569 ) Change subject: IMPALA-9560: Fix TestStatsExtrapolation for release versions .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5558/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94 Gerrit-Change-Number: 15569 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 18:58:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9560: Fix TestStatsExtrapolation for release versions
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15569 ) Change subject: IMPALA-9560: Fix TestStatsExtrapolation for release versions .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94 Gerrit-Change-Number: 15569 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 18:53:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15570 ) Change subject: IMPALA-2563: Support LDAP search bind operations .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5623/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c Gerrit-Change-Number: 15570 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 18:34:11 + Gerrit-HasComments: No
[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15568 ) Change subject: [test] Use `system_unsync` time for Kudu test clusters .. Patch Set 2: I can make those other clean up changes. Mind if I do it in a follow on patch? -- To view, visit http://gerrit.cloudera.org:8080/15568 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Gerrit-Change-Number: 15568 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 18:23:54 + Gerrit-HasComments: No
[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15568 ) Change subject: [test] Use `system_unsync` time for Kudu test clusters .. Patch Set 2: yes, we should be able to remove all ntp related steps. -- To view, visit http://gerrit.cloudera.org:8080/15568 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Gerrit-Change-Number: 15568 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 18:22:46 + Gerrit-HasComments: No
[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15568 ) Change subject: [test] Use `system_unsync` time for Kudu test clusters .. Patch Set 2: Does this change mean that we would no longer need to call ntp-wait before starting the Kudu daemons? see Impala/testdata/cluster/admin That's been an occasional source of difficulty/flakiness in different testing environments for us, so it would be nice to be able to eliminate it. -- To view, visit http://gerrit.cloudera.org:8080/15568 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Gerrit-Change-Number: 15568 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 18:20:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9560: Fix TestStatsExtrapolation for release versions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15569 ) Change subject: IMPALA-9560: Fix TestStatsExtrapolation for release versions .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5622/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94 Gerrit-Change-Number: 15569 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 27 Mar 2020 18:19:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15570 Change subject: IMPALA-2563: Support LDAP search bind operations .. IMPALA-2563: Support LDAP search bind operations This patch adds a number of new options for controlling LDAP by restricting authentication to particular users and/or members of particular groups: --ldap_group_filter: comma separated list of authorized groups --ldap_user_filter: comma separated list of authorized users There are also options to control how LDAP is searched when applying these filters: --ldap_group_dn_pattern --ldap_group_membership_key --ldap_group_membership_class These options were modelled on equivalent options in Hive, see: https://cwiki.apache.org/confluence/display/Hive/User+and+Group+Filter+Support+with+LDAP+Atn+Provider+in+HiveServer2 https://github.com/apache/hive/tree/master/service/src/java/org/apache/hive/service/auth/ldap This patch also refactors LDAP related functionality into a utility class, both to make authentication.cc more manageable and to facilitate follow up work that will add LDAP authentication options for the webserver. Testing: - Added a FE custom cluster test that sets --ldap_group_filter and --ldap_user_filter and verifies expected behavior. Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c --- M be/src/common/global-flags.cc M be/src/rpc/authentication.cc M be/src/rpc/authentication.h M be/src/util/CMakeLists.txt A be/src/util/ldap-util.cc A be/src/util/ldap-util.h M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java M fe/src/test/resources/users.ldif 8 files changed, 446 insertions(+), 166 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15570/1 -- To view, visit http://gerrit.cloudera.org:8080/15570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c Gerrit-Change-Number: 15570 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4080: Codegen once per fragment
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15408 ) Change subject: IMPALA-4080: Codegen once per fragment .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/15408/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15408/8//COMMIT_MSG@26 PS8, Line 26: TODO: This can be updated right? -- To view, visit http://gerrit.cloudera.org:8080/15408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3aef8bc621f96caafe9a1c378617a2987e4ad452 Gerrit-Change-Number: 15408 Gerrit-PatchSet: 8 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 17:46:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4080: Codegen once per fragment
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15408 ) Change subject: IMPALA-4080: Codegen once per fragment .. Patch Set 8: (5 comments) Looking good! I have only minor comments. It looks like there are quite a few TODOs for comments, so I'm assuming you're going to get back to those. http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/fragment-state.h File be/src/runtime/fragment-state.h: http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/fragment-state.h@42 PS8, Line 42: // TODO: add desc Missed this TODO http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/fragment-state.h@168 PS8, Line 168: /// TODO: add description Same here http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/fragment-state.cc File be/src/runtime/fragment-state.cc: http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/fragment-state.cc@84 PS8, Line 84: query_state_->ErrorDuringFragmentCodegen(codegen_status_); It would be good to do this outside of the critical section so that we don't need to worry about lock ordering between codegen_lock_ and QueryState::status_lock_ http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/query-state.cc@548 PS8, Line 548: { nit: extra braces for lock scope not needed http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/udf/udf-internal.h File be/src/udf/udf-internal.h: http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/udf/udf-internal.h@174 PS7, Line 174: static int GetConstFnAttr(bool uses_decimal_v2, > I decided to just pass this bool here (just renamed it to uses_decimal_v2) ok sounds fine -- To view, visit http://gerrit.cloudera.org:8080/15408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3aef8bc621f96caafe9a1c378617a2987e4ad452 Gerrit-Change-Number: 15408 Gerrit-PatchSet: 8 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 17:45:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9560: Fix TestStatsExtrapolation for release versions
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15569 Change subject: IMPALA-9560: Fix TestStatsExtrapolation for release versions .. IMPALA-9560: Fix TestStatsExtrapolation for release versions When changing the Impala version from 3.4.0-SNAPSHOT to 3.4.0-RELEASE, TestStatsExtrapolation::test_stats_extrapolation started failing due to a difference in the expected cardinality (expected: 17.91K, actual 17.90K). This is because the Impala version gets embedded into parquet files, and this causes a slight difference in file size, which translates into a slight difference in expected cardinality. This modifies TestStatsExtrapolation::test_stats_extrapolation to allow any 17.9*K cardinality. Testing: - Tested on master and on branch-3.4.0 Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94 --- M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/15569/1 -- To view, visit http://gerrit.cloudera.org:8080/15569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94 Gerrit-Change-Number: 15569 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15306 ) Change subject: IMPALA-8690: Add LIRS cache eviction algorithm .. Patch Set 21: (5 comments) http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc File be/src/util/cache/lirs-cache.cc: http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@131 PS21, Line 131: entires nit: typo http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@456 PS21, Line 456: mem_tracker_->Consume(deferred_consumption_); shouldn't all the memory be release when the shard is destroyed? http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@861 PS21, Line 861: e->set_resident(); are we suppose to set that data as resident even if 'success' is false? http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@871 PS21, Line 871: // 3. There is a non-tombstone entry (rare) is this possible? i guess this could happen if two Inserts of the same data happen concurrently? http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@876 PS21, Line 876: UpdateMemTracker(e->charge()); not sure I actually understand what this is doing, so perhaps a misplaced comment, but what happens if 'success' is false? -- To view, visit http://gerrit.cloudera.org:8080/15306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8 Gerrit-Change-Number: 15306 Gerrit-PatchSet: 21 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 27 Mar 2020 17:31:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15564 ) Change subject: IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589 .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e Gerrit-Change-Number: 15564 Gerrit-PatchSet: 2 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 27 Mar 2020 17:15:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15564 ) Change subject: IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589 .. IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589 With HIVE-22589 Hive3 switched back to using Julian Calendar for historical dates by default which caused an Impala test failure around Avro DATE values. Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e Reviewed-on: http://gerrit.cloudera.org:8080/15564 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M testdata/workloads/functional-query/queries/QueryTest/avro_date.test M tests/query_test/test_date_queries.py 2 files changed, 33 insertions(+), 21 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/15564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e Gerrit-Change-Number: 15564 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5621/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 17:08:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc@87 PS8, Line 87: DCHECK(ValidateFullAcidFileSchema().ok()); // Should have already been validated. It would be nice if this printed the validation error. Maybe we need a DCHECK_OK macro, like EXPECT_OK? http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test File testdata/workloads/functional-query/queries/QueryTest/describe-path.test: http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test@143 PS5, Line 143: describe functional_orc_def.alltypes > Yeah, I wasn't sure about this, but it was handy during development. That's fine, I think there's some similar behaviour with nested types if you describe nested fields that are not explicitly listed, e.g. you can refer to a nested item with the .item member or implicitly by omitting it. -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 17:05:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 8: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 17:05:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5620/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 16:57:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8005: Randomize partitioning exchanges.
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15497 ) Change subject: IMPALA-8005: Randomize partitioning exchanges. .. IMPALA-8005: Randomize partitioning exchanges. Currently, we use the same hash seed for partitioning exchanges at the sender. For a table with skew in distribution in the shuffling keys, multiple queries using the same shuffling keys for exchanges will end up hashing to the same destination fragments running on a particular host and potentially overloading that host. This patch seeds the hash with query id. This will ensure that the partitioning exchanges do not always hash to the same destination with same shuffling keys. Testing: Added a test to data-stream-test to verify the data values at destination are different for different queries. Change-Id: I1936e6cc3e8d66420a5a9301f49221ca38f3e468 Reviewed-on: http://gerrit.cloudera.org:8080/15497 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- M be/src/runtime/data-stream-test.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h 3 files changed, 93 insertions(+), 10 deletions(-) Approvals: Joe McDonnell: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/15497 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I1936e6cc3e8d66420a5a9301f49221ca38f3e468 Gerrit-Change-Number: 15497 Gerrit-PatchSet: 6 Gerrit-Owner: Anurag Mantripragada Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 8: (7 comments) PS8 is a rebase. http://gerrit.cloudera.org:8080/#/c/15395/6/be/src/exec/orc-metadata-utils.h File be/src/exec/orc-metadata-utils.h: http://gerrit.cloudera.org:8080/#/c/15395/6/be/src/exec/orc-metadata-utils.h@51 PS6, Line 51: private: > These could be const I think, since they're only set in the constructor Done http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@231 PS3, Line 231: return Status(Substitute( > I think it would be good to validate that the schema is as expected, just s Turned it into a validator function that returns a Status, calling it from HdfsOrcScanner::Open(). http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@323 PS6, Line 323: table.getMetaStoreTable().getParameters()); > nit: indentation is a bit off Done http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@329 PS6, Line 329: Preconditions.checkState(col.getName().equals("row__id")); > nit: multi-line if should use parentheses. Maybe this would be better to re Done, also converted the 'equals("row__id")' check to a Precondition. http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test File testdata/workloads/functional-query/queries/QueryTest/describe-path.test: http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test@143 PS5, Line 143: describe functional_orc_def.alltypes > I'm not sure if making this visible is a good idea. What does hive do? Yeah, I wasn't sure about this, but it was handy during development. Hive doesn't reveal the hidden fields in DESCRIBE, so I switched to hiding them too. However, when explicitly asked for it, currently I still allow describe on the row__id field, i.e.: DESCRIBE .row__id; +-++-+ | name| type | comment | +-++-+ | operation | int| | | originaltransaction | bigint | | | bucket | int| | | rowid | bigint | | | currenttransaction | bigint | | +-++-+ I think it's not wrong since we also allow the user to explicitly query the fields of the row__id column. Hive only allows DESCRIBE on tables. http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@197 PS5, Line 197: > flake8: E501 line too long (91 > 90 characters) Done http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@283 PS5, Line 283: > flake8: E261 at least two spaces before inline comment Done -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 16:29:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15395 to look at the new patch set (#8). Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. IMPALA-9042: Milestone 1: properly scan files that has full ACID schema Full ACID row format looks like this: { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1, "row": {"i": 1} } User columns are nested under "row". In the frontend we need to create slot descriptors that correspond to the file schema. In the catalog we could mimic the file schema but that would introduce several complexities and corner cases in column resolution. Also in query results the heading of the above user column would be "row.i". Star expansion should also be modified, etc. Because of that in the Catalog I create the exact opposite of the above schema: { "row__id": { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1 } "i": 1 } This way very little modification is needed in the frontend. And the hidden columns can be easily retrieved via 'SELECT row__id.*' when we need those for debugging/testing. We only need to change Path.getAbsolutePath() to return a schema path that corresponds to the file schema. Also in the backend we need some extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the table schema path from the file schema path. Testing: I changed data loading to load ORC files in full ACID format by default. With this change we should be able to scan full ACID tables that are not minor-compacted, don't have deleted rows, and don't have original files. Newly added Tests: * specific queries about hidden columns (full-acid-rowid.test) * SHOW CREATE TABLE (show-create-table-full-acid.test) * DESCRIBE [FORMATTED] TABLE (describe-path.test) * INSERT should be forbidden (acid-negative.test) * added tests for column masking ( ranger_column_masking_complex_types.test) Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M common/thrift/CatalogObjects.thrift M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java M testdata/bin/generate-schema-statements.py M testdata/datasets/README M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test M testdata/workloads/functional-query/queries/QueryTest/acid.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/describe-path.test A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test A testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test A testdata/workloads/functional-query/queries/QueryTest/show-create-table-full-acid.test M test
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/15395/8/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/15395/8/testdata/bin/generate-schema-statements.py@321 PS8, Line 321: ' flake8: E129 visually indented line with same indent as next logical line http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py@306 PS8, Line 306: n flake8: E129 visually indented line with same indent as next logical line -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 16:27:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15395 to look at the new patch set (#7). Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. IMPALA-9042: Milestone 1: properly scan files that has full ACID schema Full ACID row format looks like this: { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1, "row": {"i": 1} } User columns are nested under "row". In the frontend we need to create slot descriptors that correspond to the file schema. In the catalog we could mimic the file schema but that would introduce several complexities and corner cases in column resolution. Also in query results the heading of the above user column would be "row.i". Star expansion should also be modified, etc. Because of that in the Catalog I create the exact opposite of the above schema: { "row__id": { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1 } "i": 1 } This way very little modification is needed in the frontend. And the hidden columns can be easily retrieved via 'SELECT row__id.*' when we need those for debugging/testing. We only need to change Path.getAbsolutePath() to return a schema path that corresponds to the file schema. Also in the backend we need some extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the table schema path from the file schema path. Testing: I changed data loading to load ORC files in full ACID format by default. With this change we should be able to scan full ACID tables that are not minor-compacted, don't have deleted rows, and don't have original files. Newly added Tests: * specific queries about hidden columns (full-acid-rowid.test) * SHOW CREATE TABLE (show-create-table-full-acid.test) * DESCRIBE [FORMATTED] TABLE (describe-path.test) * INSERT should be forbidden (acid-negative.test) * added tests for column masking ( ranger_column_masking_complex_types.test) Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M common/thrift/CatalogObjects.thrift M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java M testdata/bin/generate-schema-statements.py M testdata/datasets/README M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test M testdata/workloads/functional-query/queries/QueryTest/acid.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/describe-path.test A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test A testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test A testdata/workloads/functional-query/queries/QueryTest/show-create-table-full-acid.test M test
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/15395/7/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/15395/7/testdata/bin/generate-schema-statements.py@321 PS7, Line 321: ' flake8: E129 visually indented line with same indent as next logical line http://gerrit.cloudera.org:8080/#/c/15395/7/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/7/tests/query_test/test_scanners_fuzz.py@306 PS7, Line 306: n flake8: E129 visually indented line with same indent as next logical line -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Mar 2020 16:15:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9513: Fix TestKuduOperations.test column storage attributes
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15567 ) Change subject: IMPALA-9513: Fix TestKuduOperations.test_column_storage_attributes .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic198b72041fbe8fe7376c45356e484b304c6f16c Gerrit-Change-Number: 15567 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Volodymyr Verovkin Gerrit-Comment-Date: Fri, 27 Mar 2020 15:29:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9513: Fix TestKuduOperations.test column storage attributes
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15567 ) Change subject: IMPALA-9513: Fix TestKuduOperations.test_column_storage_attributes .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5557/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic198b72041fbe8fe7376c45356e484b304c6f16c Gerrit-Change-Number: 15567 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Volodymyr Verovkin Gerrit-Comment-Date: Fri, 27 Mar 2020 15:29:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15378 ) Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol .. Patch Set 16: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5 Gerrit-Change-Number: 15378 Gerrit-PatchSet: 16 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 27 Mar 2020 15:19:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15378 ) Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol .. Patch Set 16: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5556/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5 Gerrit-Change-Number: 15378 Gerrit-PatchSet: 16 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 27 Mar 2020 15:19:32 + Gerrit-HasComments: No
[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15568 ) Change subject: [test] Use `system_unsync` time for Kudu test clusters .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun// DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/15568 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Gerrit-Change-Number: 15568 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 14:56:45 + Gerrit-HasComments: No
[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15568 ) Change subject: [test] Use `system_unsync` time for Kudu test clusters .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5619/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15568 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Gerrit-Change-Number: 15568 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 14:55:00 + Gerrit-HasComments: No
[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15568 ) Change subject: [test] Use `system_unsync` time for Kudu test clusters .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/15568 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Gerrit-Change-Number: 15568 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Mar 2020 14:43:08 + Gerrit-HasComments: No
[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters
Hello Thomas Tauber-Marshall, Alexey Serbin, Attila Bukor, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15568 to look at the new patch set (#2). Change subject: [test] Use `system_unsync` time for Kudu test clusters .. [test] Use `system_unsync` time for Kudu test clusters Recently Kudu made enhancements to time source configuration and adjusted the time source for it’s local clusters/tests to `system_unsync`. This patch mirrors that behavior in Impala test clusters given there is no need to require NTP-synchronized clock for a test where all the participating Kudu masters and tablet servers are run at the same node using the same local wallclock. See the Kudu commit here for details: https://github.com/apache/kudu/commit/eb2b70d4b96be2fc2fdd6b3625acc284ac5774be Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 --- M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl 2 files changed, 16 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/15568/2 -- To view, visit http://gerrit.cloudera.org:8080/15568 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Gerrit-Change-Number: 15568 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters
Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15568 Change subject: [test] Use `system_unsync` time for Kudu test clusters .. [test] Use `system_unsync` time for Kudu test clusters Recently Kudu made enhancements to time source configuration and adjusted the time source for it’s local clusters/tests to `system_unsync`. This patch mirrors that behavior in Impala test clusters given there is no need to require NTP-synchronized clock for a test where all the participating Kudu masters and tablet servers are run at the same node using the same local wallclock. See the Kudu commit here for details: https://github.com/apache/kudu/commit/eb2b70d4b96be2fc2fdd6b3625acc284ac5774be Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 --- M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl 2 files changed, 16 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/15568/1 -- To view, visit http://gerrit.cloudera.org:8080/15568 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Gerrit-Change-Number: 15568 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke
[Impala-ASF-CR] IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15564 ) Change subject: IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589 .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5554/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e Gerrit-Change-Number: 15564 Gerrit-PatchSet: 2 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 27 Mar 2020 12:41:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15564 ) Change subject: IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589 .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e Gerrit-Change-Number: 15564 Gerrit-PatchSet: 2 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 27 Mar 2020 12:41:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/15564 ) Change subject: IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589 .. Patch Set 1: > >. the test is skipped for ORC (not sure if this is on purpose or > by accident). > My guess is that updating this test was forgotten in the quite > recent https://gerrit.cloudera.org/#/c/14982/ > > I think that in the ideal case we should test both: Julian to test > that invalid dates are handled properly (this probably has to be > file format specific, as error messages are different) and > Gregorian to have a more extended suite of tests that can run on > more file formats. > > The change itself looks good to me, but I am worried about the back > and forth changes in Hive. Thanks for the review. Let's merge this in now to unblock the core test suite. I agree that DATE testing across different fileformats and Hive versions is pretty messy now and we don't cover all the different scenarios. There's a lot of room for improvement, but we should address that in a separate patch-set. -- To view, visit http://gerrit.cloudera.org:8080/15564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e Gerrit-Change-Number: 15564 Gerrit-PatchSet: 1 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 27 Mar 2020 12:39:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8870: Bump up Guava to 28.1-jre and set DISABLE SENTRY to true
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15214 ) Change subject: IMPALA-8870: Bump up Guava to 28.1-jre and set DISABLE_SENTRY to true .. Patch Set 14: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15214 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9690a926953a8d3c3872277680b4be0551546c68 Gerrit-Change-Number: 15214 Gerrit-PatchSet: 14 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 27 Mar 2020 12:22:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8870: Bump up Guava to 28.1-jre and set DISABLE SENTRY to true
Csaba Ringhofer has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15214 ) Change subject: IMPALA-8870: Bump up Guava to 28.1-jre and set DISABLE_SENTRY to true .. IMPALA-8870: Bump up Guava to 28.1-jre and set DISABLE_SENTRY to true This patch bumps up the version of Guava libraries from 14.0.1 to 28.1-jre. Due to some changes in Guava's API's, we modify the call sites accordingly. Moreover, in order to instruct the Java classes under the directory of $IMPALA_HOME/common/yarn-extras to use the new Guava libraries, we explicitly added a dependency in the corresponding pom.xml file. On the other hand, we set DISABLE_SENTRY to true regardless of $USE_CDP_HIVE since Sentry's Guava version has not been bumped up yet and thus run-sentry-service.sh cannot be successfully executed. Recall that by setting DISABLE_SENTRY to true we also disable every Sentry-related test, which is fine from now on since Impala 3.4 was recently branched. The plan is to drop support for Sentry in the Impala 4 line. Change-Id: I9690a926953a8d3c3872277680b4be0551546c68 Reviewed-on: http://gerrit.cloudera.org:8080/15214 Tested-by: Impala Public Jenkins Reviewed-by: Csaba Ringhofer --- M bin/impala-config.sh M common/yarn-extras/pom.xml M common/yarn-extras/src/main/java/org/apache/impala/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/AnalyticInfo.java M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java M fe/src/main/java/org/apache/impala/analysis/CastExpr.java M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M fe/src/main/java/org/apache/impala/catalog/DataSource.java M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java M impala-parent/pom.xml 48 files changed, 103 insertions(+), 94 deletions(-) Approvals: Impala Public Jenkins: Verified Csaba Ringhofer: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/15214 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I9690a926953a8d3c3872277680b4be0551546c68 Gerrit-Change-Number: 15214 Gerrit-PatchSet: 15 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: V
[Impala-ASF-CR] IMPALA-9513: Fix TestKuduOperations.test column storage attributes
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15567 ) Change subject: IMPALA-9513: Fix TestKuduOperations.test_column_storage_attributes .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic198b72041fbe8fe7376c45356e484b304c6f16c Gerrit-Change-Number: 15567 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Volodymyr Verovkin Gerrit-Comment-Date: Fri, 27 Mar 2020 11:33:49 + Gerrit-HasComments: No