[Impala-ASF-CR] IMPALA-13016: Fix ambiguous row_regex that check for no-existence
Hello Gabor Kaszab, Zoltan Borok-Nagy, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21333 to look at the new patch set (#2). Change subject: IMPALA-13016: Fix ambiguous row_regex that check for no-existence .. IMPALA-13016: Fix ambiguous row_regex that check for no-existence There are a few row_regex patterns used in EE test files that are ambiguous as to whether a pattern must not exist in any part of the results/runtime profile or whether at least one row must not have that pattern. These were caught by grepping for the following pattern: $ git grep -n "row_regex: (?\!" This patch replaces them with either !row_regex or a VERIFY_IS_NOT_IN comment. Testing: - Ran and passed the modified tests. Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da --- M testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test M testdata/workloads/functional-query/queries/QueryTest/acid-truncate.test M testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-directed-mode.test 4 files changed, 7 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/21333/2 -- To view, visit http://gerrit.cloudera.org:8080/21333 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da Gerrit-Change-Number: 21333 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
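The ambiguity described in the IMPALA-13016 commit message can be illustrated with a minimal Python sketch. This is not Impala's test harness: the helper names and the sample profile lines are hypothetical, and the sketch only shows why a per-row negative lookahead admits two different readings.

```python
import re

# Hypothetical runtime-profile lines; not taken from a real Impala profile.
profile = ["NumRowGroups: 4", "NumStatsFilteredRowGroups: 0", "RowsRead: 100"]

# A negative-lookahead pattern in the style that the grep above searches for.
absent = re.compile(r"(?!.*RowsRead).*")

def at_least_one_row_lacks(pattern, lines):
    """Reading A: passes as soon as a single line lacks the text."""
    return any(pattern.match(line) is not None for line in lines)

def no_row_contains(pattern, lines):
    """Reading B: passes only if the text is absent from every line."""
    return all(pattern.match(line) is not None for line in lines)

print(at_least_one_row_lacks(absent, profile))  # True
print(no_row_contains(absent, profile))         # False
```

The two readings disagree on the same input, so a check that is meant to express "this text never appears" has to state that intent explicitly, which is what the patch does.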
[Impala-ASF-CR] IMPALA-13016: Fix ambiguous row_regex that check for no-existence
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/21333 ) Change subject: IMPALA-13016: Fix ambiguous row_regex that check for no-existence .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21333 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da Gerrit-Change-Number: 21333 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 19 Apr 2024 00:23:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21186 ) Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted types .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15957/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9 Gerrit-Change-Number: 21186 Gerrit-PatchSet: 17 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 19 Apr 2024 00:06:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types
Hello k.venureddy2...@gmail.com, Sai Hemanth Gantasala, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21186 to look at the new patch set (#17). Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted types .. IMPALA-12933: Avoid fetching unneccessary events of unwanted types There are several places where catalogd will fetch all events of a specific type on a table. E.g. in TableLoader#load(), if the table has an old createEventId, catalogd will fetch all CREATE_TABLE events after that createEventId on the table. Fetching the list of events is expensive since the filtering is done on the client side, i.e. catalogd fetches all events and filters them locally based on the event type and table name. This could take hours if there are lots of events (e.g. 1M) in HMS. This patch sets the eventTypeSkipList to the complement set of the wanted type, so the get_next_notification RPC can filter out some events on the HMS side. To avoid adding too much computation overhead to HMS's underlying RDBMS in evaluating predicates of EVENT_TYPE != 'xxx', rare event types (e.g. DROP_ISCHEMA) are not added to the list. A new flag, common_hms_event_types, is added to specify the common HMS event types. Once HIVE-28146 is resolved, we can set the wanted types directly in the HMS RPC and this approach can be simplified. UPDATE_TBL_COL_STAT_EVENT and UPDATE_PART_COL_STAT_EVENT are the most common unused events for Impala. They are also added to the default skip list. A new flag, default_skipped_hms_event_types, is added to configure this list. This patch also fixes an issue where events of the non-default catalog were not filtered out. In a local perf test, I generated 100K RELOAD events after creating a table in Hive, then used the table in Impala to trigger metadata loading on it, which fetches the latest CREATE_TABLE event by polling all events after the last known CREATE_TABLE event. 
Before this patch, fetching the events took 1s779ms. Now it takes only 395.377ms, roughly a 4.5x speedup. Note that in a production environment the event messages are usually larger, so the speedup could be larger. Tests: - Added an FE test - Ran CORE tests Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 15 files changed, 320 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/21186/17 -- To view, visit http://gerrit.cloudera.org:8080/21186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9 Gerrit-Change-Number: 21186 Gerrit-PatchSet: 17 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala
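The skip-list idea in the IMPALA-12933 commit message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the event dictionaries, and the contents of COMMON_TYPES are hypothetical and do not reflect the actual catalogd code or the real default of common_hms_event_types.

```python
# Assumed stand-in for the value of the common_hms_event_types flag.
COMMON_TYPES = {
    "CREATE_TABLE",
    "RELOAD",
    "UPDATE_TBL_COL_STAT_EVENT",
    "UPDATE_PART_COL_STAT_EVENT",
}

def build_skip_list(wanted_types, common_types=COMMON_TYPES):
    """Complement of the wanted types, restricted to the common types.

    Rare types (e.g. DROP_ISCHEMA) are deliberately left out so the
    server-side predicate stays short.
    """
    return sorted(t for t in common_types if t not in wanted_types)

def filter_locally(events, wanted_types):
    """Rare types are not skipped on the server, so a local filter is still needed."""
    return [e for e in events if e["type"] in wanted_types]

skip_list = build_skip_list({"CREATE_TABLE"})
print(skip_list)

# Events a server might still return after applying the skip list (made-up data).
returned = [{"id": 7, "type": "CREATE_TABLE"}, {"id": 9, "type": "DROP_ISCHEMA"}]
print(filter_locally(returned, {"CREATE_TABLE"}))
```

The server-side skip list removes the bulk of unwanted events, while the cheap local filter handles the rare types that were intentionally not listed.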
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10557/ -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 23:27:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21304 ) Change subject: IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15955/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a Gerrit-Change-Number: 21304 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 18 Apr 2024 22:54:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15956/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 13 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 18 Apr 2024 22:55:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 ) Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15954/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 20 Gerrit-Owner: David Rorke Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 22:44:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15953/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 18 Apr 2024 22:44:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Hello Kurt Deschler, Abhishek Rawat, David Rorke, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21277 to look at the new patch set (#13). Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. IMPALA-12988: Calculate an unbounded version of CpuAsk Planner calculates CpuAsk through a recursive call beginning at Planner.computeBlockingAwareCores(), which is called after Planner.computeEffectiveParallelism(). It does blocking operator analysis over the selected degree of parallelism that was decided during computeEffectiveParallelism() traversal. That selected degree of parallelism, however, is already bounded by min and max parallelism config, derived from PROCESSING_COST_MIN_THREADS and MAX_FRAGMENT_INSTANCES_PER_NODE options respectively. This patch calculates an unbounded version of CpuAsk that is not bounded by min and max parallelism config. It is purely based on the fragment's ProcessingCost and query plan relationship constraints (for example, the number of JOIN BUILDER fragments should equal the number of destination JOIN fragments for partitioned join). Frontend will receive both bounded and unbounded CpuAsk values from TQueryExecRequest on each executor group set selection round. The unbounded CpuAsk is then scaled down once using an nth-root-based sublinear function, controlled by the total cpu count of the smallest executor group set and the bounded CpuAsk number. Another linear scaling is then applied on both bounded and unbounded CpuAsk using QUERY_CPU_COUNT_DIVISOR option. Frontend then compares the unbounded CpuAsk after scaling against CpuMax to avoid assigning a query to a small executor group set too soon. The last executor group set stays as the "catch-all" executor group set. After this patch, the "max-parallelism" fields in the query plan will all be set to the maximum parallelism based on ProcessingCost. 
The CpuAsk counter is changed to show the unbounded CpuAsk after scaling. A new counter CpuAskBounded shows the bounded CpuAsk after scaling. If QUERY_CPU_COUNT_DIVISOR=1 and PLANNER_CPU_ASK slot counting strategy is selected, this CpuAskBounded is also the minimum total admission slots given to the query. The EffectiveParallelism counter remains unchanged, showing bounded CpuAsk before scaling. Testing: - Update and pass FE test TpcdsCpuCostPlannerTest and PlannerTest#testProcessingCost. - Pass EE test tests/query_test/test_tpcds_queries.py - Pass custom cluster test tests/custom_cluster/test_executor_groups.py Change-Id: I5441e31088f90761062af35862be4ce09d116923 --- M be/src/scheduling/scheduler.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q28.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q29.test M
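The executor group set selection described in the IMPALA-12988 commit message (patch set 13) might look roughly like the Python sketch below. The commit message does not give the exact scaling formula, so nth_root_scale, the choice of pivot, and every name here are assumptions made purely for illustration; only the overall flow (sublinear scaling, division by QUERY_CPU_COUNT_DIVISOR, comparison against CpuMax, last set as catch-all) comes from the message.

```python
def nth_root_scale(unbounded, pivot, n=2.0):
    """Hypothetical sublinear scaling: growth above `pivot` is damped by an nth root."""
    if unbounded <= pivot:
        return float(unbounded)
    return pivot * (unbounded / pivot) ** (1.0 / n)

def pick_group_set(group_sets, bounded, unbounded, divisor=1.0, n=2.0):
    """group_sets: list of (name, cpu_max) sorted from smallest to largest."""
    smallest_cpu = group_sets[0][1]
    # Assumption: the scaling is anchored on the larger of the bounded CpuAsk
    # and the smallest group set's total CPU count.
    pivot = max(bounded, smallest_cpu)
    scaled_unbounded = nth_root_scale(unbounded, pivot, n) / divisor
    for name, cpu_max in group_sets[:-1]:
        if scaled_unbounded <= cpu_max:
            return name
    # The last executor group set is the catch-all.
    return group_sets[-1][0]

groups = [("small", 16), ("medium", 48), ("large", 192)]
print(pick_group_set(groups, bounded=8, unbounded=16))    # small
print(pick_group_set(groups, bounded=8, unbounded=64))    # medium
print(pick_group_set(groups, bounded=8, unbounded=4096))  # large
```

With these made-up numbers, a query whose bounded CpuAsk would fit the smallest group set is still routed to a larger one once its unbounded CpuAsk is high enough, which is the behavior the commit message is aiming for.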
[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 ) Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator .. Patch Set 21: Code-Review+2 All changes after patch set 19 are rebase adjustments. -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 21 Gerrit-Owner: David Rorke Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 22:34:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables
Wenzhe Zhou has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/21304 ) Change subject: IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables .. IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables This patch adds a script to create external JDBC tables for the TPCH and TPCDS datasets, and adds unit tests to run TPCH and TPCDS queries against external JDBC tables with Impala-Impala federation. It fixes the race condition in the caching of SQL DataSource objects by using a new DataSourceObjectCache class, which checks the reference count before closing a SQL DataSource. java.sql.Connection.close() does not effectively remove a closed connection from the connection pool, which causes JDBC handler threads to wait a long time for available connections from the pool. The workaround is to call the BasicDataSource.invalidateConnection() API to close a connection. Two flag variables are added for the DBCP configuration properties 'maxTotal' and 'maxWaitMillis'. Note that the 'maxActive' and 'maxWait' properties are renamed to 'maxTotal' and 'maxWaitMillis' respectively in apache.commons.dbcp v2. testdata/bin/create-tpc-jdbc-tables.py supports creating JDBC tables for Impala-Impala, Postgres and MySQL. The following sample commands create TPCDS JDBC tables for Impala-Impala federation with a remote coordinator running at 10.19.10.86, and for a Postgres server running at 10.19.10.86: ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \ --jdbc_db_name=tpcds_jdbc --workload=tpcds \ --database_type=IMPALA --database_host=10.19.10.86 --clean ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \ --jdbc_db_name=tpcds_jdbc --workload=tpcds \ --database_type=POSTGRES --database_host=10.19.10.86 \ --database_name=tpcds --clean Remaining Issues: - tpcds-decimal_v2-q80a failed with returned rows not matching expected results for some decimal values. This will be fixed in a following patch. Testing: - Passed core-test. 
Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a --- M be/src/service/frontend.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java M fe/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java A fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DataSourceObjectCache.java M fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java M fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java M fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M testdata/bin/create-load-data.sh A testdata/bin/create-tpc-jdbc-tables.py A testdata/datasets/tpcds/tpcds_jdbc_schema_template.sql A testdata/datasets/tpch/tpch_jdbc_schema_template.sql M tests/query_test/test_tpcds_queries.py M tests/query_test/test_tpch_queries.py 16 files changed, 1,788 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/21304/4 -- To view, visit http://gerrit.cloudera.org:8080/21304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a Gerrit-Change-Number: 21304 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins
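The reference-counting approach that the IMPALA-12910 commit message attributes to DataSourceObjectCache can be sketched generically in Python. This is not the Java class from the patch: RefCountedCache, FakePool, and the cache key are hypothetical names, and the sketch only shows the general pattern of closing a shared resource when its last user releases it.

```python
import threading

class RefCountedCache:
    """Shares one resource per key and closes it only when the last user releases it."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._entries = {}  # key -> [resource, reference count]

    def acquire(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = [self._factory(key), 0]
                self._entries[key] = entry
            entry[1] += 1
            return entry[0]

    def release(self, key):
        with self._lock:
            entry = self._entries[key]
            entry[1] -= 1
            if entry[1] == 0:
                # No remaining users, so closing cannot race with an active reader.
                del self._entries[key]
                entry[0].close()

class FakePool:
    """Hypothetical stand-in for a pooled SQL data source."""

    def __init__(self, key):
        self.key = key
        self.closed = False

    def close(self):
        self.closed = True

cache = RefCountedCache(FakePool)
first = cache.acquire("jdbc:example")
second = cache.acquire("jdbc:example")
cache.release("jdbc:example")
print(first is second, first.closed)  # True False
cache.release("jdbc:example")
print(first.closed)                   # True
```

Because the count is checked under the same lock that guards lookups, one thread cannot close a pool while another thread is still holding a reference to it.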
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 13: (4 comments) http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@26 PS11, Line 26: nth root based > Update to "nth root based" or something similar to be more accurate? Done http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@35 PS11, Line 35: field > Yes, fragment level parallelism. Will change to 'fields'. Done http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@36 PS11, Line 36: st. The CpuAsk : counter is changed to shows the unbounded CpuAsk after scaling. A new : counter CpuAskBounded shows the bounded CpuAsk after scaling. If : QUERY_CPU_COUNT_DIVISOR=1 and PLANNER_CPU_ASK slot counting strategy is : selected, this > Should pick the unbounded CpuAsk after scaling. Will fix the code and commi Done http://gerrit.cloudera.org:8080/#/c/21277/11/fe/src/main/java/org/apache/impala/planner/CostingSegment.java File fe/src/main/java/org/apache/impala/planner/CostingSegment.java: http://gerrit.cloudera.org:8080/#/c/21277/11/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@85 PS11, Line 85: } else { > I can leave it unassigned for this branch. Done -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 13 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 18 Apr 2024 22:33:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Hello Kurt Deschler, Abhishek Rawat, David Rorke, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21277 to look at the new patch set (#12). Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. IMPALA-12988: Calculate an unbounded version of CpuAsk Planner calculates CpuAsk through a recursive call beginning at Planner.computeBlockingAwareCores(), which is called after Planner.computeEffectiveParallelism(). It does blocking operator analysis over the selected degree of parallelism that was decided during computeEffectiveParallelism() traversal. That selected degree of parallelism, however, is already bounded by min and max parallelism config, derived from PROCESSING_COST_MIN_THREADS and MAX_FRAGMENT_INSTANCES_PER_NODE options accordingly. This patch calculates an unbounded version of CpuAsk that is not bounded by min and max parallelism config. It is purely based on the fragment's ProcessingCost and query plan relationship constraint (for example, the number of JOIN BUILDER fragments should equal the number of destination JOIN fragments for partitioned join). Frontend will receive both bounded and unbounded CpuAsk values from TQueryExecRequest on each executor group set selection round. The unbounded CpuAsk is then scaled down once using a square-root-based sublinear-function, controlled by the total cpu count of the smallest executor group set and the bounded CpuAsk number. Another linear scaling is then applied on both bounded and unbounded CpuAsk using QUERY_CPU_COUNT_DIVISOR option. Frontend then picks the maximum between bounded CpuAsk and unbounded CpuAsk numbers to avoid assigning a query to a small executor group set too soon. The last executor group set stays as the "catch-all" executor group set. After this patch, the "max-parallelism" field in the query plan will all be set with maximum parallelism based on ProcessingCost. 
The CpuAsk counter is changed to show the unbounded CpuAsk after scaling. A new counter CpuAskBounded shows the bounded CpuAsk after scaling. The EffectiveParallelism counter remains unchanged, showing bounded CpuAsk before scaling. Testing: - Update and pass FE test TpcdsCpuCostPlannerTest and PlannerTest#testProcessingCost. - Pass EE test tests/query_test/test_tpcds_queries.py - Pass custom cluster test tests/custom_cluster/test_executor_groups.py Change-Id: I5441e31088f90761062af35862be4ce09d116923 --- M be/src/scheduling/scheduler.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q28.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q29.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q32.test M
[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator
Riza Suminto has uploaded a new patch set (#20) to the change originally created by David Rorke. ( http://gerrit.cloudera.org:8080/21279 ) Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator .. IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator This patch improves the accuracy of the CPU ProcessingCost estimates for several of the CPU intensive operators by basing the costs on benchmark data. The general approach for a given operator was to run a set of queries that exercised the operator under various conditions (e.g. large vs small row sizes and row counts, varying NDV, different file formats, etc) and capture the CPU time spent per unit of work (the unit of work might be measured as some number of rows, some number of bytes, some number of predicates evaluated, or some combination of these). The data was then analyzed in an attempt to fit a simple model that would allow us to predict CPU consumption of a given operator based on information available at planning time. For example, the CPU ProcessingCost for a Parquet scan is estimated as: TotalCost = (0.0144 * BytesMaterialized) + (0.0281 * Rows * Predicate Count) The coefficients (0.0144 and 0.0281) are derived from benchmarking scans under a variety of conditions. Similar cost functions and coefficients were derived for all of the benchmarked operators. The coefficients for all the operators are normalized such that a single unit of cost equates to roughly 100 nanoseconds of CPU time on a r5d.4xlarge instance. So we would predict an operator with a cost of 10,000,000 would complete in roughly one second on a single core. Limitations: * Costing only addresses CPU time spent and doesn't account for any IO or other wait time. * Benchmarking scenarios didn't provide comprehensive coverage of the full range of data types, distributions, etc. More thorough benchmarking could improve the costing estimates further. 
* This initial patch only covers a subset of the operators, focusing on those that are most common and most CPU intensive. Specifically the following operators are covered by this patch. All others continue to use the previous ProcessingCost code: AggregationNode DataStreamSink (exchange sender) ExchangeNode HashJoinNode HdfsScanNode HdfsTableSink NestedLoopJoinNode SortNode UnionNode Benchmark-based costing of the remaining operators will be covered by a future patch. Future patches will automate the collection and analysis of the benchmark data and the computation of the cost coefficients to simplify maintenance of the costing as performance changes over time. Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca --- M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M 
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test M
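The Parquet scan cost function quoted in the IMPALA-12657 commit message can be sketched as follows. This is an illustration only, not code from the patch: the function and constant names are hypothetical, and only the two coefficients and the "roughly 100 nanoseconds per cost unit" normalization come from the message.

```python
# Illustrative sketch; names are hypothetical and not taken from the Impala source.
PARQUET_BYTES_COEFF = 0.0144      # cost units per byte materialized (from the commit message)
PARQUET_PREDICATE_COEFF = 0.0281  # cost units per (row * predicate) (from the commit message)
NANOS_PER_COST_UNIT = 100         # one cost unit is roughly 100 ns of CPU time


def parquet_scan_cost(bytes_materialized, rows, predicate_count):
    """TotalCost = 0.0144 * BytesMaterialized + 0.0281 * Rows * PredicateCount."""
    return (PARQUET_BYTES_COEFF * bytes_materialized
            + PARQUET_PREDICATE_COEFF * rows * predicate_count)


def estimated_cpu_seconds(cost):
    """Convert a cost into approximate single-core CPU seconds."""
    return cost * NANOS_PER_COST_UNIT / 1e9


# Made-up workload: 500 MB materialized, 10M rows, 2 predicates.
cost = parquet_scan_cost(bytes_materialized=500_000_000,
                         rows=10_000_000,
                         predicate_count=2)
print("cost units:", cost)
print("approx CPU seconds:", estimated_cpu_seconds(cost))
```

Under this normalization a cost of 10,000,000 maps to about one second on a single core, which matches the estimate given in the message.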
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/21277/12/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/21277/12/fe/src/main/java/org/apache/impala/service/Frontend.java@2394 PS12, Line 2394: verdict + " (require=" + scaledCpuAskUnbounded + ", max=" + availableCores + ")"); line too long (94 > 90) -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 18 Apr 2024 22:22:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. IMPALA-12980: Translate CpuAsk into admission control slots Impala has a concept of "admission control slots" - the amount of parallelism that should be allowed on an Impala daemon. This defaults to the number of processors per executor and can be overridden with the --admission_control_slots flag. Admission control slot accounting is described in IMPALA-8998. It computes 'slots_to_use' for each backend based on the maximum number of instances of any fragment on that backend. This can lead to slot underestimation and query overadmission. For example, assume an executor node with 48 CPU cores, configured with --admission_control_slots=48. It is assigned 4 non-blocking query fragments, each with 12 instances scheduled on this executor. The IMPALA-8998 algorithm will request the max instance count (12) slots rather than the sum of all non-blocking fragment instances (48). With the 36 remaining slots free, the executor can still admit another fragment from a different query but will potentially have CPU contention with the one that is currently running. When COMPUTE_PROCESSING_COST is enabled, the Planner will generate a CpuAsk number that represents the CPU requirement of that query over a particular executor group set. This number is an estimate of the largest number of query fragment instances that can run in parallel without waiting, given by the blocking operator analysis. Therefore, the fragment trace that sums into that CpuAsk number can be translated into 'slots_to_use' as well, which will more closely reflect the maximum parallel execution of fragment instances. This patch adds a new query option called SLOT_COUNT_STRATEGY to control which admission control slot accounting to use. There are two possible values: - LARGEST_FRAGMENT, which is the original algorithm from IMPALA-8998. 
This is still the default value for the SLOT_COUNT_STRATEGY option. - PLANNER_CPU_ASK, which will follow the fragment trace that contributes towards the CpuAsk number. This strategy will schedule at least as many admission control slots as the LARGEST_FRAGMENT strategy. To implement the PLANNER_CPU_ASK strategy, the Planner will mark fragments that contribute to CpuAsk as dominant fragments. It also passes the max_slot_per_executor information that it knows about the executor group set to the scheduler. An AvgAdmissionSlotsPerExecutor counter is added to describe what the Planner thinks the average 'slots_to_use' per backend will be, which follows this formula: AvgAdmissionSlotsPerExecutor = ceil(CpuAsk / num_executors) The actual 'slots_to_use' in each backend may differ from AvgAdmissionSlotsPerExecutor, depending on what is scheduled on that backend. 'slots_to_use' will be shown as the 'AdmissionSlots' counter under each executor profile node. Testing: - Update test_executors.py with AvgAdmissionSlotsPerExecutor assertion. - Pass test_tpcds_queries.py::TestTpcdsQueryWithProcessingCost. - Add EE test test_processing_cost.py. - Add FE test PlannerTest#testProcessingCostPlanAdmissionSlots. 
Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Reviewed-on: http://gerrit.cloudera.org:8080/21257 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/scheduling/admission-controller-test.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/scheduler.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/common/Id.java M fe/src/main/java/org/apache/impala/planner/CoreCount.java M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test A testdata/workloads/functional-query/queries/QueryTest/processing-cost-admission-slots.test M tests/custom_cluster/test_executor_groups.py A tests/query_test/test_processing_cost.py M tests/query_test/test_tpcds_queries.py 21 files changed, 1,505 insertions(+), 111 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 18 Gerrit-Owner: Riza
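The 48-core example and the AvgAdmissionSlotsPerExecutor formula from the IMPALA-12980 commit message can be worked through with a small sketch. The function names below are hypothetical and this is not the scheduler's implementation. In particular, the real PLANNER_CPU_ASK strategy follows the dominant fragments chosen by the Planner. Summing every fragment is only valid here because the example assumes all four fragments are non-blocking.

```python
import math


def slots_largest_fragment(instances_per_fragment):
    """LARGEST_FRAGMENT-style accounting: the max instance count of any fragment."""
    return max(instances_per_fragment)


def slots_sum_of_fragments(instances_per_fragment):
    """Slots needed if every listed fragment runs concurrently (all non-blocking)."""
    return sum(instances_per_fragment)


def avg_admission_slots_per_executor(cpu_ask, num_executors):
    """AvgAdmissionSlotsPerExecutor = ceil(CpuAsk / num_executors)."""
    return math.ceil(cpu_ask / num_executors)


# Example from the commit message: 4 non-blocking fragments, 12 instances each,
# on an executor configured with 48 admission control slots.
fragments = [12, 12, 12, 12]
configured_slots = 48

largest = slots_largest_fragment(fragments)
total = slots_sum_of_fragments(fragments)
print("LARGEST_FRAGMENT slots:", largest)
print("slots left free under LARGEST_FRAGMENT:", configured_slots - largest)
print("sum of non-blocking fragment instances:", total)

# Hypothetical numbers for the counter formula: CpuAsk of 100 over 8 executors.
print("AvgAdmissionSlotsPerExecutor:", avg_admission_slots_per_executor(100, 8))
```

The gap between 12 and 48 is the underestimation the message describes: the executor still appears to have 36 free slots even though all of its cores may already be busy.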
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 17: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 17 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 21:58:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21302 ) Change subject: IMPALA-13005: Create Query Live table in HMS .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15952/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21302 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4 Gerrit-Change-Number: 21302 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 21:46:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12938: add-opens for platform.cgroupv1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21334 ) Change subject: IMPALA-12938: add-opens for platform.cgroupv1 .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15951/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21334 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I312ae987c17c6f06e1ffe15e943b1865feef6b82 Gerrit-Change-Number: 21334 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 18 Apr 2024 21:25:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21302 ) Change subject: IMPALA-13005: Create Query Live table in HMS .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/21302/2/be/src/service/workload-management.cc File be/src/service/workload-management.cc: http://gerrit.cloudera.org:8080/#/c/21302/2/be/src/service/workload-management.cc@116 PS2, Line 116: ble by generati > Yes, that's why I changed them. I guess I can do that instead. According to Done http://gerrit.cloudera.org:8080/#/c/21302/2/common/thrift/SystemTables.thrift File common/thrift/SystemTables.thrift: http://gerrit.cloudera.org:8080/#/c/21302/2/common/thrift/SystemTables.thrift@23 PS2, Line 23: CLUSTER_ID : QUERY_ID > Just going to go back to unassigned. There's a DCHECK that asserts these ar Done http://gerrit.cloudera.org:8080/#/c/21302/2/fe/src/main/java/org/apache/impala/catalog/SystemTable.java File fe/src/main/java/org/apache/impala/catalog/SystemTable.java: http://gerrit.cloudera.org:8080/#/c/21302/2/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@59 PS2, Line 59: TImpalaTableProperty.__IMPALA_SYSTEM_TABLE.toString()); > I don't think so. I was looking at DataSourceTable for this pattern. Added to CatalogObjects.thrift. http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java File fe/src/main/java/org/apache/impala/catalog/SystemTable.java: http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@59 PS4, Line 59: TImpalaTableProperty.__IMPALA_SYSTEM_TABLE.toString()); > Yeah, I think const string with TBL_PROP_ prefix is better. 
A property key Done -- To view, visit http://gerrit.cloudera.org:8080/21302 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4 Gerrit-Change-Number: 21302 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 21:20:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS
Hello Andrew Sherman, Quanlong Huang, Riza Suminto, Jason Fehr, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21302 to look at the new patch set (#6). Change subject: IMPALA-13005: Create Query Live table in HMS .. IMPALA-13005: Create Query Live table in HMS Creates the 'sys.impala_query_live' table in HMS using a similar 'CREATE TABLE' command to 'sys.impala_query_log'. Updates frontend to identify a System Table based on the '__IMPALA_SYSTEM_TABLE' property. Tables improperly marked with '__IMPALA_SYSTEM_TABLE' will error when attempting to scan them because no relevant scanner will be available. Creating the table in HMS simplifies supporting 'SHOW CREATE TABLE' and 'DESCRIBE EXTENDED', so allows them for parity with Query Log. Explicitly disables 'COMPUTE STATS' on system tables as it doesn't work correctly. Updates workload management implementation to rely more on SystemTables.thrift definition, and adds DCHECKs to verify completeness and ordering. 
Testing: - adds additional test cases for changes to introspection commands - passes existing test_query_live and test_query_log suites Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4 --- M be/generated-sources/gen-cpp/CMakeLists.txt M be/src/exec/system-table-scanner.cc M be/src/service/workload-management-fields.cc M be/src/service/workload-management.cc M be/src/service/workload-management.h M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java A fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/SystemTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/test/java/org/apache/impala/catalog/SystemTableTest.java M tests/custom_cluster/test_query_live.py 17 files changed, 243 insertions(+), 228 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/21302/6 -- To view, visit http://gerrit.cloudera.org:8080/21302 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4 Gerrit-Change-Number: 21302 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12938: add-opens for platform.cgroupv1
Michael Smith has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21334 Change subject: IMPALA-12938: add-opens for platform.cgroupv1 .. IMPALA-12938: add-opens for platform.cgroupv1 Adds '--add-opens=jdk.internal.platform.cgroupv1' for Java 11 with ehcache, covering Impala daemons and frontend tests. Fixes InaccessibleObjectException detected by test_banned_log_messages.py. Change-Id: I312ae987c17c6f06e1ffe15e943b1865feef6b82 --- M be/src/common/init.cc M bin/run-all-tests.sh 2 files changed, 2 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/21334/1 -- To view, visit http://gerrit.cloudera.org:8080/21334 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I312ae987c17c6f06e1ffe15e943b1865feef6b82 Gerrit-Change-Number: 21334 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 11: (3 comments) http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@35 PS11, Line 35: field > plural - 'fields'? This is referring to fragment level parallelism, right? Yes, fragment level parallelism. Will change to 'fields'. http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@36 PS11, Line 36: The CpuAsk : counter is changed to show the unbounded CpuAsk after scaling. A new : counter CpuAskBounded shows the bounded CpuAsk after scaling. The : EffectiveParallelism counter remains unchanged, showing bounded CpuAsk : before scaling. > This is a little confusing. Should pick the unbounded CpuAsk after scaling. Will fix the code and commit message. http://gerrit.cloudera.org:8080/#/c/21277/11/fe/src/main/java/org/apache/impala/planner/CostingSegment.java File fe/src/main/java/org/apache/impala/planner/CostingSegment.java: http://gerrit.cloudera.org:8080/#/c/21277/11/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@85 PS11, Line 85: topNode = fragment.getPlanRoot(); > topNode not really being used? I can leave it unassigned for this branch. -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 18 Apr 2024 20:25:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 11: (4 comments) http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@26 PS11, Line 26: square-root-based Update to "nth root based" or something similar to be more accurate? http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@35 PS11, Line 35: field plural - 'fields'? This is referring to fragment level parallelism, right? http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@36 PS11, Line 36: The CpuAsk : counter is changed to show the unbounded CpuAsk after scaling. A new : counter CpuAskBounded shows the bounded CpuAsk after scaling. The : EffectiveParallelism counter remains unchanged, showing bounded CpuAsk : before scaling. This is a little confusing. The Fragment instance count are still based on CpuAsk - bounded or unbounded and before or after scaling? Trying to figure what we use for computing admission_slots. http://gerrit.cloudera.org:8080/#/c/21277/11/fe/src/main/java/org/apache/impala/planner/CostingSegment.java File fe/src/main/java/org/apache/impala/planner/CostingSegment.java: http://gerrit.cloudera.org:8080/#/c/21277/11/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@85 PS11, Line 85: topNode = fragment.getPlanRoot(); topNode not really being used? 
-- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 18 Apr 2024 20:12:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest The issue is that the code previously used a std::string_view to hold the data which is actually returned by rapidjson::Document. However, the rapidjson::Document object gets destroyed after creating the std::string_view. This meant the std::string_view referenced memory that was no longer valid, leading to a heap-use-after-free error. This patch fixes this issue by modifying the function to return a std::string instead of a std::string_view. When the function returns a string, it creates a copy of the data from rapidjson::Document. This ensures the returned string has its own memory allocation and doesn't rely on the destroyed rapidjson::Document. Tests: Reran the asan build and passed. Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Reviewed-on: http://gerrit.cloudera.org:8080/21315 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exprs/ai-functions-ir.cc M be/src/exprs/ai-functions.h M be/src/exprs/ai-functions.inline.h M be/src/exprs/expr-test.cc 4 files changed, 11 insertions(+), 9 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 5 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 18:58:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21294 ) Change subject: IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint .. IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint This patch adds support to display the HA status of catalog and statestore on the root web page. The status will be presented as "Catalog Status: Active" or "Statestore Status: Standby" based on the values retrieved from the metrics catalogd-server.active-status and statestore.active-status. If the catalog or statestore is standalone, it will show active as the status, which is the same as the metric. Tests: Ran core tests. Manually tested the web page and verified that the status display is correct. Also checked that when a failover happens, the current 'standby' status changes to 'active'. Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 Reviewed-on: http://gerrit.cloudera.org:8080/21294 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/common/daemon-env.h M be/src/util/default-path-handlers.cc M be/src/util/default-path-handlers.h M www/root.tmpl 4 files changed, 69 insertions(+), 8 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/21294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 Gerrit-Change-Number: 21294 Gerrit-PatchSet: 5 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21294 ) Change subject: IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 Gerrit-Change-Number: 21294 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 18:30:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13016: Fix ambiguous row regex that check for no-existence
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21333 ) Change subject: IMPALA-13016: Fix ambiguous row_regex that check for no-existence .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15950/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21333 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da Gerrit-Change-Number: 21333 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 18 Apr 2024 18:29:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21186 ) Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted types .. Patch Set 16: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9 Gerrit-Change-Number: 21186 Gerrit-PatchSet: 16 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 18 Apr 2024 18:25:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21302 ) Change subject: IMPALA-13005: Create Query Live table in HMS .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15949/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21302 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4 Gerrit-Change-Number: 21302 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 18:18:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21302 ) Change subject: IMPALA-13005: Create Query Live table in HMS .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java File fe/src/main/java/org/apache/impala/catalog/SystemTable.java: http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@59 PS4, Line 59: TImpalaTableProperty.__IMPALA_SYSTEM_TABLE.toString()); > Moved this property to CatalogObjects.thrift. I forgot about that as a simp Yeah, I think const string with TBL_PROP_ prefix is better. A property key may have dot or other char that is not valid as Thrift identifier. We have few of those const string already. $ git grep -n "const string" common/thrift/ common/thrift/CatalogService.thrift:44:const string CATALOG_TOPIC_V1_PREFIX = "1:"; common/thrift/CatalogService.thrift:48:const string CATALOG_TOPIC_V2_PREFIX = "2:"; common/thrift/hive-1-api/TCLIService.thrift:184:const string CHARACTER_MAXIMUM_LENGTH = "characterMaximumLength" common/thrift/hive-1-api/TCLIService.thrift:187:const string PRECISION = "precision" common/thrift/hive-1-api/TCLIService.thrift:188:const string SCALE = "scale" -- To view, visit http://gerrit.cloudera.org:8080/21302 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4 Gerrit-Change-Number: 21302 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 18:16:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13016: Fix ambiguous row regex that check for no-existence
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21333 Change subject: IMPALA-13016: Fix ambiguous row_regex that check for no-existence .. IMPALA-13016: Fix ambiguous row_regex that check for no-existence There are a few row_regex patterns used in EE test files that are ambiguous on whether a pattern does not exist in any part of the results/runtime filter or at least one row does not have that pattern. These were caught by grepping for the following pattern: $ git grep -n "row_regex: (?\!" This patch replaces them with either !row_regex or a VERIFY_IS_NOT_IN comment. Testing: - Run and pass modified tests. Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da --- M testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test M testdata/workloads/functional-query/queries/QueryTest/acid-truncate.test M testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-directed-mode.test 4 files changed, 7 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/21333/1 -- To view, visit http://gerrit.cloudera.org:8080/21333 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da Gerrit-Change-Number: 21333 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
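The ambiguity described in IMPALA-13016 can be illustrated with a simplified model. The sketch below assumes that an expected row_regex line is satisfied as soon as some actual row matches it. That is an assumption about the test framework made for illustration, not a description of its real verifier. The sample rows, the pattern, and the helper names are all hypothetical.

```python
import re

# Hypothetical result rows: one contains the marker text, one does not.
rows = [
    "numRows=0, COLUMN_STATS_ACCURATE=true",
    "numRows=0",
]

# Negative lookahead in the style of the 'row_regex: (?!' patterns found by the grep.
negative_lookahead = r"(?!.*COLUMN_STATS_ACCURATE)"


def any_row_matches(regex, actual_rows):
    """Assumed row_regex semantics: pass if at least one row matches the regex."""
    return any(re.match(regex, row) is not None for row in actual_rows)


def no_row_contains(text, actual_rows):
    """The check the test author most likely intended: text is absent from every row."""
    return all(text not in row for row in actual_rows)


# The lookahead check passes because the second row lacks the marker ...
print("lookahead check passes:", any_row_matches(negative_lookahead, rows))
# ... even though the marker is still present in the first row.
print("marker absent everywhere:", no_row_contains("COLUMN_STATS_ACCURATE", rows))
```

An explicit "must not appear anywhere" check, which is what the patch moves to with !row_regex or VERIFY_IS_NOT_IN, removes the dependence on which row happens to match.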
[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21302 ) Change subject: IMPALA-13005: Create Query Live table in HMS .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java File fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java: http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java@28 PS4, Line 28: Currently COMPUTE STATS does not work on these tables, > That was previously prevented by having read-only access. But that's probab Done http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java File fe/src/main/java/org/apache/impala/catalog/SystemTable.java: http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@59 PS4, Line 59: TImpalaTableProperty.__IMPALA_SYSTEM_TABLE.toString()); > I think it is time we should organize all impala-specific table properties Moved this property to CatalogObjects.thrift. I forgot about that as a simple place to define common values. Although maybe it'd make more sense as a 'const string' than an enum. -- To view, visit http://gerrit.cloudera.org:8080/21302 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4 Gerrit-Change-Number: 21302 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 17:56:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS
Hello Andrew Sherman, Quanlong Huang, Riza Suminto, Jason Fehr, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21302 to look at the new patch set (#5). Change subject: IMPALA-13005: Create Query Live table in HMS .. IMPALA-13005: Create Query Live table in HMS Creates the 'sys.impala_query_live' table in HMS using a similar 'CREATE TABLE' command to 'sys.impala_query_log'. Updates frontend to identify a System Table based on the '__IMPALA_SYSTEM_TABLE' property. Tables improperly marked with '__IMPALA_SYSTEM_TABLE' will error when attempting to scan them because no relevant scanner will be available. Creating the table in HMS simplifies supporting 'SHOW CREATE TABLE' and 'DESCRIBE EXTENDED', so allows them for parity with Query Log. Explicitly disables 'COMPUTE STATS' on system tables as it doesn't work correctly. Updates workload management implementation to rely more on SystemTables.thrift definition, and adds DCHECKs to verify completeness and ordering. 
Testing: - adds additional test cases for changes to introspection commands - passes existing test_query_live and test_query_log suites Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4 --- M be/generated-sources/gen-cpp/CMakeLists.txt M be/src/exec/system-table-scanner.cc M be/src/service/workload-management-fields.cc M be/src/service/workload-management.cc M be/src/service/workload-management.h M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java A fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/SystemTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/test/java/org/apache/impala/catalog/SystemTableTest.java M tests/custom_cluster/test_query_live.py 17 files changed, 247 insertions(+), 229 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/21302/5 -- To view, visit http://gerrit.cloudera.org:8080/21302 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4 Gerrit-Change-Number: 21302 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21302 ) Change subject: IMPALA-13005: Create Query Live table in HMS .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/21302/4/be/src/service/workload-management.cc File be/src/service/workload-management.cc: http://gerrit.cloudera.org:8080/#/c/21302/4/be/src/service/workload-management.cc@300 PS4, Line 300: field.db_column > This need lowercase as well? Ack http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java File fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java: http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java@28 PS4, Line 28: Currently COMPUTE STATS does not work on these tables. > Question: is UPDATE/DELETE/TRUNCATE allowed for SystemTable? That was previously prevented by having read-only access. But that's probably no longer true, so I need to look into preventing those. -- To view, visit http://gerrit.cloudera.org:8080/21302 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4 Gerrit-Change-Number: 21302 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 17:54:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13014: Upgrade Maven to 3.9.6
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21332 ) Change subject: IMPALA-13014: Upgrade Maven to 3.9.6 .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15948/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I726618d084f4f0737f5b876879a90c17b0c3777c Gerrit-Change-Number: 21332 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Thu, 18 Apr 2024 17:51:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13014: Upgrade Maven to 3.9.6
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21332 ) Change subject: IMPALA-13014: Upgrade Maven to 3.9.6 .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I726618d084f4f0737f5b876879a90c17b0c3777c Gerrit-Change-Number: 21332 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Thu, 18 Apr 2024 17:30:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13014: Upgrade Maven to 3.9.6
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21332 ) Change subject: IMPALA-13014: Upgrade Maven to 3.9.6 .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/21332/1/bin/bootstrap_build.sh File bin/bootstrap_build.sh: http://gerrit.cloudera.org:8080/#/c/21332/1/bin/bootstrap_build.sh@53 PS1, Line 53: https://archive.apache.org/dist/maven/maven-3/3.9.6/binaries/apache-maven-3.9.6-bin.tar.gz line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/21332/1/bin/bootstrap_build.sh@54 PS1, Line 54: sha512sum -c - <<< '706f01b20dec0305a822ab614d51f32b07ee11d0218175e55450242e49d2156386483b506b3a4e8a03ac8611bae96395fd5eec15f50d3013d5deed6d1ee18224 apache-maven-3.9.6-bin.tar.gz' line too long (182 > 90) http://gerrit.cloudera.org:8080/#/c/21332/1/bin/bootstrap_system.sh File bin/bootstrap_system.sh: http://gerrit.cloudera.org:8080/#/c/21332/1/bin/bootstrap_system.sh@346 PS1, Line 346: https://archive.apache.org/dist/maven/maven-3/3.9.6/binaries/apache-maven-3.9.6-bin.tar.gz line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/21332/1/bin/bootstrap_system.sh@347 PS1, Line 347: sha512sum -c - <<< '706f01b20dec0305a822ab614d51f32b07ee11d0218175e55450242e49d2156386483b506b3a4e8a03ac8611bae96395fd5eec15f50d3013d5deed6d1ee18224 apache-maven-3.9.6-bin.tar.gz' line too long (182 > 90) -- To view, visit http://gerrit.cloudera.org:8080/21332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I726618d084f4f0737f5b876879a90c17b0c3777c Gerrit-Change-Number: 21332 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 18 Apr 2024 17:28:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13014: Upgrade Maven to 3.9.6
Laszlo Gaal has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21332 Change subject: IMPALA-13014: Upgrade Maven to 3.9.6 .. IMPALA-13014: Upgrade Maven to 3.9.6 IMPALA-12212 upgraded Maven to 3.9.2 to gain access to the parallel dependency resolver in the 3.9.x line. The Maven project has published several new releases since 3.9.2, fixing various issues with the new resolver, and also fixing problems with concurrent access to the local Maven cache. Pick up the latest version to gain access to these new fixes. Change-Id: I726618d084f4f0737f5b876879a90c17b0c3777c --- M bin/bootstrap_build.sh M bin/bootstrap_system.sh 2 files changed, 11 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/21332/1 -- To view, visit http://gerrit.cloudera.org:8080/21332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I726618d084f4f0737f5b876879a90c17b0c3777c Gerrit-Change-Number: 21332 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15947/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 18 Apr 2024 17:05:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15946/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 16 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 17:05:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 11: (3 comments) http://gerrit.cloudera.org:8080/#/c/21277/10/be/src/util/backend-gflag-util.cc File be/src/util/backend-gflag-util.cc: http://gerrit.cloudera.org:8080/#/c/21277/10/be/src/util/backend-gflag-util.cc@266 PS10, Line 266: 1.5 > I think we should use a default value of 1.5 here. Using 2.0 (the actual sq Done http://gerrit.cloudera.org:8080/#/c/21277/10/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/21277/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2322 PS10, Line 2322: scaledCoreReqToCompare = Math.max(scaledCpuAskBounded, scaledCpuAskUnbounded); > I'm a little concerned that using the max here will allow the EG size to ex Yes, this will require a bigger change along adjustToMaxParallelism() and traverseEffectiveParallelism(). cpuAskUnbounded here is the greedy number to encourage EG promotion. On the other hand, cpuAskBounded is the hard requirement that the Frontend should adhere to, because it is what will actually run. We should think about how to do sublinear scaling of cpuAskBounded during planning in a separate patch. 
http://gerrit.cloudera.org:8080/#/c/21277/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2442 PS10, Line 2442: nthRootSmallestEGTotalCo > Maybe call this nthrootSmallestEGTotalCores Done -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 18 Apr 2024 16:45:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 16: Code-Review+2 Carrying prior +2. -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 16 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 16:44:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 ) Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator .. Patch Set 19: Code-Review+2 Carry +2. -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 19 Gerrit-Owner: David Rorke Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 16:43:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 17: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10559/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 17 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 16:46:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 17: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 17 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 16:46:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Hello Kurt Deschler, Abhishek Rawat, David Rorke, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21277 to look at the new patch set (#11). Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. IMPALA-12988: Calculate an unbounded version of CpuAsk Planner calculates CpuAsk through a recursive call beginning at Planner.computeBlockingAwareCores(), which is called after Planner.computeEffectiveParallelism(). It does blocking operator analysis over the selected degree of parallelism that was decided during computeEffectiveParallelism() traversal. That selected degree of parallelism, however, is already bounded by min and max parallelism config, derived from PROCESSING_COST_MIN_THREADS and MAX_FRAGMENT_INSTANCES_PER_NODE options accordingly. This patch calculates an unbounded version of CpuAsk that is not bounded by min and max parallelism config. It is purely based on the fragment's ProcessingCost and query plan relationship constraint (for example, the number of JOIN BUILDER fragments should equal the number of destination JOIN fragments for partitioned join). Frontend will receive both bounded and unbounded CpuAsk values from TQueryExecRequest on each executor group set selection round. The unbounded CpuAsk is then scaled down once using a square-root-based sublinear-function, controlled by the total cpu count of the smallest executor group set and the bounded CpuAsk number. Another linear scaling is then applied on both bounded and unbounded CpuAsk using QUERY_CPU_COUNT_DIVISOR option. Frontend then picks the maximum between bounded CpuAsk and unbounded CpuAsk numbers to avoid assigning a query to a small executor group set too soon. The last executor group set stays as the "catch-all" executor group set. After this patch, the "max-parallelism" field in the query plan will all be set with maximum parallelism based on ProcessingCost. 
The CpuAsk counter is changed to show the unbounded CpuAsk after scaling. A new counter CpuAskBounded shows the bounded CpuAsk after scaling. The EffectiveParallelism counter remains unchanged, showing bounded CpuAsk before scaling. Testing: - Update and pass FE test TpcdsCpuCostPlannerTest and PlannerTest#testProcessingCost. - Pass EE test tests/query_test/test_tpcds_queries.py - Pass custom cluster test tests/custom_cluster/test_executor_groups.py Change-Id: I5441e31088f90761062af35862be4ce09d116923 --- M be/src/scheduling/scheduler.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q28.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q29.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q32.test M
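The selection step described in the IMPALA-12988 commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: the commit message does not give the exact sublinear formula, so sublinear_scale below is a hypothetical root-based damping, the root parameter and its default are assumptions, and QUERY_CPU_COUNT_DIVISOR is assumed to act as a plain divisor. None of these function names come from the Impala code base.

```python
import math


def sublinear_scale(unbounded, bounded, smallest_eg_cores, root=2.0):
    """Hypothetical damping of the unbounded CpuAsk (assumed formula)."""
    if unbounded <= smallest_eg_cores:
        # Nothing to damp; never report less than the bounded ask.
        return max(unbounded, bounded)
    damped = smallest_eg_cores * (unbounded / smallest_eg_cores) ** (1.0 / root)
    return max(damped, bounded)


def core_requirement(bounded, unbounded, smallest_eg_cores, divisor=1.0):
    """Pick the larger of the two scaled asks, as the commit message describes."""
    scaled_unbounded = sublinear_scale(unbounded, bounded, smallest_eg_cores)
    # Linear scaling applied to both values (assumed to be a division).
    bounded_ask = math.ceil(bounded / divisor)
    unbounded_ask = math.ceil(scaled_unbounded / divisor)
    return max(bounded_ask, unbounded_ask)


print(core_requirement(bounded=12, unbounded=64, smallest_eg_cores=16))
```

The point of taking the maximum is the one stated in the message: a large unbounded ask keeps a query from being assigned to a small executor group set too soon, while the bounded ask remains the value that will actually run.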
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 16: (3 comments) http://gerrit.cloudera.org:8080/#/c/21257/15/fe/src/main/java/org/apache/impala/planner/CoreCount.java File fe/src/main/java/org/apache/impala/planner/CoreCount.java: http://gerrit.cloudera.org:8080/#/c/21257/15/fe/src/main/java/org/apache/impala/planner/CoreCount.java@132 PS15, Line 132: protected static CoreCount sum(CoreCount core1, CoreCount core2) { > nit: could be implemented in terms of Done. Thanks! http://gerrit.cloudera.org:8080/#/c/21257/15/tests/custom_cluster/test_executor_groups.py File tests/custom_cluster/test_executor_groups.py: http://gerrit.cloudera.org:8080/#/c/21257/15/tests/custom_cluster/test_executor_groups.py@880 PS15, Line 880: # Add an exec group with 4 admission slots and 1 executors. > This comment looks like it needs to be updated. Done http://gerrit.cloudera.org:8080/#/c/21257/15/tests/custom_cluster/test_executor_groups.py@886 PS15, Line 886: # Add another exec group with 64 admission slots and 3 executors. > This comment looks like it needs to be updated. Done -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 16 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 16:42:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Hello Kurt Deschler, Abhishek Rawat, Csaba Ringhofer, Wenzhe Zhou, Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21257 to look at the new patch set (#16). Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. IMPALA-12980: Translate CpuAsk into admission control slots Impala has a concept of "admission control slots" - the amount of parallelism that should be allowed on an Impala daemon. This defaults to the number of processors per executor and can be overridden with the --admission_control_slots flag. Admission control slot accounting is described in IMPALA-8998. It computes 'slots_to_use' for each backend based on the maximum number of instances of any fragment on that backend. This can lead to slot underestimation and query overadmission. For example, assume an executor node with 48 CPU cores and configured with --admission_control_slots=48. It is assigned 4 non-blocking query fragments, each with 12 instances scheduled on this executor. The IMPALA-8998 algorithm will request the max instance count (12) slots rather than the sum of all non-blocking fragment instances (48). With the 36 remaining slots free, the executor can still admit another fragment from a different query, which will potentially cause CPU contention with the one that is currently running. When COMPUTE_PROCESSING_COST is enabled, Planner will generate a CpuAsk number that represents the CPU requirement of that query over a particular executor group set. This number is an estimate of the largest number of query fragment instances that can run in parallel without waiting, given by the blocking operator analysis. Therefore, the fragment trace that sums into that CpuAsk number can be translated into 'slots_to_use' as well, which will more closely reflect the maximum parallel execution of fragment instances. 
This patch adds a new query option called SLOT_COUNT_STRATEGY to control which admission control slot accounting to use. There are two possible values: - LARGEST_FRAGMENT, which is the original algorithm from IMPALA-8998. This is still the default value for the SLOT_COUNT_STRATEGY option. - PLANNER_CPU_ASK, which will follow the fragment trace that contributes towards the CpuAsk number. This strategy will schedule at least as many admission control slots as the LARGEST_FRAGMENT strategy. To implement the PLANNER_CPU_ASK strategy, the Planner will mark fragments that contribute to CpuAsk as dominant fragments. It also passes the max_slot_per_executor information that it knows about the executor group set to the scheduler. The AvgAdmissionSlotsPerExecutor counter is added to describe what Planner thinks the average 'slots_to_use' per backend will be, which follows this formula: AvgAdmissionSlotsPerExecutor = ceil(CpuAsk / num_executors) Actual 'slots_to_use' in each backend may differ from AvgAdmissionSlotsPerExecutor, depending on what is scheduled on that backend. 'slots_to_use' will be shown as the 'AdmissionSlots' counter under each executor profile node. Testing: - Update test_executors.py with AvgAdmissionSlotsPerExecutor assertion. - Pass test_tpcds_queries.py::TestTpcdsQueryWithProcessingCost. - Add EE test test_processing_cost.py. - Add FE test PlannerTest#testProcessingCostPlanAdmissionSlots. 
Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 --- M be/src/scheduling/admission-controller-test.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/scheduler.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/common/Id.java M fe/src/main/java/org/apache/impala/planner/CoreCount.java M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test A testdata/workloads/functional-query/queries/QueryTest/processing-cost-admission-slots.test M tests/custom_cluster/test_executor_groups.py A tests/query_test/test_processing_cost.py M tests/query_test/test_tpcds_queries.py 21 files changed, 1,505 insertions(+), 111 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/21257/16 -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 16
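The 48-core example and the AvgAdmissionSlotsPerExecutor formula from the IMPALA-12980 commit message can be worked through with a short Python sketch. The function names are hypothetical and the model is deliberately simplified: it treats every fragment on the backend as a dominant, non-blocking fragment and ignores the max_slot_per_executor information that the real scheduler receives.

```python
import math


def slots_largest_fragment(instances_per_fragment):
    """LARGEST_FRAGMENT: slots follow the fragment with the most instances."""
    return max(instances_per_fragment)


def slots_planner_cpu_ask(dominant_instances):
    """PLANNER_CPU_ASK (simplified): slots follow the sum over dominant fragments."""
    return sum(dominant_instances)


def avg_admission_slots_per_executor(cpu_ask, num_executors):
    """AvgAdmissionSlotsPerExecutor = ceil(CpuAsk / num_executors)."""
    return math.ceil(cpu_ask / num_executors)


# Four non-blocking fragments with 12 instances each on one 48-slot executor.
fragments = [12, 12, 12, 12]
largest = slots_largest_fragment(fragments)
cpu_ask = slots_planner_cpu_ask(fragments)

print(largest)       # 12 slots requested, leaving 36 of 48 apparently free
print(cpu_ask)       # 48 slots requested, so the executor is accounted as full
print(avg_admission_slots_per_executor(100, 3))  # ceil(100 / 3) = 34
```

In this simplified model the PLANNER_CPU_ASK figure is never smaller than the LARGEST_FRAGMENT one, which matches the statement that the new strategy schedules at least as many slots.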
[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 ) Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator .. Patch Set 18: Code-Review+2 Patches below this will change a bit. Will rebase and carry Code-Review votes. -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 18 Gerrit-Owner: David Rorke Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 16:15:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 ) Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator .. Patch Set 18: Code-Review+1 (7 comments) http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@745 PS17, Line 745: int numAggExprs = getMaterializedAggregateExprs().size(); > AFAICT getMaterializedAggregateExprs().size() should return the count of th Ack http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@760 PS17, Line 760: LOG.trace("Total CPU cost estimate: " + totalCost > Understood. Do we have any strong conventions or standards here. Just looki I don't think there is a consensus in Impala. I think the SLF4J community would recommend using parameterized messages, they're probably slightly more optimal about string building. But this is fine. http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@638 PS17, Line 638: double lhsNetworkCost = (lhsHasCompatPartition) ? 0.0 : > Restored the original formatting. Ack http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@353 PS17, Line 353: // TODO: For broadcast join builds we're underestimating cost here because we're using > I'll enter a ticket for that. 
It's not a big effort but also not trivial. I Ack http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@209 PS17, Line 209: // Coefficients for estimating scan CPU processing cost. Derived from benchmarking. > Done Ack http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@648 PS17, Line 648: // TODO: Should we use AggregationNode.DEFAULT_SKEW_FACTOR when calculating > I think average case behavior is probably more appropriate for most cases o Ack http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@656 PS17, Line 656: exprGlobalNdv = inputCardinality; > Agree. I've removed this TODO. Ack -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 18 Gerrit-Owner: David Rorke Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 16:04:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21302 ) Change subject: IMPALA-13005: Create Query Live table in HMS .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/21302/4/be/src/service/workload-management.cc File be/src/service/workload-management.cc: http://gerrit.cloudera.org:8080/#/c/21302/4/be/src/service/workload-management.cc@300 PS4, Line 300: field.db_column Does this need to be lowercase as well? http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java File fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java: http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java@28 PS4, Line 28: Currently COMPUTE STATS does not work on these tables. Question: is UPDATE/DELETE/TRUNCATE allowed for SystemTable? http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java File fe/src/main/java/org/apache/impala/catalog/SystemTable.java: http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@59 PS4, Line 59: public static final String TBL_PROP_SYSTEM_TABLE = "__IMPALA_SYSTEM_TABLE"; I think it is time to organize all Impala-specific table properties in one place, say, as a list of string constants in CatalogObjects.thrift. Is this the first time a table property key is referenced in both FE and BE code?
Currently, they are scattered around the FE source code, in FeTable.java and other files:
$ git grep -n "static.* TBL_PROP_" | cat
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java:42: public static final String TBL_PROP_SORT_COLUMNS = "sort.columns";
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java:43: public static final String TBL_PROP_SORT_ORDER = "sort.order";
fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java:64: public static final String TBL_PROP_DATA_SRC_NAME = "__IMPALA_DATA_SOURCE_NAME";
fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java:69: public static final String TBL_PROP_INIT_STRING = "__IMPALA_DATA_SOURCE_INIT_STRING";
fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java:74: public static final String TBL_PROP_LOCATION = "__IMPALA_DATA_SOURCE_LOCATION";
fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java:79: public static final String TBL_PROP_CLASS = "__IMPALA_DATA_SOURCE_CLASS";
fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java:84: public static final String TBL_PROP_API_VER = "__IMPALA_DATA_SOURCE_API_VERSION";
fe/src/main/java/org/apache/impala/catalog/FeFsTable.java:381:public static final String TBL_PROP_SKIP_HEADER_LINE_COUNT = "skip.header.line.count";
fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:168: public static final String TBL_PROP_ENABLE_STATS_EXTRAPOLATION =
fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:175: public static final String TBL_PROP_DISABLE_RECURSIVE_LISTING =
fe/src/main/java/org/apache/impala/catalog/Table.java:188: public static final String TBL_PROP_LAST_DDL_TIME = "transient_lastDdlTime";
fe/src/main/java/org/apache/impala/catalog/Table.java:191: public static final String TBL_PROP_LAST_COMPUTE_STATS_TIME =
fe/src/main/java/org/apache/impala/catalog/Table.java:195: public static final String TBL_PROP_EXTERNAL_TABLE = "EXTERNAL";
fe/src/main/java/org/apache/impala/catalog/Table.java:198: public static final String TBL_PROP_EXTERNAL_TABLE_PURGE = "external.table.purge";
fe/src/main/java/org/apache/impala/catalog/Table.java:199: public static final String TBL_PROP_EXTERNAL_TABLE_PURGE_DEFAULT = "TRUE";
Going forward, I wish we could have a standard prefix for Impala-specific table property keys, either "impala.*" or "__IMPALA_*". I wonder what Quanlong and Wenzhe think about this.
-- To view, visit http://gerrit.cloudera.org:8080/21302 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4 Gerrit-Change-Number: 21302 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 18 Apr 2024 15:53:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12950:Improve error message in case of out-of-range numeric conversions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21331 ) Change subject: IMPALA-12950:Improve error message in case of out-of-range numeric conversions .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15945/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21331 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32 Gerrit-Change-Number: 21331 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 18 Apr 2024 15:32:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21109 ) Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple queries .. Patch Set 24: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15944/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98 Gerrit-Change-Number: 21109 Gerrit-PatchSet: 24 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Thu, 18 Apr 2024 15:11:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12950:Improve error message in case of out-of-range numeric conversions
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21331 )
Change subject: IMPALA-12950:Improve error message in case of out-of-range numeric conversions ..
IMPALA-12950:Improve error message in case of out-of-range numeric conversions
IMPALA-12035 introduced checks for numeric conversions that are unsafe and can fail (if the target type cannot store the value, the behaviour is undefined):
- from floating point types to integer types
- from double to float
However, it can be difficult to trace which part of the query caused this based on the error message. This change adds the source type, the destination type and the value to be converted to the error message. Unfortunately, at this point in the BE, the original SQL is not available, so we cannot reference that.
Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32 --- M be/src/exprs/cast-functions-ir.cc M be/src/udf/udf.h 2 files changed, 35 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21331/2 -- To view, visit http://gerrit.cloudera.org:8080/21331 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32 Gerrit-Change-Number: 21331 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker
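Editor's illustration for the change description above: a minimal sketch of a range check before a double-to-32-bit-integer conversion that reports the source type, the destination type and the offending value. The function name CheckDoubleToInt32 and the message wording are hypothetical and are not taken from be/src/exprs/cast-functions-ir.cc; the actual patch may structure this quite differently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper (not Impala's API): returns an error message when
// 'value' cannot be stored in an int32_t after truncation toward zero,
// and std::nullopt when the conversion is safe.
std::optional<std::string> CheckDoubleToInt32(double value) {
  // Both bounds are exactly representable as doubles. A value strictly
  // between them truncates to a valid int32_t.
  const double lower =
      static_cast<double>(std::numeric_limits<int32_t>::min()) - 1.0;
  const double upper =
      static_cast<double>(std::numeric_limits<int32_t>::max()) + 1.0;
  // NaN fails both comparisons, so it is reported as out of range too.
  const bool in_range = value > lower && value < upper;
  if (in_range) return std::nullopt;

  // Illustrative message format only: source type, destination type, value.
  std::ostringstream msg;
  msg << "Out-of-range conversion from DOUBLE to INT: value " << value;
  return msg.str();
}
```

A caller would perform the cast only when the function returns std::nullopt, and otherwise surface the returned text as the query error.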
[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries
Steve Carlin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21109 ) Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple queries .. Patch Set 23: (3 comments) http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java: http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java@50 PS21, Line 50: // TODO: IMPALA-13011: Awkward call for authorization here. Authorization : // will be done at validation time, but this is needed here for > Can you mention in the commit message that authorization is missing at this Done http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java: http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@35 PS20, Line 35: ImpalaTypeSystemImpl > Yeah, it is perfectly fine to just add a class comment and mention that thi Ok, added a class comment. http://gerrit.cloudera.org:8080/#/c/21109/23/testdata/workloads/functional-query/queries/QueryTest/calcite.test File testdata/workloads/functional-query/queries/QueryTest/calcite.test: http://gerrit.cloudera.org:8080/#/c/21109/23/testdata/workloads/functional-query/queries/QueryTest/calcite.test@113 PS23, Line 113: xedzt > hmm, why are these different than https://github.com/apache/impala/blob/541 Yeah, prolly best to take this out. The test in binary-type does a casting function which isn't supported in this commit (but coming soon). 
-- To view, visit http://gerrit.cloudera.org:8080/21109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98 Gerrit-Change-Number: 21109 Gerrit-PatchSet: 23 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Thu, 18 Apr 2024 14:48:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21109 ) Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple queries .. Patch Set 24: (1 comment) http://gerrit.cloudera.org:8080/#/c/21109/24/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java: http://gerrit.cloudera.org:8080/#/c/21109/24/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@26 PS24, Line 26: * https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html line too long (98 > 90) -- To view, visit http://gerrit.cloudera.org:8080/21109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98 Gerrit-Change-Number: 21109 Gerrit-PatchSet: 24 Gerrit-Owner: Steve Carlin Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Steve Carlin Gerrit-Comment-Date: Thu, 18 Apr 2024 14:48:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries
Hello Aman Sinha, Quanlong Huang, Joe McDonnell, Csaba Ringhofer, Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21109 to look at the new patch set (#24).
Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple queries ..
IMPALA-12872: Use Calcite for optimization - part 1: simple queries
This is the first commit to use the Calcite library to parse, analyze, and optimize queries. The hook for the planner is through an override of the JniFrontend. The CalciteJniFrontend class is the driver that walks through each of the Calcite steps, which are as follows:
CalciteQueryParser: Takes the string query and outputs an AST in the form of Calcite's SqlNode object.
CalciteMetadataHandler: Iterates through the SqlNode from the previous step and makes sure all essential table metadata is retrieved from catalogd.
CalciteValidator: Validates the SqlNode tree, akin to the Impala Analyzer.
CalciteRelNodeConverter: Changes the AST into a logical plan. In this first commit, the only logical nodes used are LogicalTableScan and LogicalProject. The LogicalTableScan will serve as the node that reads from an Hdfs Table and the LogicalProject will only project out the used columns in the query. In later versions, the LogicalProject will also handle function changes.
CalciteOptimizer: Optimizes the query. In this cut, it will be a no-op, but in later versions, it will perform logical optimizations via Calcite's rule mechanism.
CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into Impala's PlanNode physical tree.
ExecRequestCreator: Implements the existing Impala steps that turn a Single Node Plan into a Distributed Plan. It will also create the TExecRequest object needed by the runtime server.
Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject
In the CalciteJniFrontend, there are some basic checks to make sure only select statements get processed. Any non-query statement will revert back to the current Impala planner. In this iteration, any queries besides the minimal ones listed above will result in a caught exception, and the query will then be run through the current Impala planner. The tests that do work can be found in calcite.test and are run through the custom cluster test test_experimental_planner.py.
This iteration should support all types with the exception of complex types. Calcite does not have a STRING type, so the string type is represented as VARCHAR(MAXINT), similar to how Hive represents its STRING type. The ImpalaTypeConverter file is used to convert the Impala Type object to corresponding Calcite objects.
Authorization is not yet working with this current commit. A Jira has been filed (IMPALA-13011) to deal with this.
Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98 --- M bin/set-classpath.sh M bin/start-impala-cluster.py M fe/src/main/java/org/apache/impala/analysis/TableName.java M fe/src/main/java/org/apache/impala/planner/PlannerContext.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java A java/calcite-planner/pom.xml A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java A
[Impala-ASF-CR] IMPALA-12977: add search and pagination to /hadoop-varz
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21329 ) Change subject: IMPALA-12977: add search and pagination to /hadoop-varz .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15943/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic8cac23b655fa58ce12d9857649705574614a5f0 Gerrit-Change-Number: 21329 Gerrit-PatchSet: 1 Gerrit-Owner: Saurabh Katiyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 18 Apr 2024 14:30:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15942/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 14:03:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12977: add search and pagination to /hadoop-varz
Saurabh Katiyal has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21329 ) Change subject: IMPALA-12977: add search and pagination to /hadoop-varz .. IMPALA-12977: add search and pagination to /hadoop-varz Added search and pagination feature to /hadoop-varz Change-Id: Ic8cac23b655fa58ce12d9857649705574614a5f0 --- M www/hadoop-varz.tmpl 1 file changed, 25 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/21329/1 -- To view, visit http://gerrit.cloudera.org:8080/21329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic8cac23b655fa58ce12d9857649705574614a5f0 Gerrit-Change-Number: 21329 Gerrit-PatchSet: 1 Gerrit-Owner: Saurabh Katiyal
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 13:53:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10558/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 13:54:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 13:54:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h File be/src/exprs/ai-functions.inline.h: http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108 PS2, Line 108: plac > 'Move' is good from the context of the change, but if someone is reading th Done http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108 PS2, Line 108: > Nit: it is not a loop, I wrote it wrong in my comment. "'if' statement" wou Done http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@178 PS2, Line 178: std::string response = AiGenerateTextParseOpenAiResponse( > The other alternative would've been to create rapid::json Document and pass Agree. Preferring string for now -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 13:38:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Yida Wu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest The issue is that the code previously used a std::string_view to hold the data which is actually returned by rapidjson::Document. However, the rapidjson::Document object gets destroyed after creating the std::string_view. This meant the std::string_view referenced memory that was no longer valid, leading to a heap-use-after-free error. This patch fixes this issue by modifying the function to return a std::string instead of a std::string_view. When the function returns a string, it creates a copy of the data from rapidjson::Document. This ensures the returned string has its own memory allocation and doesn't rely on the destroyed rapidjson::Document. Tests: Reran the asan build and passed. Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 --- M be/src/exprs/ai-functions-ir.cc M be/src/exprs/ai-functions.h M be/src/exprs/ai-functions.inline.h M be/src/exprs/expr-test.cc 4 files changed, 11 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/21315/3 -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
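Editor's illustration for the commit message above: a minimal sketch of the lifetime problem it describes and of the copy-based fix. FakeDocument is a hypothetical stand-in for an object that owns parsed text (it is not rapidjson's API), and ParseSafe is an illustrative name, not the function changed in be/src/exprs/ai-functions.inline.h.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical owner of parsed text, standing in for a JSON document object.
struct FakeDocument {
  std::string content;
  explicit FakeDocument(std::string c) : content(std::move(c)) {}
  // Returns a non-owning view into 'content'; valid only while *this lives.
  std::string_view View() const { return content; }
};

// Unsafe shape, shown only as a comment: the returned view would refer to
// memory owned by 'doc', which is destroyed when the function returns.
//
//   std::string_view ParseUnsafe(const std::string& raw) {
//     FakeDocument doc(raw);
//     return doc.View();  // dangling after return
//   }

// Safe shape: copy the viewed bytes into a std::string while 'doc' is
// still alive, so the result owns its own storage.
std::string ParseSafe(const std::string& raw) {
  FakeDocument doc(raw);
  std::string_view view = doc.View();
  return std::string(view);
}
```

The copy costs one allocation per call, which is the trade-off the commit message accepts in exchange for a result that no longer depends on the destroyed document.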
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. Patch Set 7: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/15941/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Comment-Date: Thu, 18 Apr 2024 13:32:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10557/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 13:26:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21320 ) Change subject: IMPALA-13000: Document OPTIMIZE TABLE .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@556 PS1, Line 556: able_na > [] is quite standard notation, and we are using it extensively in the Impal I'm also okay with leaving [db_name]. I think a separate top-level page or even just a paragraph showing the proper syntax would be even better. http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml@566 PS2, Line 566: using If you want to make it even clearer that all files are rewritten (not just the ones with the latest schema), you could write "rewrite all files, converting them (if necessary) to the latest table schema". I'm not sure it's needed, I'm also okay with the current wording. -- To view, visit http://gerrit.cloudera.org:8080/21320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Gerrit-Change-Number: 21320 Gerrit-PatchSet: 2 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 18 Apr 2024 13:20:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21186 ) Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted types .. Patch Set 16: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10556/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9 Gerrit-Change-Number: 21186 Gerrit-PatchSet: 16 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 18 Apr 2024 13:26:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21294 ) Change subject: IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10555/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/21294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 Gerrit-Change-Number: 21294 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 13:22:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21294 ) Change subject: IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint .. Patch Set 4: Irrelevant Iceberg issue: IMPALA-12621 -- To view, visit http://gerrit.cloudera.org:8080/21294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 Gerrit-Change-Number: 21294 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 13:21:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Daniel Becker has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list Binary fields in complex types are currently not supported at all for regular tables (an error is returned). For Iceberg metadata tables, IMPALA-12899 added a temporary workaround to allow queries that contain these fields to succeed by NULLing them out. This change adds support for displaying them with base64 encoding for both regular and Iceberg metadata tables. Complex types are displayed in JSON format, so simply inserting the bytes of the binary fields is not acceptable as it would produce invalid JSON. Base64 is a widely used encoding that allows representing arbitrary binary information using only a limited set of ASCII characters. This change also adds support for top level binary columns in Iceberg metadata tables. However, these are not base64 encoded but are returned in raw byte format - this is consistent with how top level binary columns from regular (non-metadata) tables are handled. 
Testing: - added test queries in iceberg-metadata-tables.test referencing both nested and top level binary fields; also updated existing queries - moved relevant tests (queries extracting binary fields from within complex types) from nested-types-scanner-basic.test to a new binary-in-complex-type.test file and also added a query that selects the containing complex types; this new test file is run from test_scanners.py::TestBinaryInComplexType::test_binary_in_complex_type - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to positive tests (expecting success); a negative test already in AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 --- M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M be/src/rpc/jni-thrift-util.h M be/src/runtime/complex-value-writer.inline.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test M tests/query_test/test_scanners.py 26 files changed, 439 insertions(+), 155 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/7 -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs
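As an illustration of why the commit message above rules out writing raw bytes into the JSON output, here is a minimal Python sketch. It is not Impala's implementation (the change itself touches C++ and Java code); the function name render_complex_value and the sample row are hypothetical. It only shows the general technique: binary values nested in a complex value are base64-encoded so that the rendered JSON stays valid.

```python
import base64
import json


def render_complex_value(value):
    """Render a nested value as JSON text, base64-encoding any bytes inside it.

    Hypothetical helper for illustration only; not part of Impala.
    """
    def convert(v):
        if isinstance(v, bytes):
            # Arbitrary bytes may not be valid text, so encode them to ASCII.
            return base64.b64encode(v).decode("ascii")
        if isinstance(v, dict):
            return {k: convert(x) for k, x in v.items()}
        if isinstance(v, list):
            return [convert(x) for x in v]
        return v

    return json.dumps(convert(value))


# Illustrative struct with a nested binary field.
row = {"id": 1, "key_metadata": b"\x00\xff"}
print(render_complex_value(row))  # {"id": 1, "key_metadata": "AP8="}
```

A reader of the output can recover the original bytes with base64.b64decode on the string value.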
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/21269/7/be/src/util/jni-util.h File be/src/util/jni-util.h: http://gerrit.cloudera.org:8080/#/c/21269/7/be/src/util/jni-util.h@115 PS7, Line 115: /// is more restricted, see https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#GetPrimitiveArrayCritical_ReleasePrimitiveArrayCritical line too long (162 > 90) -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Comment-Date: Thu, 18 Apr 2024 13:08:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21320 ) Change subject: IMPALA-13000: Document OPTIMIZE TABLE .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@556 PS1, Line 556: able_na > No need to use fully qualified table names. I only included the database in [] is quite standard notation, and we are using it extensively in the Impala docs, e.g.: https://impala.apache.org/docs/build/html/topics/impala_create_table.html So users shouldn't be confused by it. This file mostly contains simple examples because the other statements have their own detailed doc page. But we don't have that for OPTIMIZE, so having a proper syntax definition here makes sense to me. Alternatively, we could create a separate top-level page for OPTIMIZE, and here only add a few examples. http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml@561 PS2, Line 561: rewrites the entire table I think we should mention that it only applies to the current implementation, so users won't have this assumption in future releases. http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml@587 PS2, Line 587: rewrites the entire table Maybe also mention here that this behavior is temporary. 
-- To view, visit http://gerrit.cloudera.org:8080/21320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Gerrit-Change-Number: 21320 Gerrit-PatchSet: 2 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 18 Apr 2024 12:58:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. Patch Set 2: Code-Review+1 (2 comments) Thanks, just some nits. http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h File be/src/exprs/ai-functions.inline.h: http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108 PS2, Line 108: move 'Move' is good from the context of the change, but if someone is reading the new code it's a bit strange. I think "place" or "put" would be better. http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108 PS2, Line 108: loop Nit: it is not a loop, I wrote it wrong in my comment. "'if' statement" would be better. -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 12:54:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21186 ) Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted types .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15940/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9 Gerrit-Change-Number: 21186 Gerrit-PatchSet: 16 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 18 Apr 2024 12:12:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types
Hello k.venureddy2...@gmail.com, Sai Hemanth Gantasala, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21186 to look at the new patch set (#16). Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted types .. IMPALA-12933: Avoid fetching unneccessary events of unwanted types There are several places where catalogd will fetch all events of a specific type on a table. E.g. in TableLoader#load(), if the table has an old createEventId, catalogd will fetch all CREATE_TABLE events after that createEventId on the table. Fetching the list of events is expensive since the filtering is done on client side, i.e. catalogd fetches all events and filters them locally based on the event type and table name. This could take hours if there are lots of events (e.g. 1M) in HMS. This patch sets the eventTypeSkipList with the complement set of the wanted type. So the get_next_notification RPC can filter out some events on HMS side. To avoid bringing too much computation overhead to HMS's underlying RDBMS in evaluating predicates of EVENT_TYPE != 'xxx', rare event types (e.g. DROP_ISCHEMA) are not added to the list. A new flag, common_hms_event_types, is added to specify the common HMS event types. Once HIVE-28146 is resolved, we can set the wanted types directly in the HMS RPC and this approach can be simplified. UPDATE_TBL_COL_STAT_EVENT, UPDATE_PART_COL_STAT_EVENT are the most common unused events for Impala. They are also added to the default skip list. A new flag, default_skipped_hms_event_types, is added to configure this list. This patch also fixes an issue that events of the non-default catalog are not filtered out. In a local perf test, I generated 100K RELOAD events after creating a table in Hive. I then used the table in Impala to trigger metadata loading on it, which fetches the latest CREATE_TABLE event by polling all events after the last known CREATE_TABLE event. 
Before this patch, fetching the events takes 1s779ms. Now it takes only 395.377ms. Note that in prod env, the event messages are usually larger, we could have a larger speedup. Tests: - Added an FE test - Ran CORE tests Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 15 files changed, 326 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/21186/16 -- To view, visit http://gerrit.cloudera.org:8080/21186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9 Gerrit-Change-Number: 21186 Gerrit-PatchSet: 16 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala
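The skip-list idea described in the commit message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patch's Java code: the function build_skip_list is hypothetical, and the event type lists passed in below are illustrative stand-ins for the values of the common_hms_event_types and default_skipped_hms_event_types flags, whose real defaults are not given in the message.

```python
def build_skip_list(wanted_type, common_types, default_skipped):
    """Return the event types to skip when only wanted_type is needed.

    Hypothetical helper: the complement of the wanted type within the
    common types, plus the types that are always skipped. Rare types are
    left out so the server-side predicate stays short.
    """
    skip = set(default_skipped)
    skip.update(t for t in common_types if t != wanted_type)
    skip.discard(wanted_type)  # never skip the type we are looking for
    return sorted(skip)


# Illustrative values only; not the actual flag defaults.
common = ["ADD_PARTITION", "ALTER_TABLE", "CREATE_TABLE", "DROP_TABLE", "RELOAD"]
always_skipped = ["UPDATE_TBL_COL_STAT_EVENT", "UPDATE_PART_COL_STAT_EVENT"]
print(build_skip_list("CREATE_TABLE", common, always_skipped))

# Speedup implied by the timings quoted in the message (1s779ms vs 395.377ms).
speedup = 1779.0 / 395.377
print(round(speedup, 1))  # 4.5
```

With these inputs, RELOAD lands in the skip list, which matches the perf test scenario where 100K RELOAD events no longer need to be transferred to catalogd.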
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15939/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 11:13:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Hello Fang-Yu Rao, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21326 to look at the new patch set (#3). Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions *Background* Since IMPALA-3127, catalogd sends incremental partition updates based on the last sent table snapshot ('maxSentPartitionId_' to be specific). Dropped partitions since the last catalog update are tracked in 'droppedPartitions_' of HdfsTable. When catalogd collects the next catalog update, they will be collected. HdfsTable then clears the set. If an HdfsTable is invalidated, it's replaced with an IncompleteTable which doesn't track any partitions. The HdfsTable object is then added to the deleteLog so catalogd can send deletion updates for all its partitions. The same if the HdfsTable is dropped. However, the previously dropped partitions are not collected in this case, which results in a leak in the catalog topic if the partition name is not reused anymore. Note that in the catalog topic, the key of a partition update consists of the table name and the partition name. So if the partition is added back to the table, the topic key will be reused then resolves the leak. The leak will be observed when a coordinator restarts. In the initial catalog update sent from statestore, coordinator will find some partition updates that are not referenced by the HdfsTable (assuming the table is used again after the INVALIDATE). Then a Precondition check fails and the table is not added to the coordinator. *Overview of the patch* This patch fixes the leak by also collecting the dropped partitions when adding the HdfsTable to the deleteLog. A new field, dropped_partitions, is added in THdfsTable to collect them. It's only used when catalogd collects catalog updates. 
Removes the Precondition check in coordinator and just reports the stale partitions since IMPALA-12831 could also introduce them. Also adds a log line in CatalogOpExecutor.alterTableDropPartition() to show the dropped partition names for better diagnostics. Tests - Added e2e tests Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_partition.py M tests/metadata/test_recover_partitions.py 8 files changed, 151 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21326/3 -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang
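The leak described in the Background section above can be modelled with a small Python sketch. It is a toy model under stated assumptions, not catalogd code: the key format "table:partition" and the function names are hypothetical (the message only says the topic key consists of the table name and the partition name), and the catalog topic is reduced to a set of keys.

```python
def partition_key(table, part_name):
    # Assumed key layout for illustration; only the two components are from the source.
    return f"{table}:{part_name}"


def keys_to_delete(table, live_parts, dropped_parts, collect_dropped):
    """Keys for which deletion updates are sent when the table is invalidated or dropped."""
    names = list(live_parts)
    if collect_dropped:
        # The fix: also cover partitions dropped since the last catalog update.
        names += [p for p in dropped_parts if p not in live_parts]
    return {partition_key(table, n) for n in names}


# Topic state after the last catalog update: three partitions were sent.
topic = {partition_key("db.t", n) for n in ("p=1", "p=2", "p=3")}
# p=3 is dropped, then the table is invalidated before the next update is collected.
live, dropped = ["p=1", "p=2"], ["p=3"]

leaked_before = topic - keys_to_delete("db.t", live, dropped, collect_dropped=False)
leaked_after = topic - keys_to_delete("db.t", live, dropped, collect_dropped=True)
print(sorted(leaked_before))  # ['db.t:p=3']
print(sorted(leaked_after))   # []
```

In this model the stale key for p=3 is what a restarting coordinator would receive without any HdfsTable referencing it.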
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 2: (1 comment) Thanks for the quick review! http://gerrit.cloudera.org:8080/#/c/21326/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21326/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1075 PS2, Line 1075: if (addedPartNames.contains(part.partition_name)) continue; > What does this case means? The partition was dropped, but was readded later Yeah, if a partition is dropped and then re-added, the droppedPartitions will have the old instance and the partitionMap will have the new instance. When the table is dropped/invalidated, partitions from the partitionMap are collected in the for-loop at L1057. Some of them could have the same partition name as those in the dropped_partitions. Renamed 'addedPartNames' to 'collectedPartNames' to avoid confusion. -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 10:49:47 + Gerrit-HasComments: Yes
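The drop-then-re-add case explained in the reply above can be illustrated with the following Python sketch. It is a simplified model, not the Java code in CatalogServiceCatalog: the Partition class and collect_for_delete_log are hypothetical, and only the name-based skip (the 'collectedPartNames' check) mirrors the reviewed line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    part_id: int
    partition_name: str


def collect_for_delete_log(partition_map, dropped_partitions):
    """Collect partitions for deletion updates, at most one per partition name."""
    collected = list(partition_map.values())
    collected_part_names = {p.partition_name for p in collected}
    for part in dropped_partitions:
        # A dropped partition that was re-added has a newer instance in the
        # partition map under the same name, so the old instance is skipped.
        if part.partition_name in collected_part_names:
            continue
        collected.append(part)
        collected_part_names.add(part.partition_name)
    return collected


old = Partition(1, "p=1")    # dropped instance
new = Partition(2, "p=1")    # re-added instance with the same name
other = Partition(3, "p=2")  # dropped and never re-added
print(collect_for_delete_log({new.part_id: new}, [old, other]))
```

Because the topic key depends on the partition name rather than the instance, one entry per name is enough to clear the topic.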
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/21326/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21326/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1075 PS2, Line 1075: if (addedPartNames.contains(part.partition_name)) continue; What does this case mean? The partition was dropped, but was readded later? -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 10:17:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21320 ) Change subject: IMPALA-13000: Document OPTIMIZE TABLE .. Patch Set 2: Verified+1 Build Successful https://jenkins.impala.io/job/gerrit-docs-auto-test/761/ : Doc tests passed. -- To view, visit http://gerrit.cloudera.org:8080/21320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Gerrit-Change-Number: 21320 Gerrit-PatchSet: 2 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 18 Apr 2024 09:53:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15938/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 09:52:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py File tests/custom_cluster/test_partition.py: http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py@93 PS1, Line 93: T > flake8: F821 undefined name 'TestPartitionMetadata' Done http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py@98 PS1, Line 98: > flake8: W504 line break after binary operator Done -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 09:48:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE
Noemi Pap-Takacs has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/21320 ) Change subject: IMPALA-13000: Document OPTIMIZE TABLE .. IMPALA-13000: Document OPTIMIZE TABLE Document OPTIMIZE TABLE syntax and behaviour. Testing: - built docs locally Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 --- M docs/topics/impala_iceberg.xml 1 file changed, 44 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/21320/2 -- To view, visit http://gerrit.cloudera.org:8080/21320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Gerrit-Change-Number: 21320 Gerrit-PatchSet: 2 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21320 ) Change subject: IMPALA-13000: Document OPTIMIZE TABLE .. Patch Set 2: Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/761/ Testing docs change - this change appears to modify docs/ and no code. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317 -- To view, visit http://gerrit.cloudera.org:8080/21320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Gerrit-Change-Number: 21320 Gerrit-PatchSet: 2 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 18 Apr 2024 09:48:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Hello Fang-Yu Rao, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21326 to look at the new patch set (#2).

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
..

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

*Background*

Since IMPALA-3127, catalogd sends incremental partition updates based on the last sent table snapshot ('maxSentPartitionId_' to be specific). Partitions dropped since the last catalog update are tracked in 'droppedPartitions_' of HdfsTable. When catalogd collects the next catalog update, it collects these dropped partitions, and HdfsTable then clears the set.

If an HdfsTable is invalidated, it's replaced with an IncompleteTable which doesn't track any partitions. The HdfsTable object is then added to the deleteLog so catalogd can send deletion updates for all its partitions. The same happens if the HdfsTable is dropped. However, the previously dropped partitions are not collected in this case, which results in a leak in the catalog topic if the partition name is never reused. Note that in the catalog topic, the key of a partition update consists of the table name and the partition name. So if the partition is added back to the table, the topic key is reused, which resolves the leak.

The leak will be observed when a coordinator restarts. In the initial catalog update sent from statestore, the coordinator will find some partition updates that are not referenced by the HdfsTable (assuming the table is used again after the INVALIDATE). Then a Precondition check fails and the table is not added to the coordinator.

*Overview of the patch*

This patch fixes the leak by also collecting the dropped partitions when adding the HdfsTable to the deleteLog. A new field, dropped_partitions, is added in THdfsTable to collect them. It's only used when catalogd collects catalog updates.

The patch also removes the Precondition check in the coordinator and just reports the stale partitions, since IMPALA-12831 could also introduce them.

It also adds a log line in CatalogOpExecutor.alterTableDropPartition() to show the dropped partition names for better diagnostics.

Tests
 - Added e2e tests

Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_partition.py
M tests/metadata/test_recover_partitions.py
8 files changed, 148 insertions(+), 19 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21326/2
--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Fang-Yu Rao
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
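The leak described in the Background section of the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not Impala code: the classes ToyHdfsTable and ToyCatalog, the "table:partition" key format, and the collect_dropped_on_delete switch are all hypothetical names invented for illustration. The model only captures the idea that a partition dropped after the last catalog update, but before an INVALIDATE, keeps its key in the topic unless the dropped set is collected together with the table.

```python
class ToyHdfsTable:
    """Hypothetical stand-in for a table that tracks dropped partitions."""

    def __init__(self, name, partitions):
        self.name = name
        self.partitions = set(partitions)
        # Plays the role that droppedPartitions_ has in the commit message.
        self.dropped_partitions = set()

    def drop_partition(self, part):
        self.partitions.discard(part)
        self.dropped_partitions.add(part)


class ToyCatalog:
    """Hypothetical catalog whose topic is a plain dict keyed by table + partition."""

    def __init__(self, collect_dropped_on_delete):
        self.topic = {}
        # False models the behaviour before the patch, True the behaviour after.
        self.collect_dropped_on_delete = collect_dropped_on_delete

    @staticmethod
    def key(table, part):
        # Assumed key format; the real topic key layout may differ.
        return "%s:%s" % (table.name, part)

    def send_update(self, table):
        # Regular incremental update: publish live partitions, delete dropped ones.
        for part in table.partitions:
            self.topic[self.key(table, part)] = "partition"
        for part in table.dropped_partitions:
            self.topic.pop(self.key(table, part), None)
        table.dropped_partitions.clear()

    def invalidate(self, table):
        # The table goes to the delete log; deletions are sent for its partitions.
        to_delete = set(table.partitions)
        if self.collect_dropped_on_delete:
            to_delete |= table.dropped_partitions
        for part in to_delete:
            self.topic.pop(self.key(table, part), None)


def leaked_keys(collect_dropped_on_delete):
    catalog = ToyCatalog(collect_dropped_on_delete)
    table = ToyHdfsTable("db.tbl", ["p=1", "p=2"])
    catalog.send_update(table)    # both partitions are now in the topic
    table.drop_partition("p=1")   # dropped, but no update has been sent yet
    catalog.invalidate(table)     # INVALIDATE arrives before the next update
    return sorted(catalog.topic)
```

In this model, leaked_keys(False) leaves the key for "p=1" behind in the topic, while leaked_keys(True) leaves the topic empty, which is the effect the commit message attributes to collecting the dropped partitions when the table is added to the deleteLog.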
[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21294 ) Change subject: IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10554/ -- To view, visit http://gerrit.cloudera.org:8080/21294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 Gerrit-Change-Number: 21294 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 09:32:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15937/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 09:25:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21326 )

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
..

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

*Background*

Since IMPALA-3127, catalogd sends incremental partition updates based on the last sent table snapshot ('maxSentPartitionId_' to be specific). Partitions dropped since the last catalog update are tracked in 'droppedPartitions_' of HdfsTable. When catalogd collects the next catalog update, it collects these dropped partitions, and HdfsTable then clears the set.

If an HdfsTable is invalidated, it's replaced with an IncompleteTable which doesn't track any partitions. The HdfsTable object is then added to the deleteLog so catalogd can send deletion updates for all its partitions. The same happens if the HdfsTable is dropped. However, the previously dropped partitions are not collected in this case, which results in a leak in the catalog topic if the partition name is never reused. Note that in the catalog topic, the key of a partition update consists of the table name and the partition name. So if the partition is added back to the table, the topic key is reused, which resolves the leak.

The leak will be observed when a coordinator restarts. In the initial catalog update sent from statestore, the coordinator will find some partition updates that are not referenced by the HdfsTable (assuming the table is used again after the INVALIDATE). Then a Precondition check fails and the table is not added to the coordinator.

*Overview of the patch*

This patch fixes the leak by also collecting the dropped partitions when adding the HdfsTable to the deleteLog. A new field, dropped_partitions, is added in THdfsTable to collect them. It's only used when catalogd collects catalog updates.

The patch also removes the Precondition check in the coordinator and just reports the stale partitions, since IMPALA-12831 could also introduce them.

It also adds a log line in CatalogOpExecutor.alterTableDropPartition() to show the dropped partition names for better diagnostics.

Tests
 - Added e2e tests

Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_partition.py
M tests/metadata/test_recover_partitions.py
8 files changed, 148 insertions(+), 19 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21326/1
--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py File tests/custom_cluster/test_partition.py: http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py@93 PS1, Line 93: T flake8: F821 undefined name 'TestPartitionMetadata' http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py@98 PS1, Line 98: a flake8: W504 line break after binary operator -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 18 Apr 2024 09:01:29 + Gerrit-HasComments: Yes
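For readers unfamiliar with the two flake8 codes reported above, the following is a hypothetical illustration only. It is not the code from tests/custom_cluster/test_partition.py, and the names TestPartitionExample and is_stale are invented. F821 is raised when a name is used that is not defined in any enclosing scope; W504 is raised when a line break follows a binary operator. The sketch assumes the project's flake8 configuration enables W504 and not W503, so breaking before the operator is the accepted layout.

```python
class TestPartitionExample(object):
    """Hypothetical test class, not the one from test_partition.py."""

    @classmethod
    def get_workload(cls):
        # Referring to the class through `cls` avoids F821, which flake8 raises
        # when a name (for example a renamed class) is not defined in scope.
        return cls.__name__


def is_stale(partition_name, known_partitions, dropped_partitions):
    # W504-clean layout: the line break comes before the binary operator,
    # not after it.
    return (partition_name not in known_partitions
            and partition_name not in dropped_partitions)
```

Writing the condition with `and` at the end of the first line instead would be the layout that W504 flags.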
[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple queries
..

Patch Set 23:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java:

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java@50
PS21, Line 50: // TODO: IMPALA-13011: Awkward call for authorization here. Authorization
: // will be done at validation time, but this is needed here for
> Yeah, authorization will happen earlier. It's not implemented yet. This p
Can you mention in the commit message that authorization is missing at this point?

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
File java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java:

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@35
PS20, Line 35: ImpalaTypeSystemImpl
> Sigh, you caught me on something I haven't researched that much...
Yeah, it is perfectly fine to just add a class comment and mention that this may change in the future. It doesn't seem useful to put more effort into it while expressions/more complex queries are not supported. If there is some Hive code that acted as the inspiration, then a link to it would be nice.

http://gerrit.cloudera.org:8080/#/c/21109/23/testdata/workloads/functional-query/queries/QueryTest/calcite.test
File testdata/workloads/functional-query/queries/QueryTest/calcite.test:

http://gerrit.cloudera.org:8080/#/c/21109/23/testdata/workloads/functional-query/queries/QueryTest/calcite.test@113
PS23, Line 113: xedzt
Hmm, why are these different from https://github.com/apache/impala/blob/541fc5ee9ec2d804f2ba45feb2df5bb96a013f86/testdata/workloads/functional-query/queries/QueryTest/binary-type.test#L12 ? I quickly tested it and it doesn't seem to pass with this escaped string. Note that I wouldn't mind using only the ASCII lines in the test - the goal is to test the planner, not the executor + client.

--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 23
Gerrit-Owner: Steve Carlin
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Michael Smith
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Steve Carlin
Gerrit-Comment-Date: Thu, 18 Apr 2024 06:55:14 +
Gerrit-HasComments: Yes