[Impala-ASF-CR] IMPALA-13016: Fix ambiguous row regex that check for no-existence

2024-04-18 Thread Riza Suminto (Code Review)
Hello Gabor Kaszab, Zoltan Borok-Nagy, Wenzhe Zhou, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21333

to look at the new patch set (#2).

Change subject: IMPALA-13016: Fix ambiguous row_regex that check for 
no-existence
..

IMPALA-13016: Fix ambiguous row_regex that check for no-existence

There are a few row_regex patterns used in EE test files that are
ambiguous about whether a pattern must be absent from all parts of the
results/runtime profile or merely absent from at least one row.
These were caught by grepping for the following pattern:

$ git grep -n "row_regex: (?\!"

This patch replaces them with either !row_regex or a VERIFY_IS_NOT_IN
comment.
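
To illustrate the ambiguity, here is a minimal Python sketch. It is not
the Impala test framework's matching code: the sample rows and the helper
names some_row_lacks and no_row_contains are hypothetical, and it assumes
only that a row_regex check is satisfied when at least one row matches.

```python
import re

# Hypothetical profile lines; only the middle one mentions the pattern.
rows = [
    "Query Options: MT_DOP=0",
    "NumModifiedRows: 3",
    "Query State: FINISHED",
]


def some_row_lacks(rows, needle):
    """Negative lookahead applied per row: true if at least one row lacks needle."""
    lookahead = re.compile(r"(?!.*" + re.escape(needle) + r")")
    return any(lookahead.match(row) for row in rows)


def no_row_contains(rows, needle):
    """Strict non-existence: true only if needle appears in no row at all."""
    return not any(re.search(re.escape(needle), row) for row in rows)


print(some_row_lacks(rows, "NumModifiedRows"))   # True
print(no_row_contains(rows, "NumModifiedRows"))  # False
```

The lookahead form passes even though the pattern is present, which is
why an explicit "must not appear anywhere" check is the safer way to
express non-existence.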

Testing:
- Run and pass modified tests.

Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da
---
M 
testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test
M testdata/workloads/functional-query/queries/QueryTest/acid-truncate.test
M testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-directed-mode.test
4 files changed, 7 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/21333/2
--
To view, visit http://gerrit.cloudera.org:8080/21333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da
Gerrit-Change-Number: 21333
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13016: Fix ambiguous row regex that check for no-existence

2024-04-18 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21333 )

Change subject: IMPALA-13016: Fix ambiguous row_regex that check for 
no-existence
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21333

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da
Gerrit-Change-Number: 21333
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 19 Apr 2024 00:23:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21186 )

Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted 
types
..


Patch Set 17:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15957/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21186

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9
Gerrit-Change-Number: 21186
Gerrit-PatchSet: 17
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 19 Apr 2024 00:06:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types

2024-04-18 Thread Quanlong Huang (Code Review)
Hello k.venureddy2...@gmail.com, Sai Hemanth Gantasala, Csaba Ringhofer, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21186

to look at the new patch set (#17).

Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted 
types
..

IMPALA-12933: Avoid fetching unneccessary events of unwanted types

There are several places where catalogd will fetch all events of a
specific type on a table. E.g. in TableLoader#load(), if the table has
an old createEventId, catalogd will fetch all CREATE_TABLE events after
that createEventId on the table.

Fetching the list of events is expensive since the filtering is done on
the client side, i.e. catalogd fetches all events and filters them locally
based on the event type and table name. This could take hours if there
are lots of events (e.g. 1M) in HMS.

This patch sets the eventTypeSkipList to the complement set of the
wanted types, so that the get_next_notification RPC can filter out some
events on the HMS side. To avoid adding too much computation overhead to
HMS's underlying RDBMS when evaluating predicates of the form
EVENT_TYPE != 'xxx', rare event types (e.g. DROP_ISCHEMA) are not added
to the list. A new flag, common_hms_event_types, is added to specify the
common HMS event types.

Once HIVE-28146 is resolved, we can set the wanted types directly in the
HMS RPC and this approach can be simplified.

UPDATE_TBL_COL_STAT_EVENT, UPDATE_PART_COL_STAT_EVENT are the most
common unused events for Impala. They are also added to the default skip
list. A new flag, default_skipped_hms_event_types, is added to configure
this list.
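
The skip-list construction described above can be sketched roughly as
follows. This is an illustration in Python rather than the patch's Java
code. The function name build_event_type_skip_list and the list contents
are assumptions standing in for the values of the common_hms_event_types
and default_skipped_hms_event_types flags.

```python
# Illustrative stand-ins for the two flags; the real defaults are in the patch.
COMMON_HMS_EVENT_TYPES = [
    "CREATE_TABLE", "ALTER_TABLE", "DROP_TABLE",
    "ADD_PARTITION", "ALTER_PARTITION", "DROP_PARTITION",
    "INSERT", "RELOAD",
]
DEFAULT_SKIPPED_HMS_EVENT_TYPES = [
    "UPDATE_TBL_COL_STAT_EVENT", "UPDATE_PART_COL_STAT_EVENT",
]


def build_event_type_skip_list(wanted_types,
                               common_types=COMMON_HMS_EVENT_TYPES,
                               default_skipped=DEFAULT_SKIPPED_HMS_EVENT_TYPES):
    """Return the types to skip: default-skipped plus common types, minus wanted ones.

    Rare types are never listed, so the HMS-side predicate stays short.
    """
    wanted = set(wanted_types)
    skip_list = []
    for event_type in list(default_skipped) + list(common_types):
        if event_type not in wanted and event_type not in skip_list:
            skip_list.append(event_type)
    return skip_list


print(build_event_type_skip_list(["CREATE_TABLE"]))
```

Rare types that are not in either list still reach the client, so the
local filtering by event type and table name remains necessary.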

This patch also fixes an issue where events of the non-default catalog
are not filtered out.

In a local perf test, I generated 100K RELOAD events after creating a
table in Hive. I then used the table in Impala to trigger metadata loading
on it, which fetches the latest CREATE_TABLE event by polling all
events after the last known CREATE_TABLE event. Before this patch,
fetching the events took 1s779ms. Now it takes only 395.377ms. Note
that in a prod env the event messages are usually larger, so the speedup
could be larger.
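
For reference, the two timings quoted above work out as follows. This is
plain arithmetic on the reported numbers; the variable names are only
for illustration.

```python
before_ms = 1 * 1000 + 779   # "1s779ms" reported before the patch
after_ms = 395.377           # "395.377ms" reported after the patch

speedup = before_ms / after_ms
time_saved_pct = 100 * (1 - after_ms / before_ms)

# Roughly a 4.5x speedup, i.e. about 78% less time spent fetching events.
print(f"{speedup:.1f}x faster, {time_saved_pct:.0f}% less time")
```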

Tests:
 - Added an FE test
 - Ran CORE tests

Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
15 files changed, 320 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/21186/17
--
To view, visit http://gerrit.cloudera.org:8080/21186

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9
Gerrit-Change-Number: 21186
Gerrit-PatchSet: 17
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21326 )

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for 
some dropped partitions
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10557/


--
To view, visit http://gerrit.cloudera.org:8080/21326

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 18 Apr 2024 23:27:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21304 )

Change subject: IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15955/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21304

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a
Gerrit-Change-Number: 21304
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 18 Apr 2024 22:54:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21277 )

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15956/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21277

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923
Gerrit-Change-Number: 21277
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 18 Apr 2024 22:55:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21279 )

Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and 
NonGroupingAggregator
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15954/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21279

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca
Gerrit-Change-Number: 21279
Gerrit-PatchSet: 20
Gerrit-Owner: David Rorke 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 22:44:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21277 )

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15953/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21277

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923
Gerrit-Change-Number: 21277
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 18 Apr 2024 22:44:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-18 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Abhishek Rawat, David Rorke, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21277

to look at the new patch set (#13).

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..

IMPALA-12988: Calculate an unbounded version of CpuAsk

Planner calculates CpuAsk through a recursive call beginning at
Planner.computeBlockingAwareCores(), which is called after
Planner.computeEffectiveParallelism(). It does blocking operator
analysis over the selected degree of parallelism that was decided during
computeEffectiveParallelism() traversal. That selected degree of
parallelism, however, is already bounded by min and max parallelism
config, derived from the PROCESSING_COST_MIN_THREADS and
MAX_FRAGMENT_INSTANCES_PER_NODE options respectively.

This patch calculates an unbounded version of CpuAsk that is not bounded
by min and max parallelism config. It is purely based on the fragment's
ProcessingCost and query plan relationship constraint (for example, the
number of JOIN BUILDER fragments should equal the number of destination
JOIN fragments for partitioned join).

Frontend will receive both bounded and unbounded CpuAsk values from
TQueryExecRequest on each executor group set selection round. The
unbounded CpuAsk is then scaled down once using an nth-root-based
sublinear function, controlled by the total CPU count of the smallest
executor group set and the bounded CpuAsk number. Another linear scaling
is then applied to both bounded and unbounded CpuAsk using the
QUERY_CPU_COUNT_DIVISOR option. Frontend then compares the unbounded
CpuAsk after scaling against CpuMax to avoid assigning a query to a
small executor group set too soon. The last executor group set stays as
the "catch-all" executor group set.
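
The selection flow in the paragraph above can be sketched in Python as
follows. The exact scaling formula is not given in this message, so the
nth-root function below is a hypothetical stand-in, and every name in
the sketch (scale_unbounded_cpu_ask, pick_executor_group_set, the
group-set tuples) is an assumption rather than the code in
Frontend.java.

```python
def scale_unbounded_cpu_ask(unbounded, bounded, smallest_group_cpus, n=2.0):
    """Hypothetical sublinear scaling: growth past an anchor follows an nth root.

    The anchor is assumed to come from the bounded CpuAsk and the CPU count
    of the smallest executor group set; the real formula may differ.
    """
    anchor = max(bounded, smallest_group_cpus)
    if unbounded <= anchor:
        return float(unbounded)
    return anchor * (unbounded / anchor) ** (1.0 / n)


def pick_executor_group_set(group_sets, unbounded_scaled, divisor=1.0):
    """Pick the first group set whose CpuMax covers the scaled unbounded CpuAsk.

    group_sets is a list of (name, cpu_max) sorted from smallest to largest.
    divisor plays the role of QUERY_CPU_COUNT_DIVISOR (linear scaling).
    The last group set is the catch-all and is returned if nothing else fits.
    """
    ask = unbounded_scaled / divisor
    for name, cpu_max in group_sets[:-1]:
        if ask <= cpu_max:
            return name
    return group_sets[-1][0]


group_sets = [("small", 16), ("medium", 64), ("large", 128)]
scaled = scale_unbounded_cpu_ask(unbounded=400, bounded=16, smallest_group_cpus=16)
print(scaled, pick_executor_group_set(group_sets, scaled))
```

Because the scaling is sublinear, a very large unbounded CpuAsk still
steers the query away from the smallest group set without growing in
proportion to the raw value.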

After this patch, the "max-parallelism" fields in the query plan will
all be set to the maximum parallelism based on ProcessingCost. The CpuAsk
counter is changed to show the unbounded CpuAsk after scaling. A new
counter CpuAskBounded shows the bounded CpuAsk after scaling. If
QUERY_CPU_COUNT_DIVISOR=1 and the PLANNER_CPU_ASK slot counting strategy
is selected, this CpuAskBounded is also the minimum total admission slots
given to the query. The EffectiveParallelism counter remains unchanged,
showing bounded CpuAsk before scaling.

Testing:
- Update and pass FE test TpcdsCpuCostPlannerTest and
  PlannerTest#testProcessingCost.
- Pass EE test tests/query_test/test_tpcds_queries.py
- Pass custom cluster test tests/custom_cluster/test_executor_groups.py

Change-Id: I5441e31088f90761062af35862be4ce09d116923
---
M be/src/scheduling/scheduler.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q28.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q29.test
M 

[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator

2024-04-18 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21279 )

Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and 
NonGroupingAggregator
..


Patch Set 21: Code-Review+2

All changes after patch set 19 are rebase adjustments.


--
To view, visit http://gerrit.cloudera.org:8080/21279

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca
Gerrit-Change-Number: 21279
Gerrit-PatchSet: 21
Gerrit-Owner: David Rorke 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 22:34:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables

2024-04-18 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/21304 )

Change subject: IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables
..

IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables

This patch adds a script to create external JDBC tables for the TPCH
and TPCDS datasets, and adds unit tests to run TPCH and TPCDS queries
against external JDBC tables with Impala-Impala federation.
It fixes the race condition in the caching of SQL DataSource objects by
using a new DataSourceObjectCache class, which checks the reference count
before closing a SQL DataSource.
java.sql.Connection.close() does not effectively remove a closed
connection from the connection pool, which causes JDBC handler threads to
wait a long time for available connections from the pool. The
workaround is to call the BasicDataSource.invalidateConnection() API to
close a connection.
Two flag variables are added for the DBCP configuration properties
'maxTotal' and 'maxWaitMillis'. Note that the 'maxActive' and 'maxWait'
properties were renamed to 'maxTotal' and 'maxWaitMillis' respectively
in apache.commons.dbcp v2.
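
The reference-counting idea behind the cache fix can be sketched as
below. This is a generic Python illustration, not the
DataSourceObjectCache class from the patch. The class name
RefCountedCache and its acquire/release methods are hypothetical.

```python
import threading


class RefCountedCache:
    """Shares one object per key and closes it only when the last user releases it."""

    def __init__(self, factory, closer):
        self._factory = factory      # builds the shared object for a key
        self._closer = closer        # closes the shared object
        self._lock = threading.Lock()
        self._entries = {}           # key -> [object, reference count]

    def acquire(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = [self._factory(key), 0]
                self._entries[key] = entry
            entry[1] += 1
            return entry[0]

    def release(self, key):
        """Drop one reference; return True if the object was actually closed."""
        with self._lock:
            entry = self._entries[key]
            entry[1] -= 1
            if entry[1] > 0:
                return False         # still in use elsewhere, keep it open
            del self._entries[key]
            self._closer(entry[0])
            return True
```

Doing the count check and the close under the same lock means one
thread cannot close an object that another thread has just acquired.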

testdata/bin/create-tpc-jdbc-tables.py supports creating JDBC tables
for Impala-Impala, Postgres and MySQL.
The following sample commands create TPCDS JDBC tables for Impala-Impala
federation with the remote coordinator running at 10.19.10.86, and for a
Postgres server running at 10.19.10.86:
  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
    --jdbc_db_name=tpcds_jdbc --workload=tpcds \
    --database_type=IMPALA --database_host=10.19.10.86 --clean

  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
    --jdbc_db_name=tpcds_jdbc --workload=tpcds \
    --database_type=POSTGRES --database_host=10.19.10.86 \
    --database_name=tpcds --clean

Remaining Issues:
 - tpcds-decimal_v2-q80a failed with returned rows not matching expected
   results for some decimal values. This will be fixed in a following
   patch.

Testing:
 - Passed core-test.

Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a
---
M be/src/service/frontend.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
M 
fe/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A 
fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DataSourceObjectCache.java
M 
fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
M 
fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
M 
fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M testdata/bin/create-load-data.sh
A testdata/bin/create-tpc-jdbc-tables.py
A testdata/datasets/tpcds/tpcds_jdbc_schema_template.sql
A testdata/datasets/tpch/tpch_jdbc_schema_template.sql
M tests/query_test/test_tpcds_queries.py
M tests/query_test/test_tpch_queries.py
16 files changed, 1,788 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/21304/4
--
To view, visit http://gerrit.cloudera.org:8080/21304

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a
Gerrit-Change-Number: 21304
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-18 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21277 )

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@26
PS11, Line 26: nth root based
> Update to "nth root based" or something similar to be more accurate?
Done


http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@35
PS11, Line 35: field
> Yes, fragment level parallelism. Will change to 'fields'.
Done


http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@36
PS11, Line 36: st. The CpuAsk
 : counter is changed to shows the unbounded CpuAsk after scaling. A new
 : counter CpuAskBounded shows the bounded CpuAsk after scaling. If
 : QUERY_CPU_COUNT_DIVISOR=1 and PLANNER_CPU_ASK slot counting strategy is
 : selected, this
> Should pick the unbounded CpuAsk after scaling. Will fix the code and commi
Done


http://gerrit.cloudera.org:8080/#/c/21277/11/fe/src/main/java/org/apache/impala/planner/CostingSegment.java
File fe/src/main/java/org/apache/impala/planner/CostingSegment.java:

http://gerrit.cloudera.org:8080/#/c/21277/11/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@85
PS11, Line 85: } else {
> I can leave it unassigned for this branch.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21277

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923
Gerrit-Change-Number: 21277
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 18 Apr 2024 22:33:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-18 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Abhishek Rawat, David Rorke, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21277

to look at the new patch set (#12).

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..

IMPALA-12988: Calculate an unbounded version of CpuAsk

Planner calculates CpuAsk through a recursive call beginning at
Planner.computeBlockingAwareCores(), which is called after
Planner.computeEffectiveParallelism(). It does blocking operator
analysis over the selected degree of parallelism that was decided during
computeEffectiveParallelism() traversal. That selected degree of
parallelism, however, is already bounded by min and max parallelism
config, derived from PROCESSING_COST_MIN_THREADS and
MAX_FRAGMENT_INSTANCES_PER_NODE options accordingly.

This patch calculates an unbounded version of CpuAsk that is not bounded
by min and max parallelism config. It is purely based on the fragment's
ProcessingCost and query plan relationship constraint (for example, the
number of JOIN BUILDER fragments should equal the number of destination
JOIN fragments for partitioned join).

Frontend will receive both bounded and unbounded CpuAsk values from
TQueryExecRequest on each executor group set selection round. The
unbounded CpuAsk is then scaled down once using a square-root-based
sublinear function, controlled by the total CPU count of the smallest
executor group set and the bounded CpuAsk number. Another linear scaling
is then applied on both bounded and unbounded CpuAsk using
QUERY_CPU_COUNT_DIVISOR option. Frontend then picks the maximum between
bounded CpuAsk and unbounded CpuAsk numbers to avoid assigning a query
to a small executor group set too soon. The last executor group set
stays as the "catch-all" executor group set.

After this patch, the "max-parallelism" field in the query plan will all
be set with maximum parallelism based on ProcessingCost. The CpuAsk
counter is changed to show the unbounded CpuAsk after scaling. A new
counter CpuAskBounded shows the bounded CpuAsk after scaling. The
EffectiveParallelism counter remains unchanged, showing bounded CpuAsk
before scaling.

Testing:
- Update and pass FE test TpcdsCpuCostPlannerTest and
  PlannerTest#testProcessingCost.
- Pass EE test tests/query_test/test_tpcds_queries.py
- Pass custom cluster test tests/custom_cluster/test_executor_groups.py

Change-Id: I5441e31088f90761062af35862be4ce09d116923
---
M be/src/scheduling/scheduler.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q28.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q29.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q32.test
M 

[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator

2024-04-18 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded a new patch set (#20) to the change originally 
created by David Rorke. ( http://gerrit.cloudera.org:8080/21279 )

Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and 
NonGroupingAggregator
..

IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator

This patch improves the accuracy of the CPU ProcessingCost estimates for
several of the CPU intensive operators by basing the costs on benchmark
data. The general approach for a given operator was to run a set of queries
that exercised the operator under various conditions (e.g. large vs small
row sizes and row counts, varying NDV, different file formats, etc.) and
capture the CPU time spent per unit of work (the unit of work might be
measured as some number of rows, some number of bytes, some number of
predicates evaluated, or some combination of these). The data was then
analyzed in an attempt to fit a simple model that would allow us to
predict CPU consumption of a given operator based on information available
at planning time.

For example, the CPU ProcessingCost for a Parquet scan is estimated as:
TotalCost = (0.0144 * BytesMaterialized) + (0.0281 * Rows * Predicate Count)

The coefficients (0.0144 and 0.0281) are derived from benchmarking
scans under a variety of conditions. Similar cost functions and coefficients
were derived for all of the benchmarked operators. The coefficients for all
the operators are normalized such that a single unit of cost equates to
roughly 100 nanoseconds of CPU time on an r5d.4xlarge instance. So we would
predict that an operator with a cost of 10,000,000 would complete in roughly
one second on a single core (10,000,000 units * 100 ns per unit = 1 s).
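
As a rough illustration of the Parquet scan formula and the 100 ns
normalization, the sketch below evaluates the cost for a made-up scan. The
function names and the input values are hypothetical and are not taken from
the patch; only the two coefficients and the 100 ns figure come from the
commit message.

```python
# Coefficients quoted in the commit message for Parquet scans.
BYTES_COEFF = 0.0144
PREDICATE_COEFF = 0.0281
# One cost unit is described as roughly 100 ns of CPU time.
NANOS_PER_COST_UNIT = 100


def parquet_scan_cost(bytes_materialized, rows, predicate_count):
    """Evaluate 0.0144 * BytesMaterialized + 0.0281 * Rows * PredicateCount."""
    return (BYTES_COEFF * bytes_materialized
            + PREDICATE_COEFF * rows * predicate_count)


def cost_to_seconds(cost):
    """Convert cost units to approximate single-core CPU seconds."""
    return cost * NANOS_PER_COST_UNIT / 1e9


# Hypothetical scan: 1 GB materialized, 10 million rows, 2 predicates.
cost = parquet_scan_cost(1_000_000_000, 10_000_000, 2)
print(f"cost={cost:,.0f} units, ~{cost_to_seconds(cost):.2f} s on one core")
```

The estimate covers CPU time only, so it says nothing about IO or other
wait time.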

Limitations:
* Costing only addresses CPU time spent and doesn't account for any IO
  or other wait time.
* Benchmarking scenarios didn't provide comprehensive coverage of the
  full range of data types, distributions, etc. More thorough
  benchmarking could improve the costing estimates further.
* This initial patch only covers a subset of the operators, focusing
  on those that are most common and most CPU intensive. Specifically
  the following operators are covered by this patch. All others
  continue to use the previous ProcessingCost code:
  AggregationNode
  DataStreamSink (exchange sender)
  ExchangeNode
  HashJoinNode
  HdfsScanNode
  HdfsTableSink
  NestedLoopJoinNode
  SortNode
  UnionNode

Benchmark-based costing of the remaining operators will be covered by
a future patch.

Future patches will automate the collection and analysis of the benchmark
data and the computation of the cost coefficients to simplify maintenance
of the costing as performance changes over time.

Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca
---
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 

[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21277 )

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21277/12/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/21277/12/fe/src/main/java/org/apache/impala/service/Frontend.java@2394
PS12, Line 2394: verdict + " (require=" + scaledCpuAskUnbounded + ", max=" + availableCores + ")");
line too long (94 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923
Gerrit-Change-Number: 21277
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 18 Apr 2024 22:22:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21257 )

Change subject: IMPALA-12980: Translate CpuAsk into admission control slots
..

IMPALA-12980: Translate CpuAsk into admission control slots

Impala has a concept of "admission control slots" - the amount of
parallelism that should be allowed on an Impala daemon. This defaults to
the number of processors per executor and can be overridden with the
--admission_control_slots flag.

Admission control slot accounting is described in IMPALA-8998. It
computes 'slots_to_use' for each backend based on the maximum number of
instances of any fragment on that backend. This can lead to slot
underestimation and query overadmission. For example, assume an executor
node with 48 CPU cores, configured with --admission_control_slots=48.
It is assigned 4 non-blocking query fragments, each with 12 instances
scheduled on this executor. The IMPALA-8998 algorithm will request the
largest instance count (12) as slots rather than the sum of all
non-blocking fragment instances (48). With the remaining 36 slots free,
the executor can still admit fragments from a different query, which will
potentially contend for CPU with the query that is already running.
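
The arithmetic in this example can be sketched as follows. This is a
simplified illustration, not the scheduler code: the helper names are
hypothetical, and it assumes all four fragments are non-blocking and run
concurrently on one executor.

```python
def slots_largest_fragment(instances_per_fragment):
    """Slots requested when only the largest fragment's instance count is used."""
    return max(instances_per_fragment, default=0)


def slots_sum_of_fragments(instances_per_fragment):
    """Slots needed if all listed non-blocking fragments run at the same time."""
    return sum(instances_per_fragment)


# Values from the example: 48 configured slots, 4 fragments of 12 instances.
configured_slots = 48
instances_per_fragment = [12, 12, 12, 12]

requested = slots_largest_fragment(instances_per_fragment)
concurrent = slots_sum_of_fragments(instances_per_fragment)
still_free = configured_slots - requested

print(f"requested={requested} concurrent={concurrent} still_free={still_free}")
```

The gap between the requested slots and the concurrently running instances
is what allows the executor to be overadmitted.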

When COMPUTE_PROCESSING_COST is enabled, the Planner generates a CpuAsk
number that represents the CPU requirement of the query over a
particular executor group set. This number is an estimate of the
largest number of query fragment instances that can run in parallel
without waiting, as determined by the blocking operator analysis.
Therefore, the fragment trace that sums to that CpuAsk number can be
translated into 'slots_to_use' as well, which more closely resembles the
maximum parallel execution of fragment instances.

This patch adds a new query option called SLOT_COUNT_STRATEGY to control
which admission control slot accounting to use. There are two possible
values:
- LARGEST_FRAGMENT, which is the original algorithm from IMPALA-8998.
  This is still the default value for the SLOT_COUNT_STRATEGY option.
- PLANNER_CPU_ASK, which follows the fragment trace that contributes
  towards the CpuAsk number. This strategy schedules at least as many
  admission control slots as the LARGEST_FRAGMENT strategy.

To implement the PLANNER_CPU_ASK strategy, the Planner marks fragments
that contribute to CpuAsk as dominant fragments. It also passes the
max_slot_per_executor information that it knows about the executor group
set to the scheduler.

An AvgAdmissionSlotsPerExecutor counter is added to describe what the
Planner thinks the average 'slots_to_use' per backend will be, which
follows this formula:

  AvgAdmissionSlotsPerExecutor = ceil(CpuAsk / num_executors)

The actual 'slots_to_use' on each backend may differ from
AvgAdmissionSlotsPerExecutor, depending on what is scheduled on that
backend. 'slots_to_use' is shown as the 'AdmissionSlots' counter under
each executor profile node.
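
A minimal sketch of the AvgAdmissionSlotsPerExecutor formula, using integer
arithmetic for the ceiling. The function name and the sample inputs are
hypothetical; only the formula itself comes from the commit message.

```python
def avg_admission_slots_per_executor(cpu_ask, num_executors):
    """Return ceil(cpu_ask / num_executors) without floating point."""
    if num_executors <= 0:
        raise ValueError("num_executors must be positive")
    return (cpu_ask + num_executors - 1) // num_executors


# Hypothetical inputs: a CpuAsk of 100 spread over 3 executors.
print(avg_admission_slots_per_executor(100, 3))
```

Because the value is an average rounded up, an individual backend can be
assigned more or fewer slots than this number.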

Testing:
- Update test_executor_groups.py with AvgAdmissionSlotsPerExecutor assertion.
- Pass test_tpcds_queries.py::TestTpcdsQueryWithProcessingCost.
- Add EE test test_processing_cost.py.
- Add FE test PlannerTest#testProcessingCostPlanAdmissionSlots.

Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Reviewed-on: http://gerrit.cloudera.org:8080/21257
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/CoreCount.java
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
A testdata/workloads/functional-query/queries/QueryTest/processing-cost-admission-slots.test
M tests/custom_cluster/test_executor_groups.py
A tests/query_test/test_processing_cost.py
M tests/query_test/test_tpcds_queries.py
21 files changed, 1,505 insertions(+), 111 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/21257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Gerrit-Change-Number: 21257
Gerrit-PatchSet: 18
Gerrit-Owner: Riza 

[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21257 )

Change subject: IMPALA-12980: Translate CpuAsk into admission control slots
..


Patch Set 17: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Gerrit-Change-Number: 21257
Gerrit-PatchSet: 17
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 21:58:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21302 )

Change subject: IMPALA-13005: Create Query Live table in HMS
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15952/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21302
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4
Gerrit-Change-Number: 21302
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 21:46:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12938: add-opens for platform.cgroupv1

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21334 )

Change subject: IMPALA-12938: add-opens for platform.cgroupv1
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15951/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I312ae987c17c6f06e1ffe15e943b1865feef6b82
Gerrit-Change-Number: 21334
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Apr 2024 21:25:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS

2024-04-18 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21302 )

Change subject: IMPALA-13005: Create Query Live table in HMS
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21302/2/be/src/service/workload-management.cc
File be/src/service/workload-management.cc:

http://gerrit.cloudera.org:8080/#/c/21302/2/be/src/service/workload-management.cc@116
PS2, Line 116: ble by generati
> Yes, that's why I changed them. I guess I can do that instead. According to
Done


http://gerrit.cloudera.org:8080/#/c/21302/2/common/thrift/SystemTables.thrift
File common/thrift/SystemTables.thrift:

http://gerrit.cloudera.org:8080/#/c/21302/2/common/thrift/SystemTables.thrift@23
PS2, Line 23: CLUSTER_ID
: QUERY_ID
> Just going to go back to unassigned. There's a DCHECK that asserts these ar
Done


http://gerrit.cloudera.org:8080/#/c/21302/2/fe/src/main/java/org/apache/impala/catalog/SystemTable.java
File fe/src/main/java/org/apache/impala/catalog/SystemTable.java:

http://gerrit.cloudera.org:8080/#/c/21302/2/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@59
PS2, Line 59:   TImpalaTableProperty.__IMPALA_SYSTEM_TABLE.toString());
> I don't think so. I was looking at DataSourceTable for this pattern.
Added to CatalogObjects.thrift.


http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java
File fe/src/main/java/org/apache/impala/catalog/SystemTable.java:

http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@59
PS4, Line 59:   TImpalaTableProperty.__IMPALA_SYSTEM_TABLE.toString());
> Yeah, I think const string with TBL_PROP_ prefix is better. A property key
Done



--
To view, visit http://gerrit.cloudera.org:8080/21302
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4
Gerrit-Change-Number: 21302
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 21:20:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS

2024-04-18 Thread Michael Smith (Code Review)
Hello Andrew Sherman, Quanlong Huang, Riza Suminto, Jason Fehr, Wenzhe Zhou, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21302

to look at the new patch set (#6).

Change subject: IMPALA-13005: Create Query Live table in HMS
..

IMPALA-13005: Create Query Live table in HMS

Creates the 'sys.impala_query_live' table in HMS using a 'CREATE TABLE'
command similar to the one for 'sys.impala_query_log'. Updates the
frontend to identify a System Table based on the '__IMPALA_SYSTEM_TABLE'
property. Scanning a table improperly marked with
'__IMPALA_SYSTEM_TABLE' will fail because no relevant scanner will be
available.

Creating the table in HMS simplifies supporting 'SHOW CREATE TABLE' and
'DESCRIBE EXTENDED', so this patch allows them for parity with Query Log.
Explicitly disables 'COMPUTE STATS' on system tables as it doesn't work
correctly.

Updates the workload management implementation to rely more on the
SystemTables.thrift definition, and adds DCHECKs to verify completeness
and ordering.

Testing:
- adds additional test cases for changes to introspection commands
- passes existing test_query_live and test_query_log suites

Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/system-table-scanner.cc
M be/src/service/workload-management-fields.cc
M be/src/service/workload-management.cc
M be/src/service/workload-management.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java
A fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/SystemTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/test/java/org/apache/impala/catalog/SystemTableTest.java
M tests/custom_cluster/test_query_live.py
17 files changed, 243 insertions(+), 228 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/21302/6
--
To view, visit http://gerrit.cloudera.org:8080/21302
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4
Gerrit-Change-Number: 21302
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12938: add-opens for platform.cgroupv1

2024-04-18 Thread Michael Smith (Code Review)
Michael Smith has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21334 )


Change subject: IMPALA-12938: add-opens for platform.cgroupv1
..

IMPALA-12938: add-opens for platform.cgroupv1

Adds '--add-opens=jdk.internal.platform.cgroupv1' for Java 11 with
ehcache, covering Impala daemons and frontend tests. Fixes
InaccessibleObjectException detected by test_banned_log_messages.py.

Change-Id: I312ae987c17c6f06e1ffe15e943b1865feef6b82
---
M be/src/common/init.cc
M bin/run-all-tests.sh
2 files changed, 2 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/21334/1
--
To view, visit http://gerrit.cloudera.org:8080/21334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I312ae987c17c6f06e1ffe15e943b1865feef6b82
Gerrit-Change-Number: 21334
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-18 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21277 )

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..


Patch Set 11:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@35
PS11, Line 35: field
> plural - 'fields'? This is referring to fragment level parallelism, right?
Yes, fragment level parallelism. Will change to 'fields'.


http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@36
PS11, Line 36: The CpuAsk
 : counter is changed to show the unbounded CpuAsk after scaling. A new
 : counter CpuAskBounded shows the bounded CpuAsk after scaling. The
 : EffectiveParallelism counter remains unchanged, showing bounded CpuAsk
 : before scaling.
> This is a little confusing.
Should pick the unbounded CpuAsk after scaling. Will fix the code and commit 
message.


http://gerrit.cloudera.org:8080/#/c/21277/11/fe/src/main/java/org/apache/impala/planner/CostingSegment.java
File fe/src/main/java/org/apache/impala/planner/CostingSegment.java:

http://gerrit.cloudera.org:8080/#/c/21277/11/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@85
PS11, Line 85:   topNode = fragment.getPlanRoot();
> topNode not really being used?
I can leave it unassigned for this branch.



--
To view, visit http://gerrit.cloudera.org:8080/21277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923
Gerrit-Change-Number: 21277
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 18 Apr 2024 20:25:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-18 Thread Abhishek Rawat (Code Review)
Abhishek Rawat has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21277 )

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..


Patch Set 11:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@26
PS11, Line 26: square-root-based
Update to "nth root based" or something similar to be more accurate?


http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@35
PS11, Line 35: field
plural - 'fields'? This is referring to fragment level parallelism, right?


http://gerrit.cloudera.org:8080/#/c/21277/11//COMMIT_MSG@36
PS11, Line 36: The CpuAsk
 : counter is changed to show the unbounded CpuAsk after scaling. A new
 : counter CpuAskBounded shows the bounded CpuAsk after scaling. The
 : EffectiveParallelism counter remains unchanged, showing bounded CpuAsk
 : before scaling.
This is a little confusing.
Are the fragment instance counts still based on CpuAsk? If so, is it the
bounded or unbounded value, and before or after scaling? I'm trying to figure
out what we use for computing admission_slots.


http://gerrit.cloudera.org:8080/#/c/21277/11/fe/src/main/java/org/apache/impala/planner/CostingSegment.java
File fe/src/main/java/org/apache/impala/planner/CostingSegment.java:

http://gerrit.cloudera.org:8080/#/c/21277/11/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@85
PS11, Line 85:   topNode = fragment.getPlanRoot();
topNode not really being used?



-- 
To view, visit http://gerrit.cloudera.org:8080/21277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923
Gerrit-Change-Number: 21277
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 18 Apr 2024 20:12:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21315 )

Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest 
AiFunctionsTest
..

IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

The issue is that the code previously used a std::string_view to
hold the data which is actually returned by rapidjson::Document.
However, the rapidjson::Document object gets destroyed after
creating the std::string_view. This meant the std::string_view
referenced memory that was no longer valid, leading to a
heap-use-after-free error.

This patch fixes this issue by modifying the function to
return a std::string instead of a std::string_view. When the
function returns a string, it creates a copy of the
data from rapidjson::Document. This ensures the returned
string has its own memory allocation and doesn't rely on
the destroyed rapidjson::Document.

Tests:
Reran the ASAN build and it passed.

Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
Reviewed-on: http://gerrit.cloudera.org:8080/21315
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exprs/ai-functions-ir.cc
M be/src/exprs/ai-functions.h
M be/src/exprs/ai-functions.inline.h
M be/src/exprs/expr-test.cc
4 files changed, 11 insertions(+), 9 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/21315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
Gerrit-Change-Number: 21315
Gerrit-PatchSet: 5
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21315 )

Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest 
AiFunctionsTest
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
Gerrit-Change-Number: 21315
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 18 Apr 2024 18:58:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21294 )

Change subject: IMPALA-12874: Identify active and standby catalog and 
statestore in the web debug endpoint
..

IMPALA-12874: Identify active and standby catalog and statestore in the web 
debug endpoint

This patch adds support to display the HA status of catalog and
statestore on the root web page. The status will be presented
as "Catalog Status: Active" or "Statestore Status: Standby"
based on the values retrieved from the metrics
catalogd-server.active-status and statestore.active-status.

If the catalog or statestore is standalone, it will show active as
the status, which is the same as the metric.
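
The mapping from metric value to displayed text can be sketched as below.
This is only an illustration: it assumes the active-status metric is read as
a boolean, and the helper name is hypothetical rather than part of the patch.

```python
def ha_status_line(service_name, is_active):
    """Format the status text shown on the root web page.

    service_name is "Catalog" or "Statestore"; is_active is the assumed
    boolean value of the corresponding active-status metric.
    """
    state = "Active" if is_active else "Standby"
    return f"{service_name} Status: {state}"


# A standalone daemon reports active, so it is displayed as Active.
print(ha_status_line("Catalog", True))
print(ha_status_line("Statestore", False))
```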

Tests:
Ran core tests.
Manually tested the web page and verified that the status display is
correct. Also checked that when a failover happens, the current
'standby' status changes to 'active'.

Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2
Reviewed-on: http://gerrit.cloudera.org:8080/21294
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/common/daemon-env.h
M be/src/util/default-path-handlers.cc
M be/src/util/default-path-handlers.h
M www/root.tmpl
4 files changed, 69 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/21294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2
Gerrit-Change-Number: 21294
Gerrit-PatchSet: 5
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21294 )

Change subject: IMPALA-12874: Identify active and standby catalog and 
statestore in the web debug endpoint
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2
Gerrit-Change-Number: 21294
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 18 Apr 2024 18:30:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13016: Fix ambiguous row regex that check for no-existence

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21333 )

Change subject: IMPALA-13016: Fix ambiguous row_regex that check for 
no-existence
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15950/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da
Gerrit-Change-Number: 21333
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Apr 2024 18:29:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21186 )

Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted 
types
..


Patch Set 16: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9
Gerrit-Change-Number: 21186
Gerrit-PatchSet: 16
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 18 Apr 2024 18:25:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21302 )

Change subject: IMPALA-13005: Create Query Live table in HMS
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15949/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21302
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4
Gerrit-Change-Number: 21302
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 18:18:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS

2024-04-18 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21302 )

Change subject: IMPALA-13005: Create Query Live table in HMS
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java
File fe/src/main/java/org/apache/impala/catalog/SystemTable.java:

http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@59
PS4, Line 59:   TImpalaTableProperty.__IMPALA_SYSTEM_TABLE.toString());
> Moved this property to CatalogObjects.thrift. I forgot about that as a simp
Yeah, I think a const string with TBL_PROP_ prefix is better. A property key may 
contain a dot or other characters that are not valid in a Thrift identifier. We 
already have a few of those const strings.

$ git grep -n "const string" common/thrift/
common/thrift/CatalogService.thrift:44:const string CATALOG_TOPIC_V1_PREFIX = "1:";
common/thrift/CatalogService.thrift:48:const string CATALOG_TOPIC_V2_PREFIX = "2:";
common/thrift/hive-1-api/TCLIService.thrift:184:const string CHARACTER_MAXIMUM_LENGTH = "characterMaximumLength"
common/thrift/hive-1-api/TCLIService.thrift:187:const string PRECISION = "precision"
common/thrift/hive-1-api/TCLIService.thrift:188:const string SCALE = "scale"



--
To view, visit http://gerrit.cloudera.org:8080/21302
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4
Gerrit-Change-Number: 21302
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 18:16:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13016: Fix ambiguous row regex that check for no-existence

2024-04-18 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21333


Change subject: IMPALA-13016: Fix ambiguous row_regex that check for 
no-existence
..

IMPALA-13016: Fix ambiguous row_regex that check for no-existence

There are a few row_regex patterns used in EE test files that are ambiguous
on whether a pattern does not exist in all parts of results/runtime filter
or at least one row does not have that pattern. These were caught by
grepping the following pattern:

$ git grep -n "row_regex: (?\!"

This patch replaces them with either !row_regex or a VERIFY_IS_NOT_IN
comment.
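
The ambiguity described above can be illustrated with a small sketch. This
is a simplified model, not Impala's test framework: the helper names
any_row_matches and no_row_matches and the sample rows are hypothetical,
and it assumes a row_regex check passes when at least one row matches the
pattern.

```python
import re

def any_row_matches(pattern, rows):
    """Assumed row_regex semantics: pass if at least one row matches."""
    return any(re.fullmatch(pattern, row) for row in rows)

def no_row_matches(pattern, rows):
    """Assumed negated semantics: pass only if no row matches at all."""
    return not any_row_matches(pattern, rows)

# Hypothetical result rows; the second one contains the unwanted text.
rows = ["numRows=10", "COLUMN_STATS_ACCURATE=true", "totalSize=512"]

# Negative lookahead inside the regex: satisfied as soon as ONE row lacks
# the text, so the check passes even though the text is present elsewhere.
weak_check = any_row_matches(r"(?!.*COLUMN_STATS_ACCURATE).*", rows)

# Explicit negation of a positive pattern: fails because a row contains it.
strict_check = no_row_matches(r".*COLUMN_STATS_ACCURATE.*", rows)

print(weak_check, strict_check)
```

Under these assumptions the lookahead form passes while the explicit
negation fails, which is the gap between "at least one row does not have
the pattern" and "the pattern does not exist anywhere".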

Testing:
- Run and pass modified tests.

Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da
---
M 
testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test
M testdata/workloads/functional-query/queries/QueryTest/acid-truncate.test
M testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-directed-mode.test
4 files changed, 7 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/21333/1
--
To view, visit http://gerrit.cloudera.org:8080/21333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic81de34bf997dfaf1c199b1fe1b05346b55ff4da
Gerrit-Change-Number: 21333
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS

2024-04-18 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21302 )

Change subject: IMPALA-13005: Create Query Live table in HMS
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java
File fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java:

http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java@28
PS4, Line 28: Currently COMPUTE STATS does not work on these tables,
> That was previously prevented by having read-only access. But that's probab
Done


http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java
File fe/src/main/java/org/apache/impala/catalog/SystemTable.java:

http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@59
PS4, Line 59:   TImpalaTableProperty.__IMPALA_SYSTEM_TABLE.toString());
> I think it is time we should organize all impala-specific table properties
Moved this property to CatalogObjects.thrift. I forgot about that as a simple 
place to define common values.

Although maybe it'd make more sense as a 'const string' than an enum.



--
To view, visit http://gerrit.cloudera.org:8080/21302
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4
Gerrit-Change-Number: 21302
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 17:56:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS

2024-04-18 Thread Michael Smith (Code Review)
Hello Andrew Sherman, Quanlong Huang, Riza Suminto, Jason Fehr, Wenzhe Zhou, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21302

to look at the new patch set (#5).

Change subject: IMPALA-13005: Create Query Live table in HMS
..

IMPALA-13005: Create Query Live table in HMS

Creates the 'sys.impala_query_live' table in HMS using a similar 'CREATE
TABLE' command to 'sys.impala_query_log'. Updates frontend to identify a
System Table based on the '__IMPALA_SYSTEM_TABLE' property. Tables
improperly marked with '__IMPALA_SYSTEM_TABLE' will error when
attempting to scan them because no relevant scanner will be available.

Creating the table in HMS simplifies supporting 'SHOW CREATE TABLE' and
'DESCRIBE EXTENDED', so allows them for parity with Query Log.
Explicitly disables 'COMPUTE STATS' on system tables as it doesn't work
correctly.

Updates workload management implementation to rely more on
SystemTables.thrift definition, and adds DCHECKs to verify completeness
and ordering.

Testing:
- adds additional test cases for changes to introspection commands
- passes existing test_query_live and test_query_log suites

Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/system-table-scanner.cc
M be/src/service/workload-management-fields.cc
M be/src/service/workload-management.cc
M be/src/service/workload-management.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java
A fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/SystemTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/test/java/org/apache/impala/catalog/SystemTableTest.java
M tests/custom_cluster/test_query_live.py
17 files changed, 247 insertions(+), 229 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/21302/5
--
To view, visit http://gerrit.cloudera.org:8080/21302
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4
Gerrit-Change-Number: 21302
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS

2024-04-18 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21302 )

Change subject: IMPALA-13005: Create Query Live table in HMS
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21302/4/be/src/service/workload-management.cc
File be/src/service/workload-management.cc:

http://gerrit.cloudera.org:8080/#/c/21302/4/be/src/service/workload-management.cc@300
PS4, Line 300: field.db_column
> This need lowercase as well?
Ack


http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java
File fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java:

http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java@28
PS4, Line 28: Currently COMPUTE STATS does not work on these tables.
> Question: is UPDATE/DELETE/TRUNCATE allowed for SystemTable?
That was previously prevented by having read-only access. But that's probably 
no longer true, so I need to look into preventing those.



--
To view, visit http://gerrit.cloudera.org:8080/21302
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4
Gerrit-Change-Number: 21302
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 17:54:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13014: Upgrade Maven to 3.9.6

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21332 )

Change subject: IMPALA-13014: Upgrade Maven to 3.9.6
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15948/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I726618d084f4f0737f5b876879a90c17b0c3777c
Gerrit-Change-Number: 21332
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Thu, 18 Apr 2024 17:51:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13014: Upgrade Maven to 3.9.6

2024-04-18 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21332 )

Change subject: IMPALA-13014: Upgrade Maven to 3.9.6
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I726618d084f4f0737f5b876879a90c17b0c3777c
Gerrit-Change-Number: 21332
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Thu, 18 Apr 2024 17:30:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13014: Upgrade Maven to 3.9.6

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21332 )

Change subject: IMPALA-13014: Upgrade Maven to 3.9.6
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21332/1/bin/bootstrap_build.sh
File bin/bootstrap_build.sh:

http://gerrit.cloudera.org:8080/#/c/21332/1/bin/bootstrap_build.sh@53
PS1, Line 53: 
https://archive.apache.org/dist/maven/maven-3/3.9.6/binaries/apache-maven-3.9.6-bin.tar.gz
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/21332/1/bin/bootstrap_build.sh@54
PS1, Line 54:   sha512sum -c - <<< 
'706f01b20dec0305a822ab614d51f32b07ee11d0218175e55450242e49d2156386483b506b3a4e8a03ac8611bae96395fd5eec15f50d3013d5deed6d1ee18224
  apache-maven-3.9.6-bin.tar.gz'
line too long (182 > 90)


http://gerrit.cloudera.org:8080/#/c/21332/1/bin/bootstrap_system.sh
File bin/bootstrap_system.sh:

http://gerrit.cloudera.org:8080/#/c/21332/1/bin/bootstrap_system.sh@346
PS1, Line 346: 
https://archive.apache.org/dist/maven/maven-3/3.9.6/binaries/apache-maven-3.9.6-bin.tar.gz
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/21332/1/bin/bootstrap_system.sh@347
PS1, Line 347:   sha512sum -c - <<< 
'706f01b20dec0305a822ab614d51f32b07ee11d0218175e55450242e49d2156386483b506b3a4e8a03ac8611bae96395fd5eec15f50d3013d5deed6d1ee18224
  apache-maven-3.9.6-bin.tar.gz'
line too long (182 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I726618d084f4f0737f5b876879a90c17b0c3777c
Gerrit-Change-Number: 21332
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 18 Apr 2024 17:28:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13014: Upgrade Maven to 3.9.6

2024-04-18 Thread Laszlo Gaal (Code Review)
Laszlo Gaal has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21332


Change subject: IMPALA-13014: Upgrade Maven to 3.9.6
..

IMPALA-13014: Upgrade Maven to 3.9.6

IMPALA-12212 upgraded Maven to 3.9.2 to gain access to the parallel
dependency resolver in the 3.9.x line. The Maven project has published
several new releases since 3.9.2, fixing various issues with the new
resolver, and also fixing problems with concurrent access to the
local Maven cache.

Pick up the latest version to gain access to these new fixes.

Change-Id: I726618d084f4f0737f5b876879a90c17b0c3777c
---
M bin/bootstrap_build.sh
M bin/bootstrap_system.sh
2 files changed, 11 insertions(+), 11 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/21332/1
--
To view, visit http://gerrit.cloudera.org:8080/21332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I726618d084f4f0737f5b876879a90c17b0c3777c
Gerrit-Change-Number: 21332
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21277 )

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15947/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923
Gerrit-Change-Number: 21277
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 18 Apr 2024 17:05:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21257 )

Change subject: IMPALA-12980: Translate CpuAsk into admission control slots
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15946/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Gerrit-Change-Number: 21257
Gerrit-PatchSet: 16
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 17:05:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-18 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21277 )

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..


Patch Set 11:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21277/10/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/21277/10/be/src/util/backend-gflag-util.cc@266
PS10, Line 266: 1.5
> I think we should use a default value of 1.5 here. Using 2.0 (the actual sq
Done


http://gerrit.cloudera.org:8080/#/c/21277/10/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/21277/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2322
PS10, Line 2322: scaledCoreReqToCompare = Math.max(scaledCpuAskBounded, 
scaledCpuAskUnbounded);
> I'm a little concerned that using the max here will allow the EG size to ex
Yes, this will require a bigger change along adjustToMaxParallelism() and 
traverseEffectiveParallelism().

cpuAskUnbounded here is the greedy number to encourage EG promotion.
On the other hand, cpuAskBounded is the hard requirement that Frontend should 
adhere to, because it is what will actually run. We should think about how to 
do sublinear scaling of cpuAskBounded during planning in a separate patch.


http://gerrit.cloudera.org:8080/#/c/21277/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2442
PS10, Line 2442: nthRootSmallestEGTotalCo
> Maybe call this nthrootSmallestEGTotalCores
Done



--
To view, visit http://gerrit.cloudera.org:8080/21277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923
Gerrit-Change-Number: 21277
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 18 Apr 2024 16:45:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots

2024-04-18 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21257 )

Change subject: IMPALA-12980: Translate CpuAsk into admission control slots
..


Patch Set 16: Code-Review+2

Carrying prior +2.


--
To view, visit http://gerrit.cloudera.org:8080/21257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Gerrit-Change-Number: 21257
Gerrit-PatchSet: 16
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 16:44:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator

2024-04-18 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21279 )

Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and 
NonGroupingAggregator
..


Patch Set 19: Code-Review+2

Carry +2.


--
To view, visit http://gerrit.cloudera.org:8080/21279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca
Gerrit-Change-Number: 21279
Gerrit-PatchSet: 19
Gerrit-Owner: David Rorke 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 16:43:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21257 )

Change subject: IMPALA-12980: Translate CpuAsk into admission control slots
..


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10559/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/21257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Gerrit-Change-Number: 21257
Gerrit-PatchSet: 17
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 16:46:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21257 )

Change subject: IMPALA-12980: Translate CpuAsk into admission control slots
..


Patch Set 17: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Gerrit-Change-Number: 21257
Gerrit-PatchSet: 17
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 16:46:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-18 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Abhishek Rawat, David Rorke, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21277

to look at the new patch set (#11).

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..

IMPALA-12988: Calculate an unbounded version of CpuAsk

Planner calculates CpuAsk through a recursive call beginning at
Planner.computeBlockingAwareCores(), which is called after
Planner.computeEffectiveParallelism(). It does blocking operator
analysis over the selected degree of parallelism that was decided during
computeEffectiveParallelism() traversal. That selected degree of
parallelism, however, is already bounded by min and max parallelism
config, derived from PROCESSING_COST_MIN_THREADS and
MAX_FRAGMENT_INSTANCES_PER_NODE options accordingly.

This patch calculates an unbounded version of CpuAsk that is not bounded
by min and max parallelism config. It is purely based on the fragment's
ProcessingCost and query plan relationship constraint (for example, the
number of JOIN BUILDER fragments should equal the number of destination
JOIN fragments for partitioned join).

Frontend will receive both bounded and unbounded CpuAsk values from
TQueryExecRequest on each executor group set selection round. The
unbounded CpuAsk is then scaled down once using a square-root-based
sublinear function, controlled by the total cpu count of the smallest
executor group set and the bounded CpuAsk number. Another linear scaling
is then applied on both bounded and unbounded CpuAsk using
QUERY_CPU_COUNT_DIVISOR option. Frontend then picks the maximum between
bounded CpuAsk and unbounded CpuAsk numbers to avoid assigning a query
to a small executor group set too soon. The last executor group set
stays as the "catch-all" executor group set.
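
The selection step above can be sketched roughly as follows. This is only
an illustration under stated assumptions: the commit message does not give
the exact sublinear formula, so the root-based function below, the function
names, and the treatment of QUERY_CPU_COUNT_DIVISOR as a plain division are
all hypothetical stand-ins rather than the code in Frontend.java.

```python
import math

def scale_unbounded_cpu_ask(unbounded, smallest_eg_cores, root=2.0):
    """Hypothetical sublinear scaling (assumption, not Impala's formula).

    Values up to the smallest executor group set's core count pass through
    unchanged; larger values grow only as the root-th root of the ratio.
    """
    if unbounded <= smallest_eg_cores:
        return unbounded
    ratio = unbounded / smallest_eg_cores
    return math.ceil(smallest_eg_cores * ratio ** (1.0 / root))

def cores_to_compare(bounded, unbounded, smallest_eg_cores,
                     divisor=1.0, root=2.0):
    """Scale both CpuAsk values, then keep the larger of the two."""
    scaled_unbounded = scale_unbounded_cpu_ask(unbounded, smallest_eg_cores, root)
    # Linear scaling, assumed here to be a division by the divisor option.
    scaled_bounded = math.ceil(bounded / divisor)
    scaled_unbounded = math.ceil(scaled_unbounded / divisor)
    # Taking the max avoids settling on a small executor group set too soon.
    return max(scaled_bounded, scaled_unbounded)

print(cores_to_compare(bounded=24, unbounded=64, smallest_eg_cores=16))
```

In this sketch a large unbounded value still pushes the comparison number
above the bounded one, but by much less than its raw size would.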

After this patch, the "max-parallelism" fields in the query plan will all
be set to the maximum parallelism based on ProcessingCost. The CpuAsk
counter is changed to show the unbounded CpuAsk after scaling. A new
counter CpuAskBounded shows the bounded CpuAsk after scaling. The
EffectiveParallelism counter remains unchanged, showing bounded CpuAsk
before scaling.

Testing:
- Update and pass FE test TpcdsCpuCostPlannerTest and
  PlannerTest#testProcessingCost.
- Pass EE test tests/query_test/test_tpcds_queries.py
- Pass custom cluster test tests/custom_cluster/test_executor_groups.py

Change-Id: I5441e31088f90761062af35862be4ce09d116923
---
M be/src/scheduling/scheduler.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q28.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q29.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q32.test
M 

[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots

2024-04-18 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21257 )

Change subject: IMPALA-12980: Translate CpuAsk into admission control slots
..


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21257/15/fe/src/main/java/org/apache/impala/planner/CoreCount.java
File fe/src/main/java/org/apache/impala/planner/CoreCount.java:

http://gerrit.cloudera.org:8080/#/c/21257/15/fe/src/main/java/org/apache/impala/planner/CoreCount.java@132
PS15, Line 132:   protected static CoreCount sum(CoreCount core1, CoreCount 
core2) {
> nit: could be implemented in terms of
Done. Thanks!


http://gerrit.cloudera.org:8080/#/c/21257/15/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/21257/15/tests/custom_cluster/test_executor_groups.py@880
PS15, Line 880: # Add an exec group with 4 admission slots and 1 executors.
> This comment looks like it needs to be updated.
Done


http://gerrit.cloudera.org:8080/#/c/21257/15/tests/custom_cluster/test_executor_groups.py@886
PS15, Line 886: # Add another exec group with 64 admission slots and 3 
executors.
> This comment looks like it needs to be updated.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Gerrit-Change-Number: 21257
Gerrit-PatchSet: 16
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 16:42:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots

2024-04-18 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Abhishek Rawat, Csaba Ringhofer, Wenzhe Zhou, Michael 
Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21257

to look at the new patch set (#16).

Change subject: IMPALA-12980: Translate CpuAsk into admission control slots
..

IMPALA-12980: Translate CpuAsk into admission control slots

Impala has a concept of "admission control slots" - the amount of
parallelism that should be allowed on an Impala daemon. This defaults to
the number of processors per executor and can be overridden with
the --admission_control_slots flag.

Admission control slot accounting is described in IMPALA-8998. It
computes 'slots_to_use' for each backend based on the maximum number of
instances of any fragment on that backend. This can lead to slot
underestimation and query overadmission. For example, assume an executor
node with 48 CPU cores, configured with --admission_control_slots=48.
It is assigned 4 non-blocking query fragments, each with 12 instances
scheduled on this executor. The IMPALA-8998 algorithm will request the
maximum instance count (12 slots) rather than the sum of all non-blocking
fragment instances (48). With the 36 remaining slots free, the executor can
still admit another fragment from a different query, which will potentially
contend for CPU with the one that is currently running.
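
The arithmetic in the example above can be sketched in a few lines of
Python. This is only an illustration: the function names are hypothetical
and are not part of Impala. It contrasts taking the largest per-fragment
instance count with summing the instances of fragments that are assumed to
run concurrently.

```python
def slots_largest_fragment(instances_per_fragment):
    """IMPALA-8998 style accounting: the largest per-fragment instance count."""
    return max(instances_per_fragment)


def slots_sum_of_fragments(instances_per_fragment):
    """Sum of instances over fragments assumed to run at the same time."""
    return sum(instances_per_fragment)


# Four non-blocking fragments with 12 instances each on a 48-slot executor.
fragments = [12, 12, 12, 12]
total_slots = 48

largest = slots_largest_fragment(fragments)
summed = slots_sum_of_fragments(fragments)

# Slots requested and slots left free under each accounting.
print(largest, total_slots - largest)  # 12 36
print(summed, total_slots - summed)    # 48 0
```

With the first accounting the executor still appears to have 36 free slots,
which is the overadmission risk described above.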

When COMPUTE_PROCESSING_COST is enabled, Planner will generate a CpuAsk
number that represents the cpu requirement of that query over a
particular executor group set. This number is an estimation of the
largest number of query fragment instances that can run in parallel
without waiting, given by the blocking operator analysis. Therefore, the
fragment trace that sums into that CpuAsk number can be translated into
'slots_to_use' as well, which will be a closer resemblance of maximum
parallel execution of fragment instances.

This patch adds a new query option called SLOT_COUNT_STRATEGY to control
which admission control slot accounting to use. There are two possible
values:
- LARGEST_FRAGMENT, which is the original algorithm from IMPALA-8998.
  This is still the default value for the SLOT_COUNT_STRATEGY option.
- PLANNER_CPU_ASK, which will follow the fragment trace that contributes
  towards the CpuAsk number. This strategy will schedule at least as many
  admission control slots as the LARGEST_FRAGMENT strategy.

To do the PLANNER_CPU_ASK strategy, the Planner will mark fragments that
contribute to CpuAsk as dominant fragments. It also passes
max_slot_per_executor information that it knows about the executor group
set to the scheduler.

AvgAdmissionSlotsPerExecutor counter is added to describe what Planner
thinks the average 'slots_to_use' per backend will be, which follows
this formula:

  AvgAdmissionSlotsPerExecutor = ceil(CpuAsk / num_executors)

The actual 'slots_to_use' on each backend may differ from
AvgAdmissionSlotsPerExecutor, depending on what is scheduled on that
backend. 'slots_to_use' will be shown as 'AdmissionSlots' counter under
each executor profile node.
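
As a quick check of the formula above, here is a minimal Python sketch. The
function name and the input values are made up for illustration and do not
come from the patch.

```python
import math


def avg_admission_slots_per_executor(cpu_ask, num_executors):
    """Compute ceil(CpuAsk / num_executors) as in the formula above."""
    if num_executors <= 0:
        raise ValueError("num_executors must be positive")
    return math.ceil(cpu_ask / num_executors)


# Hypothetical inputs: a CpuAsk of 10 spread over 3 executors rounds up to 4.
print(avg_admission_slots_per_executor(10, 3))  # 4
# An evenly divisible CpuAsk is unchanged by the ceiling.
print(avg_admission_slots_per_executor(48, 3))  # 16
```

Because of the ceiling, the average is rounded up whenever CpuAsk is not a
multiple of the executor count.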

Testing:
- Update test_executors.py with AvgAdmissionSlotsPerExecutor assertion.
- Pass test_tpcds_queries.py::TestTpcdsQueryWithProcessingCost.
- Add EE test test_processing_cost.py.
- Add FE test PlannerTest#testProcessingCostPlanAdmissionSlots.

Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
---
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/CoreCount.java
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
A 
testdata/workloads/functional-query/queries/QueryTest/processing-cost-admission-slots.test
M tests/custom_cluster/test_executor_groups.py
A tests/query_test/test_processing_cost.py
M tests/query_test/test_tpcds_queries.py
21 files changed, 1,505 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/21257/16
--
To view, visit http://gerrit.cloudera.org:8080/21257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Gerrit-Change-Number: 21257
Gerrit-PatchSet: 16

[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator

2024-04-18 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21279 )

Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and 
NonGroupingAggregator
..


Patch Set 18: Code-Review+2

Patches below this will change a bit.
Will rebase and carry Code-Review votes.


--
To view, visit http://gerrit.cloudera.org:8080/21279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca
Gerrit-Change-Number: 21279
Gerrit-PatchSet: 18
Gerrit-Owner: David Rorke 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 16:15:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator

2024-04-18 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21279 )

Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and 
NonGroupingAggregator
..


Patch Set 18: Code-Review+1

(7 comments)

http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@745
PS17, Line 745: int numAggExprs = getMaterializedAggregateExprs().size();
> AFAICT getMaterializedAggregateExprs().size() should return the count of th
Ack


http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@760
PS17, Line 760:   LOG.trace("Total CPU cost estimate: " + totalCost
> Understood. Do we have any strong conventions or standards here. Just looki
I don't think there is a consensus in Impala. I think the SLF4J community would 
recommend using parameterized messages; they're probably slightly more 
efficient at string building. But this is fine.


http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@638
PS17, Line 638:   double lhsNetworkCost = (lhsHasCompatPartition) ? 0.0 :
> Restored the original formatting.
Ack


http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@353
PS17, Line 353: // TODO: For broadcast join builds we're underestimating cost here because we're using
> I'll enter a ticket for that. It's not a big effort but also not trivial. I
Ack


http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@209
PS17, Line 209:   // Coefficients for estimating scan CPU processing cost. Derived from benchmarking.
> Done
Ack


http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@648
PS17, Line 648: // TODO: Should we use AggregationNode.DEFAULT_SKEW_FACTOR when calculating
> I think average case behavior is probably more appropriate for most cases o
Ack


http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@656
PS17, Line 656: exprGlobalNdv = inputCardinality;
> Agree. I've removed this TODO.
Ack



--
To view, visit http://gerrit.cloudera.org:8080/21279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca
Gerrit-Change-Number: 21279
Gerrit-PatchSet: 18
Gerrit-Owner: David Rorke 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 16:04:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13005: Create Query Live table in HMS

2024-04-18 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21302 )

Change subject: IMPALA-13005: Create Query Live table in HMS
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21302/4/be/src/service/workload-management.cc
File be/src/service/workload-management.cc:

http://gerrit.cloudera.org:8080/#/c/21302/4/be/src/service/workload-management.cc@300
PS4, Line 300: field.db_column
Does this need to be lowercased as well?


http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java
File fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java:

http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/analysis/SystemTableRef.java@28
PS4, Line 28: Currently COMPUTE STATS does not work on these tables.
Question: is UPDATE/DELETE/TRUNCATE allowed for SystemTable?


http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java
File fe/src/main/java/org/apache/impala/catalog/SystemTable.java:

http://gerrit.cloudera.org:8080/#/c/21302/4/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@59
PS4, Line 59: public static final String TBL_PROP_SYSTEM_TABLE = "__IMPALA_SYSTEM_TABLE";
I think it is time to organize all Impala-specific table properties into one 
place, say, as a list of string constants in CatalogObjects.thrift. Is this the 
first time we have a table property key referred to in both FE and BE code?

Currently, they are scattered around FE source code like FeTable.java and 
others:

$ git grep -n "static.* TBL_PROP_" | cat
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java:42:  public static final String TBL_PROP_SORT_COLUMNS = "sort.columns";
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java:43:  public static final String TBL_PROP_SORT_ORDER = "sort.order";
fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java:64:  public static final String TBL_PROP_DATA_SRC_NAME = "__IMPALA_DATA_SOURCE_NAME";
fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java:69:  public static final String TBL_PROP_INIT_STRING = "__IMPALA_DATA_SOURCE_INIT_STRING";
fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java:74:  public static final String TBL_PROP_LOCATION = "__IMPALA_DATA_SOURCE_LOCATION";
fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java:79:  public static final String TBL_PROP_CLASS = "__IMPALA_DATA_SOURCE_CLASS";
fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java:84:  public static final String TBL_PROP_API_VER = "__IMPALA_DATA_SOURCE_API_VERSION";
fe/src/main/java/org/apache/impala/catalog/FeFsTable.java:381:public static final String TBL_PROP_SKIP_HEADER_LINE_COUNT = "skip.header.line.count";
fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:168:  public static final String TBL_PROP_ENABLE_STATS_EXTRAPOLATION =
fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:175:  public static final String TBL_PROP_DISABLE_RECURSIVE_LISTING =
fe/src/main/java/org/apache/impala/catalog/Table.java:188:  public static final String TBL_PROP_LAST_DDL_TIME = "transient_lastDdlTime";
fe/src/main/java/org/apache/impala/catalog/Table.java:191:  public static final String TBL_PROP_LAST_COMPUTE_STATS_TIME =
fe/src/main/java/org/apache/impala/catalog/Table.java:195:  public static final String TBL_PROP_EXTERNAL_TABLE = "EXTERNAL";
fe/src/main/java/org/apache/impala/catalog/Table.java:198:  public static final String TBL_PROP_EXTERNAL_TABLE_PURGE = "external.table.purge";
fe/src/main/java/org/apache/impala/catalog/Table.java:199:  public static final String TBL_PROP_EXTERNAL_TABLE_PURGE_DEFAULT = "TRUE";

Going forward, I wish we could have a standard prefix for Impala-specific table 
property keys, either "impala.*" or "__IMPALA_*".
I wonder what Quanlong's and Wenzhe's opinions are on this.



--
To view, visit http://gerrit.cloudera.org:8080/21302
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf302ee54a819fdee2db0ae582a5eeddffe4a5b4
Gerrit-Change-Number: 21302
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 18 Apr 2024 15:53:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12950:Improve error message in case of out-of-range numeric conversions

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21331 )

Change subject: IMPALA-12950:Improve error message in case of out-of-range 
numeric conversions
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15945/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32
Gerrit-Change-Number: 21331
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 18 Apr 2024 15:32:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 24:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15944/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 24
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Thu, 18 Apr 2024 15:11:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12950:Improve error message in case of out-of-range numeric conversions

2024-04-18 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21331 )


Change subject: IMPALA-12950:Improve error message in case of out-of-range 
numeric conversions
..

IMPALA-12950:Improve error message in case of out-of-range numeric conversions

IMPALA-12035 introduced checks for numeric conversions that are unsafe
and can fail (if the target type cannot store the value, the behaviour
is undefined):
 - from floating point types to integer types
 - from double to float

However, it can be difficult to trace which part of the query caused
this based on the error message. This change adds the source type, the
destination type and the value to be converted to the error message.
Unfortunately, at this point in the BE, the original SQL is not
available, so we cannot reference that.
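
The idea can be illustrated with a small Python sketch of a checked
conversion whose error names the source type, the destination type and the
offending value. This is not the code in cast-functions-ir.cc: the function
name, the choice of a 32-bit target and the message wording are assumptions
made for illustration only.

```python
import math

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def _out_of_range_message(value):
    # Hypothetical wording; the real message format may differ.
    return f"Out-of-range conversion from DOUBLE to INT: {value!r}"


def checked_double_to_int32(value):
    """Truncate a float to a 32-bit integer, or raise with full context."""
    if not math.isfinite(value):
        raise OverflowError(_out_of_range_message(value))
    truncated = math.trunc(value)
    if truncated < INT32_MIN or truncated > INT32_MAX:
        raise OverflowError(_out_of_range_message(value))
    return truncated


print(checked_double_to_int32(3.9))  # 3
try:
    checked_double_to_int32(1e10)
except OverflowError as err:
    print(err)
```

Including the value and both types in the message lets a user search the
query for the matching cast even though the original SQL text is not
available at that point.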

Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32
---
M be/src/exprs/cast-functions-ir.cc
M be/src/udf/udf.h
2 files changed, 35 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21331/2
--
To view, visit http://gerrit.cloudera.org:8080/21331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ieeed52e25f155818c35c11a8a6821708476ffb32
Gerrit-Change-Number: 21331
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-18 Thread Steve Carlin (Code Review)
Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 23:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java:

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java@50
PS21, Line 50: // TODO: IMPALA-13011: Awkward call for authorization here. Authorization
             : // will be done at validation time, but this is needed here for
> Can you mention in the commit message that authorization is missing at this
Done


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java:

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@35
PS20, Line 35: ImpalaTypeSystemImpl
> Yeah, it is perfectly fine to just add a class comment and mention that thi
Ok, added a class comment.


http://gerrit.cloudera.org:8080/#/c/21109/23/testdata/workloads/functional-query/queries/QueryTest/calcite.test
File testdata/workloads/functional-query/queries/QueryTest/calcite.test:

http://gerrit.cloudera.org:8080/#/c/21109/23/testdata/workloads/functional-query/queries/QueryTest/calcite.test@113
PS23, Line 113: xedzt
> hmm, why are these different than https://github.com/apache/impala/blob/541
Yeah, prolly best to take this out. The test in binary-type does a casting 
function which isn't supported in this commit (but coming soon).



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 23
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Thu, 18 Apr 2024 14:48:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 24:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/24/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/24/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@26
PS24, Line 26:  * 
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 24
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Thu, 18 Apr 2024 14:48:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-18 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Quanlong Huang, Joe McDonnell, Csaba Ringhofer, Michael 
Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#24).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: This step is to optimize the query. In this cut, it will be
a nop, but in later versions, it will perform logical optimizations via
Calcite's rule mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements will get processed. Any non-query statement will revert
back to the current Impala planner.

In this iteration, any queries besides the minimal ones listed above will
result in a caught exception which will then be run through the current
Impala planner. The tests that do work can be found in calcite.test and
run through the custom cluster test test_experimental_planner.py.
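
The fallback behaviour described above can be sketched as follows. This is
a toy Python model, not the CalciteJniFrontend code: all names are
hypothetical, and the "experimental" planner here only accepts the two query
shapes listed earlier.

```python
class UnsupportedQueryError(Exception):
    """Raised by the stand-in experimental planner for unsupported queries."""


def plan_with_experimental(query):
    # Stand-in for the new planner: accept only "select <cols> from <tbl>".
    tokens = query.strip().rstrip(";").split()
    if (len(tokens) != 4 or tokens[0].lower() != "select"
            or tokens[2].lower() != "from"):
        raise UnsupportedQueryError(query)
    return ("experimental", query)


def plan_with_original(query):
    # Stand-in for the existing planner, which handles everything.
    return ("original", query)


def plan(query):
    # Non-select statements go straight to the original planner.
    if not query.lstrip().lower().startswith("select"):
        return plan_with_original(query)
    # Select statements try the experimental planner first and fall back
    # to the original planner when it raises.
    try:
        return plan_with_experimental(query)
    except UnsupportedQueryError:
        return plan_with_original(query)


print(plan("select c1 from tbl")[0])                 # experimental
print(plan("select c1 from tbl where c1 > 1")[0])    # original
print(plan("insert into tbl values (1)")[0])         # original
```

Catching the failure and re-planning keeps unsupported queries working
while the new planner's coverage grows.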

This iteration should support all types with the exception of complex
types. Calcite does not have a STRING type, so the string type is
represented as VARCHAR(MAXINT) similar to how Hive represents their
STRING type.

The ImpalaTypeConverter file is used to convert the Impala Type object
to corresponding Calcite objects.

Authorization is not yet working with this current commit. A Jira has been
filed (IMPALA-13011) to deal with this.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A 

[Impala-ASF-CR] IMPALA-12977: add search and pagination to /hadoop-varz

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21329 )

Change subject: IMPALA-12977: add search and pagination to /hadoop-varz
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15943/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21329
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic8cac23b655fa58ce12d9857649705574614a5f0
Gerrit-Change-Number: 21329
Gerrit-PatchSet: 1
Gerrit-Owner: Saurabh Katiyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 18 Apr 2024 14:30:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21315 )

Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest 
AiFunctionsTest
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15942/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
Gerrit-Change-Number: 21315
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 18 Apr 2024 14:03:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12977: add search and pagination to /hadoop-varz

2024-04-18 Thread Saurabh Katiyal (Code Review)
Saurabh Katiyal has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21329 )


Change subject: IMPALA-12977: add search and pagination to /hadoop-varz
..

IMPALA-12977: add search and pagination to /hadoop-varz

Added search and pagination feature to /hadoop-varz

Change-Id: Ic8cac23b655fa58ce12d9857649705574614a5f0
---
M www/hadoop-varz.tmpl
1 file changed, 25 insertions(+), 11 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/21329/1
--
To view, visit http://gerrit.cloudera.org:8080/21329
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic8cac23b655fa58ce12d9857649705574614a5f0
Gerrit-Change-Number: 21329
Gerrit-PatchSet: 1
Gerrit-Owner: Saurabh Katiyal 


[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

2024-04-18 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21315 )

Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest 
AiFunctionsTest
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
Gerrit-Change-Number: 21315
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 18 Apr 2024 13:53:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21315 )

Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest 
AiFunctionsTest
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10558/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/21315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
Gerrit-Change-Number: 21315
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 18 Apr 2024 13:54:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21315 )

Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest 
AiFunctionsTest
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
Gerrit-Change-Number: 21315
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 18 Apr 2024 13:54:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

2024-04-18 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21315 )

Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest 
AiFunctionsTest
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h
File be/src/exprs/ai-functions.inline.h:

http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108
PS2, Line 108: plac
> 'Move' is good from the context of the change, but if someone is reading th
Done


http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108
PS2, Line 108:
> Nit: it is not a loop, I wrote it wrong in my comment. "'if' statement" wou
Done


http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@178
PS2, Line 178:   std::string response = AiGenerateTextParseOpenAiResponse(
> The other alternative would've been to create a rapidjson::Document and pass
Agree. Preferring string for now



--
To view, visit http://gerrit.cloudera.org:8080/21315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
Gerrit-Change-Number: 21315
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 18 Apr 2024 13:38:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

2024-04-18 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/21315 )

Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest 
AiFunctionsTest
..

IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

The issue is that the code previously used a std::string_view to
hold the data which is actually returned by rapidjson::Document.
However, the rapidjson::Document object gets destroyed after
creating the std::string_view. This meant the std::string_view
referenced memory that was no longer valid, leading to a
heap-use-after-free error.

This patch fixes this issue by modifying the function to
return a std::string instead of a std::string_view. When the
function returns a string, it creates a copy of the
data from rapidjson::Document. This ensures the returned
string has its own memory allocation and doesn't rely on
the destroyed rapidjson::Document.
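
The lifetime problem described above can be sketched as follows. This is only an illustration under stated assumptions: `FakeDocument` is a hypothetical stand-in for rapidjson::Document (it is not the rapidjson API), and `ParseResponseSafe` is a made-up name rather than the function changed in the patch. The unsafe variant appears only in a comment, since reading its result would be undefined behavior.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for rapidjson::Document: it owns the parsed text.
struct FakeDocument {
  std::string content;
  explicit FakeDocument(std::string json) : content(std::move(json)) {}
};

// Unsafe pattern, shown as a comment only:
//
//   std::string_view ParseResponseUnsafe(const std::string& json) {
//     FakeDocument doc(json);
//     return std::string_view(doc.content);  // 'doc' dies here, view dangles
//   }
//
// Safe pattern: copy the bytes into a std::string that the caller owns.
inline std::string ParseResponseSafe(const std::string& json) {
  FakeDocument doc(json);
  std::string result(doc.content);  // independent copy of the data
  // The copy never shares storage with the document that is about to die.
  assert(result.data() != doc.content.data());
  return result;
}
```

The copy costs an allocation for longer strings, but the returned value stays valid regardless of when the document object is destroyed.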

Tests:
Reran the ASAN build and it passed.

Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
---
M be/src/exprs/ai-functions-ir.cc
M be/src/exprs/ai-functions.h
M be/src/exprs/ai-functions.inline.h
M be/src/exprs/expr-test.cc
4 files changed, 11 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/21315/3
--
To view, visit http://gerrit.cloudera.org:8080/21315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
Gerrit-Change-Number: 21315
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..


Patch Set 7:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/15941/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Thu, 18 Apr 2024 13:32:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21326 )

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for 
some dropped partitions
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10557/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 18 Apr 2024 13:26:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE

2024-04-18 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21320 )

Change subject: IMPALA-13000: Document OPTIMIZE TABLE
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml
File docs/topics/impala_iceberg.xml:

http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@556
PS1, Line 556: able_na
> [] is quite standard notation, and we are using it extensively in the Impal
I'm also okay with leaving [db_name]. I think a separate top-level page or even 
just a paragraph showing the proper syntax would be even better.


http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml
File docs/topics/impala_iceberg.xml:

http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml@566
PS2, Line 566: using
If you want to make it even clearer that all files are rewritten (not just the 
ones with the latest schema), you could write "rewrite all files, converting 
them (if necessary) to the latest table schema".
I'm not sure it's needed, I'm also okay with the current wording.



--
To view, visit http://gerrit.cloudera.org:8080/21320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Gerrit-Change-Number: 21320
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Apr 2024 13:20:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21186 )

Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted 
types
..


Patch Set 16:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10556/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/21186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9
Gerrit-Change-Number: 21186
Gerrit-PatchSet: 16
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 18 Apr 2024 13:26:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21294 )

Change subject: IMPALA-12874: Identify active and standby catalog and 
statestore in the web debug endpoint
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10555/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/21294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2
Gerrit-Change-Number: 21294
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 18 Apr 2024 13:22:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint

2024-04-18 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21294 )

Change subject: IMPALA-12874: Identify active and standby catalog and 
statestore in the web debug endpoint
..


Patch Set 4:

Irrelevant Iceberg issue: IMPALA-12621


--
To view, visit http://gerrit.cloudera.org:8080/21294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2
Gerrit-Change-Number: 21294
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 18 Apr 2024 13:21:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-18 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..

IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types 
in select list

Binary fields in complex types are currently not supported at all for
regular tables (an error is returned). For Iceberg metadata tables,
IMPALA-12899 added a temporary workaround to allow queries that contain
these fields to succeed by NULLing them out. This change adds support
for displaying them with base64 encoding for both regular and Iceberg
metadata tables.

Complex types are displayed in JSON format, so simply inserting the
bytes of the binary fields is not acceptable as it would produce invalid
JSON. Base64 is a widely used encoding that allows representing
arbitrary binary information using only a limited set of ASCII
characters.
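
To illustrate why base64 makes binary data safe to embed in JSON, here is a minimal sketch of a standard (RFC 4648) encoder. It is not the code from the patch: `Base64Encode` and `BinaryFieldToJson` are hypothetical names, and the JSON helper assumes the field name needs no escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Standard base64 alphabet (RFC 4648).
static const char kBase64Chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes arbitrary bytes using only ASCII characters, with '=' padding.
inline std::string Base64Encode(const std::string& bytes) {
  std::string out;
  out.reserve((bytes.size() + 2) / 3 * 4);
  std::size_t i = 0;
  // Full 3-byte groups become four 6-bit symbols.
  while (i + 2 < bytes.size()) {
    uint32_t n = (uint32_t(uint8_t(bytes[i])) << 16) |
                 (uint32_t(uint8_t(bytes[i + 1])) << 8) |
                 uint32_t(uint8_t(bytes[i + 2]));
    out += kBase64Chars[(n >> 18) & 63];
    out += kBase64Chars[(n >> 12) & 63];
    out += kBase64Chars[(n >> 6) & 63];
    out += kBase64Chars[n & 63];
    i += 3;
  }
  // A trailing group of one or two bytes is padded with '='.
  std::size_t rem = bytes.size() - i;
  if (rem == 1) {
    uint32_t n = uint32_t(uint8_t(bytes[i])) << 16;
    out += kBase64Chars[(n >> 18) & 63];
    out += kBase64Chars[(n >> 12) & 63];
    out += "==";
  } else if (rem == 2) {
    uint32_t n = (uint32_t(uint8_t(bytes[i])) << 16) |
                 (uint32_t(uint8_t(bytes[i + 1])) << 8);
    out += kBase64Chars[(n >> 18) & 63];
    out += kBase64Chars[(n >> 12) & 63];
    out += kBase64Chars[(n >> 6) & 63];
    out += '=';
  }
  assert(out.size() % 4 == 0);
  return out;
}

// Hypothetical helper: renders one binary field as a JSON object member.
// Assumes 'name' contains no characters that need JSON escaping.
inline std::string BinaryFieldToJson(const std::string& name,
                                     const std::string& bytes) {
  return "{\"" + name + "\":\"" + Base64Encode(bytes) + "\"}";
}
```

Because the output alphabet contains no quotes, backslashes, or control characters, the encoded value can be placed between JSON string quotes without further escaping.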

This change also adds support for top level binary columns in Iceberg
metadata tables. However, these are not base64 encoded but are returned
in raw byte format - this is consistent with how top level binary
columns from regular (non-metadata) tables are handled.

Testing:
 - added test queries in iceberg-metadata-tables.test referencing both
   nested and top level binary fields; also updated existing queries
 - moved relevant tests (queries extracting binary fields from within
   complex types) from nested-types-scanner-basic.test to a new
   binary-in-complex-type.test file and also added a query that selects
   the containing complex types; this new test file is run from
   test_scanners.py::TestBinaryInComplexType::test_binary_in_complex_type
 - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
   AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
   positive tests (expecting success); a negative test already in
   AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted

Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/rpc/jni-thrift-util.h
M be/src/runtime/complex-value-writer.inline.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/0-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-1.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
M tests/query_test/test_scanners.py
26 files changed, 439 insertions(+), 155 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/21269/7
--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 


[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21269/7/be/src/util/jni-util.h
File be/src/util/jni-util.h:

http://gerrit.cloudera.org:8080/#/c/21269/7/be/src/util/jni-util.h@115
PS7, Line 115: /// is more restricted, see 
https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#GetPrimitiveArrayCritical_ReleasePrimitiveArrayCritical
line too long (162 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Thu, 18 Apr 2024 13:08:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE

2024-04-18 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21320 )

Change subject: IMPALA-13000: Document OPTIMIZE TABLE
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml
File docs/topics/impala_iceberg.xml:

http://gerrit.cloudera.org:8080/#/c/21320/1/docs/topics/impala_iceberg.xml@556
PS1, Line 556: able_na
> No need to use fully qualified table names. I only included the database in
[] is quite standard notation, and we are using it extensively in the Impala 
docs, e.g.: 
https://impala.apache.org/docs/build/html/topics/impala_create_table.html

So users shouldn't be confused by it. This file mostly contains simple examples 
because the other statements have their own detailed doc page. But we don't 
have that for OPTIMIZE, so having a proper syntax definition here makes sense 
to me. Alternatively, we could create a separate top-level page for 
OPTIMIZE, and here only add a few examples.


http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml
File docs/topics/impala_iceberg.xml:

http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml@561
PS2, Line 561: rewrites the entire table
I think we should mention that it only applies to the current implementation, 
so users won't have this assumption in future releases.


http://gerrit.cloudera.org:8080/#/c/21320/2/docs/topics/impala_iceberg.xml@587
PS2, Line 587: rewrites the entire table
Maybe also mention here that this behavior is temporary.



--
To view, visit http://gerrit.cloudera.org:8080/21320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Gerrit-Change-Number: 21320
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Apr 2024 12:58:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest

2024-04-18 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21315 )

Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest 
AiFunctionsTest
..


Patch Set 2: Code-Review+1

(2 comments)

Thanks, just some nits.

http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h
File be/src/exprs/ai-functions.inline.h:

http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108
PS2, Line 108: move
'Move' is good from the context of the change, but if someone is reading the 
new code it's a bit strange. I think "place" or "put" would be better.


http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108
PS2, Line 108: loop
Nit: it is not a loop, I wrote it wrong in my comment. "'if' statement" would 
be better.



--
To view, visit http://gerrit.cloudera.org:8080/21315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5
Gerrit-Change-Number: 21315
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 18 Apr 2024 12:54:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21186 )

Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted 
types
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15940/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9
Gerrit-Change-Number: 21186
Gerrit-PatchSet: 16
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 18 Apr 2024 12:12:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12933: Avoid fetching unneccessary events of unwanted types

2024-04-18 Thread Quanlong Huang (Code Review)
Hello k.venureddy2...@gmail.com, Sai Hemanth Gantasala, Csaba Ringhofer, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21186

to look at the new patch set (#16).

Change subject: IMPALA-12933: Avoid fetching unneccessary events of unwanted 
types
..

IMPALA-12933: Avoid fetching unneccessary events of unwanted types

There are several places where catalogd will fetch all events of a
specific type on a table. E.g. in TableLoader#load(), if the table has
an old createEventId, catalogd will fetch all CREATE_TABLE events after
that createEventId on the table.

Fetching the list of events is expensive since the filtering is done on
the client side, i.e. catalogd fetches all events and filters them locally
based on the event type and table name. This could take hours if there
are lots of events (e.g. 1M) in HMS.

This patch sets the eventTypeSkipList to the complement set of the
wanted types, so the get_next_notification RPC can filter out some events
on the HMS side. To avoid adding too much computation overhead to HMS's
underlying RDBMS when evaluating predicates of EVENT_TYPE != 'xxx', rare
event types (e.g. DROP_ISCHEMA) are not added to the list. A new flag,
common_hms_event_types, is added to specify the common HMS event types.
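
The complement-set idea can be sketched as below. This is an assumption-laden illustration in C++, not the Java code from the patch: `BuildEventTypeSkipList` is a hypothetical name, and the set of common types is passed in as a parameter where the patch reads it from the common_hms_event_types flag.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the event types the server should skip: every configured common
// type except the wanted ones. Rare types are never listed, so events of
// those types still reach the client and must be filtered there.
inline std::vector<std::string> BuildEventTypeSkipList(
    const std::set<std::string>& common_types,
    const std::set<std::string>& wanted_types) {
  std::vector<std::string> skip_list;
  for (const std::string& type : common_types) {
    if (wanted_types.count(type) == 0) skip_list.push_back(type);
  }
  // The skip list can never be longer than the configured common set.
  assert(skip_list.size() <= common_types.size());
  return skip_list;
}
```

Keeping rare types out of the list bounds the number of EVENT_TYPE != 'xxx' predicates, at the cost of still needing a client-side filter for whatever slips through.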

Once HIVE-28146 is resolved, we can set the wanted types directly in the
HMS RPC and this approach can be simplified.

UPDATE_TBL_COL_STAT_EVENT, UPDATE_PART_COL_STAT_EVENT are the most
common unused events for Impala. They are also added to the default skip
list. A new flag, default_skipped_hms_event_types, is added to configure
this list.

This patch also fixes an issue where events of the non-default catalog
are not filtered out.

In a local perf test, I generated 100K RELOAD events after creating a
table in Hive. I then used the table in Impala to trigger metadata loading
on it, which fetches the latest CREATE_TABLE event by polling all
events after the last known CREATE_TABLE event. Before this patch,
fetching the events took 1s779ms. Now it takes only 395.377ms. Note
that in production environments the event messages are usually larger, so
the speedup could be larger.

Tests:
 - Added an FE test
 - Ran CORE tests

Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
15 files changed, 326 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/21186/16
--
To view, visit http://gerrit.cloudera.org:8080/21186
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ieabe714328aa2cc605cb62b85ae8aa4bd537dbe9
Gerrit-Change-Number: 21186
Gerrit-PatchSet: 16
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21326 )

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for 
some dropped partitions
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15939/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 18 Apr 2024 11:13:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

2024-04-18 Thread Quanlong Huang (Code Review)
Hello Fang-Yu Rao, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21326

to look at the new patch set (#3).

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for 
some dropped partitions
..

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped 
partitions

*Background*

Since IMPALA-3127, catalogd sends incremental partition updates based on
the last sent table snapshot ('maxSentPartitionId_' to be specific).
Dropped partitions since the last catalog update are tracked in
'droppedPartitions_' of HdfsTable. When catalogd collects the next
catalog update, they will be collected. HdfsTable then clears the set.

If an HdfsTable is invalidated, it's replaced with an IncompleteTable
which doesn't track any partitions. The HdfsTable object is then added
to the deleteLog so catalogd can send deletion updates for all its
partitions. The same happens if the HdfsTable is dropped. However, the
previously dropped partitions are not collected in this case, which
results in a leak in the catalog topic if the partition name is never
reused. Note that in the catalog topic, the key of a partition
update consists of the table name and the partition name. So if the
partition is added back to the table, the topic key is reused, which
resolves the leak.

The leak will be observed when a coordinator restarts. In the initial
catalog update sent from statestore, coordinator will find some
partition updates that are not referenced by the HdfsTable (assuming the
table is used again after the INVALIDATE). Then a Precondition check
fails and the table is not added to the coordinator.

*Overview of the patch*

This patch fixes the leak by also collecting the dropped partitions when
adding the HdfsTable to the deleteLog. A new field, dropped_partitions,
is added in THdfsTable to collect them. It's only used when catalogd
collects catalog updates.

This patch also removes the Precondition check in the coordinator and just
reports the stale partitions, since IMPALA-12831 could also introduce them.

Also adds a log line in CatalogOpExecutor.alterTableDropPartition() to
show the dropped partition names for better diagnostics.

Tests
 - Added e2e tests

Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_partition.py
M tests/metadata/test_recover_partitions.py
8 files changed, 151 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21326/3
--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

2024-04-18 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21326 )

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for 
some dropped partitions
..


Patch Set 2:

(1 comment)

Thanks for the quick review!

http://gerrit.cloudera.org:8080/#/c/21326/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21326/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1075
PS2, Line 1075: if (addedPartNames.contains(part.partition_name)) 
continue;
> What does this case mean? The partition was dropped, but was readded later
Yeah, if a partition is dropped and then re-added, the droppedPartitions will 
have the old instance and the partitionMap will have the new instance. When the 
table is dropped/invalidated, partitions from the partitionMap are collected in 
the for-loop at L1057. Some of them could have the same partition name as those 
in the dropped_partitions.

Renamed 'addedPartNames' to 'collectedPartNames' to avoid confusion.
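
A minimal Python sketch of the name-based skip discussed above (the real code is Java in CatalogServiceCatalog). The Partition class and collect_partitions_for_deletion() are hypothetical names used only for illustration; the only part taken from the comment is the rule that a previously dropped partition is skipped when a partition with the same name was already collected from the partitionMap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    # Hypothetical stand-in for a catalog partition: an id plus its name.
    part_id: int
    partition_name: str


def collect_partitions_for_deletion(partition_map, dropped_partitions):
    """Return one entry per partition name for a dropped/invalidated table.

    Partitions still in partition_map are collected first. A previously
    dropped partition is skipped if a partition with the same name has
    already been collected (the drop-then-re-add case).
    """
    collected = list(partition_map.values())
    collected_part_names = {p.partition_name for p in collected}
    for part in dropped_partitions:
        if part.partition_name in collected_part_names:
            continue  # the re-added instance already covers this name
        collected.append(part)
        collected_part_names.add(part.partition_name)
    return collected


# p=1 was dropped and re-added (id 1 -> id 3); p=2 was dropped for good.
live = {3: Partition(3, "p=1")}
dropped = [Partition(1, "p=1"), Partition(2, "p=2")]
result = collect_partitions_for_deletion(live, dropped)
print([(p.part_id, p.partition_name) for p in result])
```

Only the new instance of p=1 is kept, while p=2 is still collected because nothing else covers its name.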



--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 18 Apr 2024 10:49:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

2024-04-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21326 )

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for 
some dropped partitions
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21326/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21326/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1075
PS2, Line 1075: if (addedPartNames.contains(part.partition_name)) continue;
What does this case mean? Was the partition dropped and then re-added later?



--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 18 Apr 2024 10:17:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21320 )

Change subject: IMPALA-13000: Document OPTIMIZE TABLE
..


Patch Set 2: Verified+1

Build Successful

https://jenkins.impala.io/job/gerrit-docs-auto-test/761/ : Doc tests passed.


--
To view, visit http://gerrit.cloudera.org:8080/21320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Gerrit-Change-Number: 21320
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Apr 2024 09:53:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21326 )

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for 
some dropped partitions
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15938/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 18 Apr 2024 09:52:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

2024-04-18 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21326 )

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for 
some dropped partitions
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py
File tests/custom_cluster/test_partition.py:

http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py@93
PS1, Line 93: T
> flake8: F821 undefined name 'TestPartitionMetadata'
Done


http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py@98
PS1, Line 98:
> flake8: W504 line break after binary operator
Done



--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 18 Apr 2024 09:48:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE

2024-04-18 Thread Noemi Pap-Takacs (Code Review)
Noemi Pap-Takacs has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/21320 )

Change subject: IMPALA-13000: Document OPTIMIZE TABLE
..

IMPALA-13000: Document OPTIMIZE TABLE

Document OPTIMIZE TABLE syntax and behaviour.

Testing:
 - built docs locally

Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
---
M docs/topics/impala_iceberg.xml
1 file changed, 44 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/21320/2
--
To view, visit http://gerrit.cloudera.org:8080/21320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Gerrit-Change-Number: 21320
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13000: Document OPTIMIZE TABLE

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21320 )

Change subject: IMPALA-13000: Document OPTIMIZE TABLE
..


Patch Set 2:

Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/761/

Testing docs change - this change appears to modify docs/ and no code. This is 
experimental - please report any issues to tarmstr...@cloudera.com or on this 
JIRA: IMPALA-7317


--
To view, visit http://gerrit.cloudera.org:8080/21320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Gerrit-Change-Number: 21320
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Apr 2024 09:48:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

2024-04-18 Thread Quanlong Huang (Code Review)
Hello Fang-Yu Rao, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21326

to look at the new patch set (#2).

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for 
some dropped partitions
..

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped 
partitions

*Background*

Since IMPALA-3127, catalogd sends incremental partition updates based on
the last sent table snapshot ('maxSentPartitionId_' to be specific).
Dropped partitions since the last catalog update are tracked in
'droppedPartitions_' of HdfsTable. When catalogd collects the next
catalog update, they will be collected. HdfsTable then clears the set.

If an HdfsTable is invalidated, it's replaced with an IncompleteTable
which doesn't track any partitions. The HdfsTable object is then added
to the deleteLog so catalogd can send deletion updates for all its
partitions. The same happens if the HdfsTable is dropped. However, the
previously dropped partitions are not collected in this case, which
results in a leak in the catalog topic if the partition name is never
reused. Note that in the catalog topic, the key of a partition update
consists of the table name and the partition name. So if the partition
is added back to the table, the topic key is reused, which resolves
the leak.

The leak will be observed when a coordinator restarts. In the initial
catalog update sent from the statestore, the coordinator will find some
partition updates that are not referenced by the HdfsTable (assuming the
table is used again after the INVALIDATE). Then a Precondition check
fails and the table is not added to the coordinator.

*Overview of the patch*

This patch fixes the leak by also collecting the dropped partitions when
adding the HdfsTable to the deleteLog. A new field, dropped_partitions,
is added in THdfsTable to collect them. It's only used when catalogd
collects catalog updates.

The patch also removes the Precondition check in the coordinator and
just reports the stale partitions, since IMPALA-12831 could also
introduce them.

Also adds a log line in CatalogOpExecutor.alterTableDropPartition() to
show the dropped partition names for better diagnostics.

Tests
 - Added e2e tests
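
The leak described in the background can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: ToyTable, collect_update(), invalidate() and the "table:partition" key format are hypothetical and do not mirror the actual catalogd classes or topic key encoding. It models a topic as a set of keys and shows that a partition dropped just before an invalidation stays in the topic unless the pending dropped partitions are collected too.

```python
class ToyTable:
    """Hypothetical table that tracks live and recently dropped partitions."""

    def __init__(self, name):
        self.name = name
        self.partitions = set()  # live partition names
        self.dropped = set()     # dropped since the last collected update

    def add_partition(self, part):
        self.partitions.add(part)

    def drop_partition(self, part):
        self.partitions.discard(part)
        self.dropped.add(part)


def topic_key(table_name, part):
    # Assumed key shape: table name plus partition name.
    return f"{table_name}:{part}"


def collect_update(topic, table):
    """Regular update: send pending deletions, then live partitions."""
    for part in table.dropped:
        topic.discard(topic_key(table.name, part))
    table.dropped.clear()
    for part in table.partitions:
        topic.add(topic_key(table.name, part))


def invalidate(topic, table, include_dropped):
    """Table is invalidated or dropped: send deletions for its partitions."""
    names = set(table.partitions)
    if include_dropped:
        names |= table.dropped  # the fix: also collect pending drops
    for part in names:
        topic.discard(topic_key(table.name, part))


def run(include_dropped):
    topic = set()
    table = ToyTable("db.t")
    table.add_partition("p=1")
    table.add_partition("p=2")
    collect_update(topic, table)
    table.drop_partition("p=2")  # not yet sent as a deletion
    invalidate(topic, table, include_dropped)
    return sorted(topic)


print(run(include_dropped=False))  # ['db.t:p=2'] is left behind
print(run(include_dropped=True))   # []
```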

Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_partition.py
M tests/metadata/test_recover_partitions.py
8 files changed, 148 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21326/2
--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21294 )

Change subject: IMPALA-12874: Identify active and standby catalog and 
statestore in the web debug endpoint
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10554/


--
To view, visit http://gerrit.cloudera.org:8080/21294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2
Gerrit-Change-Number: 21294
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 18 Apr 2024 09:32:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21326 )

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for 
some dropped partitions
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15937/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 18 Apr 2024 09:25:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

2024-04-18 Thread Quanlong Huang (Code Review)
Quanlong Huang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21326 )


Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for 
some dropped partitions
..

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped 
partitions

*Background*

Since IMPALA-3127, catalogd sends incremental partition updates based on
the last sent table snapshot ('maxSentPartitionId_' to be specific).
Dropped partitions since the last catalog update are tracked in
'droppedPartitions_' of HdfsTable. When catalogd collects the next
catalog update, they will be collected. HdfsTable then clears the set.

If an HdfsTable is invalidated, it's replaced with an IncompleteTable
which doesn't track any partitions. The HdfsTable object is then added
to the deleteLog so catalogd can send deletion updates for all its
partitions. The same happens if the HdfsTable is dropped. However, the
previously dropped partitions are not collected in this case, which
results in a leak in the catalog topic if the partition name is never
reused. Note that in the catalog topic, the key of a partition update
consists of the table name and the partition name. So if the partition
is added back to the table, the topic key is reused, which resolves
the leak.

The leak will be observed when a coordinator restarts. In the initial
catalog update sent from the statestore, the coordinator will find some
partition updates that are not referenced by the HdfsTable (assuming the
table is used again after the INVALIDATE). Then a Precondition check
fails and the table is not added to the coordinator.

*Overview of the patch*

This patch fixes the leak by also collecting the dropped partitions when
adding the HdfsTable to the deleteLog. A new field, dropped_partitions,
is added in THdfsTable to collect them. It's only used when catalogd
collects catalog updates.

The patch also removes the Precondition check in the coordinator and
just reports the stale partitions, since IMPALA-12831 could also
introduce them.

Also adds a log line in CatalogOpExecutor.alterTableDropPartition() to
show the dropped partition names for better diagnostics.

Tests
 - Added e2e tests
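
The coordinator-side change (report stale partitions instead of failing a Precondition check) could look roughly like the following Python sketch. Everything here is an assumption for illustration: attach_partitions(), the matching of updates to the table by partition id, and the strict flag that stands in for the old check are all hypothetical and are not taken from ImpaladCatalog.

```python
import logging

logger = logging.getLogger("toy_coordinator")


def attach_partitions(table_part_ids, partition_updates, strict=False):
    """Attach partition updates to a table received in a catalog update.

    table_part_ids: partition ids referenced by the table object.
    partition_updates: dict of partition id -> partition name from the topic.
    Returns (attached, stale), where stale lists the names of updates that
    no table partition references.
    """
    referenced = set(table_part_ids)
    attached = {pid: name for pid, name in partition_updates.items()
                if pid in referenced}
    stale = sorted(name for pid, name in partition_updates.items()
                   if pid not in referenced)
    if stale:
        if strict:
            # Old behaviour in this sketch: fail, so the table is not added.
            raise ValueError(f"Unreferenced partition updates: {stale}")
        # New behaviour in this sketch: report and carry on.
        logger.warning("Ignoring stale partition updates: %s", stale)
    return attached, stale


attached, stale = attach_partitions([1, 3], {1: "p=1", 2: "p=2", 3: "p=3"})
print(attached, stale)
```

With strict=False the table is still usable and the leftover update for p=2 only shows up in the log.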

Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_partition.py
M tests/metadata/test_recover_partitions.py
8 files changed, 148 insertions(+), 19 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21326/1
--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 


[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21326 )

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for 
some dropped partitions
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py
File tests/custom_cluster/test_partition.py:

http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py@93
PS1, Line 93: T
flake8: F821 undefined name 'TestPartitionMetadata'


http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py@98
PS1, Line 98: a
flake8: W504 line break after binary operator



--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 18 Apr 2024 09:01:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 23:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java:

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java@50
PS21, Line 50: // TODO: IMPALA-13011: Awkward call for authorization here. Authorization
             : // will be done at validation time, but this is needed here for
> Yeah, authorization will happen earlier.  It's not implemented yet.  This p
Can you mention in the commit message that authorization is missing at this 
point?


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java:

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@35
PS20, Line 35: ImpalaTypeSystemImpl
> Sigh, you caught me on something I haven't researched that much...
Yeah, it is perfectly fine to just add a class comment and mention that this 
may change in the future. It doesn't seem useful to put more effort into it 
while expressions/more complex queries are not supported. If there is some Hive 
code that acted as the inspiration, then a link to it would be nice.


http://gerrit.cloudera.org:8080/#/c/21109/23/testdata/workloads/functional-query/queries/QueryTest/calcite.test
File testdata/workloads/functional-query/queries/QueryTest/calcite.test:

http://gerrit.cloudera.org:8080/#/c/21109/23/testdata/workloads/functional-query/queries/QueryTest/calcite.test@113
PS23, Line 113: xedzt
Hmm, why are these different from 
https://github.com/apache/impala/blob/541fc5ee9ec2d804f2ba45feb2df5bb96a013f86/testdata/workloads/functional-query/queries/QueryTest/binary-type.test#L12 ?
I quickly tested it and it doesn't seem to pass with this escaped string.
Note that I wouldn't mind using only the ascii lines in the test - the goal is 
to test the planner, not the executor + client.



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 23
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Thu, 18 Apr 2024 06:55:14 +
Gerrit-HasComments: Yes