[Impala-ASF-CR] IMPALA-10086: SqlCastException when comparing char with varchar

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18001 )

Change subject: IMPALA-10086: SqlCastException when comparing char with varchar
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3cd331ae1b6afb778a88080efd38900694539a89
Gerrit-Change-Number: 18001
Gerrit-PatchSet: 2
Gerrit-Owner: Bruno Pusztahazi 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 25 Aug 2023 04:09:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10086: SqlCastException when comparing char with varchar

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18001 )

Change subject: IMPALA-10086: SqlCastException when comparing char with varchar
..


Patch Set 1:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9632/


--
To view, visit http://gerrit.cloudera.org:8080/18001

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3cd331ae1b6afb778a88080efd38900694539a89
Gerrit-Change-Number: 18001
Gerrit-PatchSet: 1
Gerrit-Owner: Bruno Pusztahazi 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 25 Aug 2023 03:36:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19699 )

Change subject: IMPALA-10798: Initial support for reading JSON files
..


Patch Set 30:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13840/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19699

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
Gerrit-Change-Number: 19699
Gerrit-PatchSet: 30
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Fri, 25 Aug 2023 02:55:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files

2023-08-24 Thread Zihao Ye (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19699

to look at the new patch set (#30).

Change subject: IMPALA-10798: Initial support for reading JSON files
..

IMPALA-10798: Initial support for reading JSON files

Prototype of HdfsJsonScanner, implemented on top of rapidjson, which
supports scanning data from splittable JSON files.

The scanning of JSON data is mainly completed by two parts working
together. The first part is the JsonParser responsible for parsing the
JSON object, which is implemented based on the SAX-style API of
rapidjson. It reads data from the char stream, parses it, and calls the
corresponding callback function when encountering the corresponding JSON
element. See the comments of the JsonParser class for more details.

The other part is the HdfsJsonScanner, which inherits from HdfsScanner
and provides callback functions for the JsonParser. The callback
functions are responsible for providing data buffers to the Parser and
converting and materializing the Parser's parsing results into RowBatch.
It should be noted that the parser returns numeric values as strings to
the scanner. The scanner uses the TextConverter class to convert the
strings to the desired types, similar to how the HdfsTextScanner works.
This is an advantage over using the numeric values provided by
rapidjson directly, as it eliminates concerns about inconsistencies in
converting decimals (e.g. losing precision).
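
As a rough illustration of the string-based numeric handling described above
(this is not the HdfsJsonScanner code), the following Python sketch uses the
standard json module's parse_float and parse_int hooks to keep numeric tokens
as text, then converts them once the target column type is known. SCHEMA and
materialize_row are hypothetical names introduced only for this example.

```python
import json
from decimal import Decimal

# Hypothetical column types for illustration; not Impala's schema representation.
SCHEMA = {"id": int, "price": Decimal, "name": str}


def materialize_row(line, schema=SCHEMA):
    """Parse one JSON object, keeping numeric tokens as text until the
    target column type is known."""
    # parse_float/parse_int receive the raw token text. Returning it unchanged
    # defers conversion, so no binary floating point rounding happens here.
    raw = json.loads(line, parse_float=str, parse_int=str)
    row = {}
    for col, col_type in schema.items():
        value = raw.get(col)
        row[col] = None if value is None else col_type(value)
    return row


row = materialize_row('{"id": 7, "price": 0.1000000000000000000001, "name": "a"}')
print(row["price"])
```

Converting from the original text means the decimal column sees every digit
of the token, whereas a double produced by the parser would already have been
rounded.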

Added a startup flag, enable_json_scanner, to be able to disable this
feature if we hit critical bugs in production.

Limitations
 - Multiline json objects are not fully supported yet. It is ok when
   each file has only one scan range. However, when a file has multiple
   scan ranges, there is a small probability of incomplete scanning of
   multiline JSON objects that span ScanRange boundaries (in such cases,
   parsing errors may be reported). For more details, please refer to
   the comments in the 'multiline_json.test'.
 - Compressed JSON files are not supported yet.
 - Complex types are not supported yet.
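
To make the scan range limitation above more concrete, here is a small Python
sketch of one common convention for newline-delimited records: a range
processes the records that begin inside it and may read past its end to finish
the last one. The function name and the convention itself are assumptions for
illustration; this is not a description of how HdfsJsonScanner assigns records.

```python
def records_for_range(data, start, length):
    """Return the newline-delimited records that begin inside
    [start, start + length). Assumes one JSON object per line."""
    end = min(start + length, len(data))
    pos = start
    if start > 0:
        # A record belongs to the range in which it begins, so skip ahead to
        # the first record boundary at or after `start`.
        newline = data.find(b"\n", start - 1)
        if newline == -1:
            return []
        pos = newline + 1
    records = []
    while pos < end:
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        # Reading may run past `end` to finish a record that started inside.
        records.append(data[pos:stop])
        pos = stop + 1
    return records


data = b'{"a": 1}\n{"a": 2}\n{"a": 3}\n'
ranges = [(0, 10), (10, 10), (20, 7)]
for start, length in ranges:
    print(start, records_for_range(data, start, length))
```

With one object per line, every record is returned by exactly one range. If an
object contains a newline, a range that starts inside it can end up with a
fragment such as `1}` instead of a complete object, which is the kind of case
the limitation refers to.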

Tests
 - Most of the existing end-to-end tests can run on JSON format.
 - Add TestQueriesJsonTables in test_queries.py for testing multiline,
   malformed, and overflow in JSON.

Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
---
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-scan-node-base.cc
A be/src/exec/json/CMakeLists.txt
A be/src/exec/json/hdfs-json-scanner.cc
A be/src/exec/json/hdfs-json-scanner.h
A be/src/exec/json/json-parser-test.cc
A be/src/exec/json/json-parser.cc
A be/src/exec/json/json-parser.h
M be/src/exec/text-converter.inline.h
M be/src/util/backend-gflag-util.cc
M bin/rat_exclude_files.txt
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-dependent-tables.sql
A testdata/data/chars-formats.json
A testdata/data/json_test/complex.json
A testdata/data/json_test/malformed.json
A testdata/data/json_test/multiline.json
A testdata/data/json_test/overflow.json
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
A testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-json-scan-node-errors.test
A testdata/workloads/functional-query/queries/QueryTest/complex_json.test
A testdata/workloads/functional-query/queries/QueryTest/disable-json-scanner.test
A testdata/workloads/functional-query/queries/QueryTest/malformed_json.test
A testdata/workloads/functional-query/queries/QueryTest/multiline_json.test
A testdata/workloads/functional-query/queries/QueryTest/overflow_json.test
M testdata/workloads/tpcds/tpcds_core.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_core.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/test_dimensions.py
M tests/custom_cluster/test_disable_features.py
M tests/data_errors/test_data_errors.py
M tests/metadata/test_hms_integration.py
M tests/query_test/test_cancellation.py
M tests/query_test/test_chars.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_queries.py
M tests/query_test/test_scanners.py
M tests/query_test/test_scanners_fuzz.py
M 

[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 7:

(2 comments)

Looks good, and I think we are converging toward +2.

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@777
PS4, Line 777: return useIntermediateTuple_ || endsMultiPhase_;
Nit: consider adding a comment: useIntermediateTuple_ is set to true for any
non-merge node, and endsMultiPhase_ is true for a merge node.


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
> Unless I misunderstood your example, we can't sort on partition keys
Okay. The test on useIntermediateTuple_ resolves my concern :-).  I think I am 
okay with the change for the distributed plans.

In a serial plan, per
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L475,
the aggregation node may lose the ability to bail out early. Can we mend it
as follows?

return isSingleClassAgg() && hasLimit() && hasGrouping()
   && (is_serial_plan || isMultiPhase())
   && !multiAggInfo_.hasAggregateExprs() && getConjuncts().isEmpty();



--
To view, visit http://gerrit.cloudera.org:8080/20379

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 25 Aug 2023 00:35:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 7:

(1 comment)

> Patch Set 7:
>
> (1 comment)

Unless I misunderstood your example, we can't sort on partition keys

> create table foo (s string) partitioned by (a int, b int) sort by (a, b);
Query: create table foo (s string) partitioned by (a int, b int) sort by (a, b)
ERROR: AnalysisException: SORT BY column list must not contain partition 
column: 'a'

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
Unless I misunderstood your example, we can't sort on partition keys

> default> create table foo (s string) partitioned by (a int, b int) sort by 
> (a, b);
Query: create table foo (s string) partitioned by (a int, b int) sort by (a, b)
ERROR: AnalysisException: SORT BY column list must not contain partition 
column: 'a'

If you go through and examine every AggregationNode we create in 
DistributedPlanner, they either setIntermediateTuple or setEndsMultiPhase, so 
this should have no functional difference with DistributedPlanner. The only 
case this should have any impact is when SingleNodePlanner creates a single 
AggregationNode, and there's nothing to push down to.

When I look at examples for a similar case

> create table foo (c int, d int) partitioned by (a int, b int) sort by (c, d);

where I've created 3 partitions and run

> select distinct a, b from foo limit 2

the plans are unchanged with and without this patch

Operator             #Hosts  #Inst   Avg Time   Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------
F02:ROOT                  1      1  323.181us  323.181us                      4.01 MB        4.00 MB
04:EXCHANGE               1      1   12.331us   12.331us      2           2  24.00 KB       16.00 KB  UNPARTITIONED
F01:EXCHANGE SENDER       3      3   34.995us   43.647us                     14.25 KB       48.00 KB
03:AGGREGATE              3      3  407.997us  513.288us      3           2   2.08 MB       10.00 MB  FINALIZE
02:EXCHANGE               3      3   10.555us   14.924us      3           2  16.00 KB       16.00 KB  HASH(a,b)
F00:EXCHANGE SENDER       3      3   51.210us   60.198us                     46.69 KB      144.00 KB
01:AGGREGATE              3      3  157.104us  183.497us      3           2   2.03 MB       10.00 MB  STREAMING
00:SCAN HDFS              3      3    1.612ms    1.962ms      3           2  32.00 KB       32.00 MB  default.foo

Operator             #Hosts  #Inst   Avg Time   Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------
F02:ROOT                  1      1   64.470us   64.470us                      4.01 MB        4.00 MB
04:EXCHANGE               1      1    9.816us    9.816us      2           2  24.00 KB       16.00 KB  UNPARTITIONED
F01:EXCHANGE SENDER       3      3   22.339us   23.764us                     14.28 KB       48.00 KB
03:AGGREGATE              3      3  290.786us  323.453us      3           2   2.08 MB       10.00 MB  FINALIZE
02:EXCHANGE               3      3   16.727us   31.736us      3           2  24.00 KB       16.00 KB  HASH(a,b)
F00:EXCHANGE SENDER       3      3   62.719us   69.838us                     46.69 KB      144.00 KB
01:AGGREGATE              3      3  179.794us  196.761us      3           2   2.03 MB       10.00 MB  STREAMING
00:SCAN HDFS              3      3   40.368ms   42.799ms      3           2  32.00 KB       32.00 MB  default.foo

Same for SingleNodePlanner. Same results with

> select distinct c, d from foo limit 2



--
To view, visit http://gerrit.cloudera.org:8080/20379

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 25 Aug 2023 00:10:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5081: Add codegen_opt_level query option

2023-08-24 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20399 )

Change subject: IMPALA-5081: Add codegen_opt_level query option
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h@166
PS2, Line 166:   bool Empty() { return engine_pointer == nullptr; }
> Not relevant to this change but wouldn't it be better to call Reset() here
Thanks Daniel for pointing this out. It makes sense to me. I think it was
because the struct previously contained a shared_ptr and needed a hack to
initialize it, but it has now changed to a pointer. It should be okay as long
as all the codegen caching tests pass.



--
To view, visit http://gerrit.cloudera.org:8080/20399

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd
Gerrit-Change-Number: 20399
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 25 Aug 2023 00:09:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12387: PartialUpdates is misleading for LOCAL filter

2023-08-24 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20397 )

Change subject: IMPALA-12387: PartialUpdates is misleading for LOCAL filter
..


Patch Set 1: Code-Review+2

(1 comment)

Carrying Wenzhe's +1.

http://gerrit.cloudera.org:8080/#/c/20397/1/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test
File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test:

http://gerrit.cloudera.org:8080/#/c/20397/1/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@181
PS1, Line 181: LOCAL
> For global filter aggregating in coordinator, I think column "Pending (Expe
Sorry, I mean the size of the IN-list filter, i.e. how many items are in the
list. For MinMax filters, I think we can also get the min/max values. It's an
improvement that we can make in the future.



--
To view, visit http://gerrit.cloudera.org:8080/20397

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56078a458799671246ff90b831e5ecebd04a78e8
Gerrit-Change-Number: 20397
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 25 Aug 2023 00:04:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5081: Add codegen_opt_level query option

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20399 )

Change subject: IMPALA-5081: Add codegen_opt_level query option
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13839/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20399

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd
Gerrit-Change-Number: 20399
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 25 Aug 2023 00:08:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files

2023-08-24 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19699 )

Change subject: IMPALA-10798: Initial support for reading JSON files
..


Patch Set 29: Code-Review+1

Overall LGTM.

Let's add a startup flag, enable_json_scanner (just like enable_orc_scanner 
when we first added the orc-scanner), to be able to disable this feature if we 
hit critical bugs in production.


--
To view, visit http://gerrit.cloudera.org:8080/19699

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
Gerrit-Change-Number: 19699
Gerrit-PatchSet: 29
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Fri, 25 Aug 2023 00:02:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10086: SqlCastException when comparing char with varchar

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18001 )

Change subject: IMPALA-10086: SqlCastException when comparing char with varchar
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9633/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18001

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3cd331ae1b6afb778a88080efd38900694539a89
Gerrit-Change-Number: 18001
Gerrit-PatchSet: 2
Gerrit-Owner: Bruno Pusztahazi 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 24 Aug 2023 23:53:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10086: SqlCastException when comparing char with varchar

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18001 )

Change subject: IMPALA-10086: SqlCastException when comparing char with varchar
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9632/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18001

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3cd331ae1b6afb778a88080efd38900694539a89
Gerrit-Change-Number: 18001
Gerrit-PatchSet: 1
Gerrit-Owner: Bruno Pusztahazi 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 24 Aug 2023 23:51:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5081: Add codegen_opt_level query option

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20399 )

Change subject: IMPALA-5081: Add codegen_opt_level query option
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20399/3/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20399/3/be/src/codegen/llvm-codegen.cc@1340
PS3, Line 1340: COUNTER_SET(num_opt_functions_, 
counter.GetCount(InstructionCounter::TOTAL_FUNCTIONS));
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/20399/3/be/src/codegen/llvm-codegen.cc@1343
PS3, Line 1343: COUNTER_SET(num_opt_functions_, 
counter.GetCount(InstructionCounter::TOTAL_FUNCTIONS));
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/20399

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd
Gerrit-Change-Number: 20399
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 24 Aug 2023 23:49:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5081: Add codegen_opt_level query option

2023-08-24 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20399 )

Change subject: IMPALA-5081: Add codegen_opt_level query option
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-test.cc
File be/src/codegen/llvm-codegen-test.cc:

http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-test.cc@594
PS2, Line 594:
> Nit: could be 'nullptr'.
Done



--
To view, visit http://gerrit.cloudera.org:8080/20399

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd
Gerrit-Change-Number: 20399
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 24 Aug 2023 23:41:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5081: Add codegen_opt_level query option

2023-08-24 Thread Michael Smith (Code Review)
Hello Daniel Becker, Yida Wu, Noemi Pap-Takacs, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20399

to look at the new patch set (#3).

Change subject: IMPALA-5081: Add codegen_opt_level query option
..

IMPALA-5081: Add codegen_opt_level query option

Adds the 'codegen_opt_level' query option to select LLVM optimization
level for generated code. Retains the prior behavior - O2 - as default.

If optimization level is changed for an entry already in cache, the
cache entry will be used unless the new optimization level is higher
than the cached level.
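
The cache reuse rule described above can be sketched as follows. This is a toy
model in Python, not the LLVM codegen cache itself: OptLevel, CodegenCache and
the string keys are hypothetical names, and the real option's set of levels and
their ordering may differ from the four shown here.

```python
from enum import IntEnum


class OptLevel(IntEnum):
    # Hypothetical ordering for illustration only.
    O0 = 0
    O1 = 1
    O2 = 2
    O3 = 3


class CodegenCache:
    """Toy cache keyed by an opaque string; stores (level, artifact)."""

    def __init__(self):
        self._entries = {}

    def lookup(self, key, requested):
        entry = self._entries.get(key)
        if entry is None:
            return None
        cached_level, artifact = entry
        # Reuse the entry unless the request asks for a higher optimization
        # level than the one the cached artifact was built with.
        return None if requested > cached_level else artifact

    def store(self, key, level, artifact):
        self._entries[key] = (level, artifact)


cache = CodegenCache()
cache.store("fragment-1", OptLevel.O2, "compiled-at-O2")
print(cache.lookup("fragment-1", OptLevel.O1))
print(cache.lookup("fragment-1", OptLevel.O3))
```

In this model a miss caused by a higher requested level would be followed by a
recompile and a store at the new level, so later lookups at that level or
below hit the cache again.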

Adds unit tests verifying that levels other than O0 inline and optimize a
test function.

Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd
---
M be/src/codegen/CMakeLists.txt
M be/src/codegen/llvm-codegen-cache-test.cc
M be/src/codegen/llvm-codegen-cache.cc
M be/src/codegen/llvm-codegen-cache.h
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A testdata/llvm/test-opt.cc
12 files changed, 300 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/20399/3
--
To view, visit http://gerrit.cloudera.org:8080/20399

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd
Gerrit-Change-Number: 20399
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20355 )

Change subject: IMPALA-12364: Display memory, disk and network metrics in 
webUI's query timeline
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13838/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20355

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f
Gerrit-Change-Number: 20355
Gerrit-PatchSet: 4
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 24 Aug 2023 23:44:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline

2023-08-24 Thread Surya Hebbar (Code Review)
Surya Hebbar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20355 )

Change subject: IMPALA-12364: Display memory, disk and network metrics in 
webUI's query timeline
..


Patch Set 4:

It might be useful to provide the ability to vertically resize the popup charts 
for better responsiveness.

Before introducing further such features and complexities, I was planning to 
integrate a JS testing framework to ensure things are working properly.


--
To view, visit http://gerrit.cloudera.org:8080/20355

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f
Gerrit-Change-Number: 20355
Gerrit-PatchSet: 4
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 24 Aug 2023 23:21:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20022 )

Change subject: IMPALA-11535: Skip older events in the event processor based on 
the latestRefreshEventID
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13837/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20022

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1
Gerrit-Change-Number: 20022
Gerrit-PatchSet: 14
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 24 Aug 2023 23:21:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline

2023-08-24 Thread Surya Hebbar (Code Review)
Surya Hebbar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20355 )

Change subject: IMPALA-12364: Display memory, disk and network metrics in 
webUI's query timeline
..


Patch Set 4:

I have mentioned the comments and problems arising from displaying aggregated
fragment metrics and periodic metrics together. Rather than increasing the
visual complexity of the periodic metrics further, it might be better, as was
suggested to me, to align the gridline with tooltips across multiple popup
charts, which can be closed or opened for better accessibility. Also, comparing
multiple fragments' memory metrics and thread usage together in the same chart
might prove helpful.


--
To view, visit http://gerrit.cloudera.org:8080/20355

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f
Gerrit-Change-Number: 20355
Gerrit-PatchSet: 4
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 24 Aug 2023 23:20:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline

2023-08-24 Thread Surya Hebbar (Code Review)
Surya Hebbar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20355 )

Change subject: IMPALA-12364: Display memory, disk and network metrics in 
webUI's query timeline
..


Patch Set 4:

For small values of 'periodic_counter_update_period_ms', the cursors across the 
timeline were not displaying tooltips in accordance with hovering due to small 
floating point differences. As there was no API from the c3js library 
available, I looked through the library's source code and have fixed this issue.
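
For readers unfamiliar with this class of problem, the sketch below shows in
Python why exact matching on sampled timestamps can fail and how a
nearest-sample lookup avoids it. It is only an illustration of the general
technique; it is not the change made to the c3js-based code, and
nearest_sample_index is a hypothetical helper.

```python
import bisect


def nearest_sample_index(timestamps, x):
    """Index of the sample whose timestamp is closest to x.

    `timestamps` must be non-empty and sorted in ascending order."""
    if not timestamps:
        raise ValueError("timestamps must not be empty")
    i = bisect.bisect_left(timestamps, x)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if (after - x) < (x - before) else i - 1


# Sample times accumulated from a small period pick up rounding error.
samples = [i * 0.1 for i in range(5)]
print(0.3 in samples)
print(nearest_sample_index(samples, 0.3))
```

An equality test against the hovered position misses the sample because of the
accumulated rounding error, while the nearest-sample lookup still resolves to
the intended index.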


--
To view, visit http://gerrit.cloudera.org:8080/20355

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f
Gerrit-Change-Number: 20355
Gerrit-PatchSet: 4
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 24 Aug 2023 23:20:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline

2023-08-24 Thread Surya Hebbar (Code Review)
Surya Hebbar has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/20355 )

Change subject: IMPALA-12364: Display memory, disk and network metrics in 
webUI's query timeline
..

IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline

The fragment's plan nodes are enlarged with an animated transition on hovering
over the query timeline's fragment diagram. On clicking the plan nodes,
total thread and memory usage of the parent fragment are displayed,
after accumulating memory and thread usage of all child nodes.

A grid-line is displayed along with a tooltip on hovering over the
fragment diagram, containing the instantaneous time at that position.
This grid-line also triggers tooltips and gridlines in other charts.

The thread usage is being shown on the additional Y-axis.

A warning is displayed on clicking a fragment with fewer samples
available.

RESOURCE_TRACE_RATIO query option provides the utilization values to be
traced within the RuntimeProfile. It contains samples of disk and network
usage on each host. These time series counters are available within
the profile having the following names.

- HostDiskWriteThroughput
- HostDiskReadThroughput
- HostNetworkRx
- HostNetworkTx

The additional Y-axis within the utilization chart is used to represent
the average of these metrics.

The memory units in tooltips and ticks on co-ordinate axes are being
displayed in human readable form such as KB, MB, GB and PB for convenience.
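
A rough sketch of the kind of unit conversion described above. It is not
the code in www/scripts/util.js (which is JavaScript and may differ); the
function name, the 1024 base, the unit list and the two-decimal format
are all assumptions.

```python
def human_readable_bytes(num_bytes, precision=2):
    """Format a byte count with a binary-scaled unit suffix.

    Hypothetical helper: base 1024 and the unit list are assumptions.
    """
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit if the value is bigger than that.
        if abs(value) < 1024.0 or unit == units[-1]:
            return f"{value:.{precision}f} {unit}"
        value /= 1024.0


print(human_readable_bytes(1536))
print(human_readable_bytes(1024 ** 3))
```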

Both of the charts contain controls to close the chart.

Timeticks are being autoscaled during fragment diagram's horizontal zoom.

In addition to the scrollbar, hovering on edges of the window allows
horizontal scrolling.

Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f
---
M www/query_timeline.tmpl
M www/scripts/util.js
2 files changed, 580 insertions(+), 154 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/20355/4
--
To view, visit http://gerrit.cloudera.org:8080/20355
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f
Gerrit-Change-Number: 20355
Gerrit-PatchSet: 4
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12231: Bump GBN to get HMS thrift API changes

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20420 )

Change subject: IMPALA-12231: Bump GBN to get HMS thrift API changes
..


Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/13836/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20420
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I117873b628aed3e24280f9fcd79643f918c8d5f3
Gerrit-Change-Number: 20420
Gerrit-PatchSet: 1
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 24 Aug 2023 23:07:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12387: PartialUpdates is misleading for LOCAL filter

2023-08-24 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20397 )

Change subject: IMPALA-12387: PartialUpdates is misleading for LOCAL filter
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20397/1/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test
File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test:

http://gerrit.cloudera.org:8080/#/c/20397/1/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@181
PS1, Line 181: LOCAL
> This is ok to me. Just curious, is it doable to get the stats if the fragme
For a global filter aggregated in the coordinator, I think the column "Pending 
(Expected)" shows well enough how many contributors a filter has in total.
For more granular tracing, we'll need to add a new field to do the counting, at 
least in PublishFilterParamsPB.
Not sure if it's worth the hack, since it will only be interesting for filters 
that quickly become ALL_TRUE / ALL_FALSE.



--
To view, visit http://gerrit.cloudera.org:8080/20397
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56078a458799671246ff90b831e5ecebd04a78e8
Gerrit-Change-Number: 20397
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 24 Aug 2023 22:59:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID

2023-08-24 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20022 )

Change subject: IMPALA-11535: Skip older events in the event processor based on 
the latestRefreshEventID
..


Patch Set 14:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@323
PS13, Line 323:
> flake8: E306 expected 1 blank line before a nested definition, found 0
Ack


http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@348
PS13, Line 348:
> flake8: E501 line too long (94 > 90 characters)
Ack


http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@353
PS13, Line 353:
> flake8: E501 line too long (93 > 90 characters)
Ack


http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@360
PS13, Line 360:
> flake8: E501 line too long (94 > 90 characters)
Ack


http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@372
PS13, Line 372:
> flake8: E501 line too long (93 > 90 characters)
Ack



--
To view, visit http://gerrit.cloudera.org:8080/20022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1
Gerrit-Change-Number: 20022
Gerrit-PatchSet: 14
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 24 Aug 2023 22:53:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID

2023-08-24 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20022

to look at the new patch set (#14).

Change subject: IMPALA-11535: Skip older events in the event processor based on 
the latestRefreshEventID
..

IMPALA-11535: Skip older events in the event processor based on the
latestRefreshEventID

Summary: If the table has been manually refreshed, all of its events that
happened before the manual REFRESH can be skipped. This happens when
catalogd is lagging behind in processing events. When processing an
event, we can check whether a manual REFRESH was executed after
its eventTime. In that case, we don't need to process the event to
refresh anything. This helps catalogd catch up on HMS events quickly.

Implementation details: Updated the lastRefreshEventId on the table or
partition whenever there is table or partition level refresh/load.
By comparing the lastRefreshEventId to current event id in the event
processor the older events can be skipped.

Set enable_skipping_older_events to true to enable this optimization.
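
The comparison described in the implementation details can be sketched
roughly as follows. This is an illustration only, not the Java code in
MetastoreEventsProcessor; the class and function names are hypothetical,
and whether the real comparison is strict or inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TableState:
    """Hypothetical stand-in for a catalog table or partition."""
    last_refresh_event_id: int = -1


def record_refresh(table, current_event_id):
    # On a table or partition level refresh/load, remember the newest
    # event id that the refresh already covers.
    table.last_refresh_event_id = max(table.last_refresh_event_id,
                                      current_event_id)


def should_skip_event(table, event_id, enable_skipping_older_events=True):
    # An event no newer than the last refresh is already reflected in the
    # refreshed metadata (inclusive comparison is an assumption).
    if not enable_skipping_older_events:
        return False
    return event_id <= table.last_refresh_event_id


table = TableState()
record_refresh(table, 10)
print(should_skip_event(table, 5))   # older than the refresh
print(should_skip_event(table, 11))  # newer than the refresh
```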

Testing:
- Unit end-to-end test and unit test to test the functionality.

Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
11 files changed, 238 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/20022/14
--
To view, visit http://gerrit.cloudera.org:8080/20022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1
Gerrit-Change-Number: 20022
Gerrit-PatchSet: 14
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12231: Bump GBN to get HMS thrift API changes

2023-08-24 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20420


Change subject: IMPALA-12231: Bump GBN to get HMS thrift API changes
..

IMPALA-12231: Bump GBN to get HMS thrift API changes

We need a couple of hive changes HIVE-27319 and HIVE-27337 for catalogD
to work with latest HMS server to fix IMPALA-11768 and IMPALA-11939
respectively.

Bump CDP_BUILD_NUMBER (GBN) to 44206393
Bump various CDP version numbers to be based on 7.2.18.0-273

TESTING: Exhaustive tests ran clean
Added a couple of tests for IMPALA-11939 and IMPALA-11768

Change-Id: I117873b628aed3e24280f9fcd79643f918c8d5f3
---
M bin/impala-config.sh
M tests/custom_cluster/test_events_custom_configs.py
2 files changed, 35 insertions(+), 11 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/20420/1
--
To view, visit http://gerrit.cloudera.org:8080/20420
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I117873b628aed3e24280f9fcd79643f918c8d5f3
Gerrit-Change-Number: 20420
Gerrit-PatchSet: 1
Gerrit-Owner: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12400: Test expected executors used for planning when no executor groups are healthy

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20419 )

Change subject: IMPALA-12400: Test expected executors used for planning when no 
executor groups are healthy
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13835/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib71ca0a5402c74d07ee875878f092d6d3827c6b7
Gerrit-Change-Number: 20419
Gerrit-PatchSet: 1
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 24 Aug 2023 21:48:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12400: Test expected executors used for planning when no executor groups are healthy

2023-08-24 Thread Abhishek Rawat (Code Review)
Abhishek Rawat has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20419


Change subject: IMPALA-12400: Test expected executors used for planning when no 
executor groups are healthy
..

IMPALA-12400: Test expected executors used for
planning when no executor groups are healthy

Added a custom cluster test for testing number of executors used for
planning when no executor groups are healthy. Planner should use
num executors from 'num_expected_executors' or
'expected_executor_group_sets' when executor groups aren't healthy.

Change-Id: Ib71ca0a5402c74d07ee875878f092d6d3827c6b7
---
M tests/custom_cluster/test_executor_groups.py
1 file changed, 69 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/20419/1
--
To view, visit http://gerrit.cloudera.org:8080/20419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib71ca0a5402c74d07ee875878f092d6d3827c6b7
Gerrit-Change-Number: 20419
Gerrit-PatchSet: 1
Gerrit-Owner: Abhishek Rawat 


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
> isMultiPhase encompasses all nodes that are part of a chain of aggregation
I see.

Can you perform a simple performance test to see if this would negatively 
affect queries for which a very small subset of non-merge aggregate nodes can 
provide the answer?

For example, let us partition table T on columns a, b into 10 partitions, 
sorted on a, b. The query is
select distinct a, b from T limit 2.

Normally, such a query can finish as soon as the two smallest subsets of rows 
(on a, b) are read in.

From reading the code here, my understanding is that with this change we 
cannot complete early until all read nodes (from the 10 partitions) have 
finished their work, and we can complete early only once the very top merge 
node is active. True?
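
To make the concern concrete, here is a small sketch of the early
termination that the example query relies on. It is not Impala planner or
executor code; the function name and row layout are hypothetical.

```python
def distinct_with_limit(rows, limit):
    """Collect distinct rows and stop reading once `limit` are found.

    Returns (distinct_rows, rows_read) so the caller can see how much
    input was consumed. Hypothetical helper for illustration only.
    """
    seen = set()
    result = []
    rows_read = 0
    if limit <= 0:
        return result, rows_read
    for row in rows:
        rows_read += 1
        if row not in seen:
            seen.add(row)
            result.append(row)
            if len(result) == limit:
                break  # the limit is satisfied, no need to read more
    return result, rows_read


# Rows of (a, b), already sorted on a, b.
rows = [(1, 1), (1, 1), (1, 2), (2, 1), (3, 3)]
print(distinct_with_limit(iter(rows), 2))
```

If the limit can only be applied at the final merge step, every input has
to be read to the end before the query can return, which is the possible
slowdown being asked about.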



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 20:52:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 7: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 20:46:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20022 )

Change subject: IMPALA-11535: Skip older events in the event processor based on 
the latestRefreshEventID
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13834/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1
Gerrit-Change-Number: 20022
Gerrit-PatchSet: 13
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 24 Aug 2023 20:37:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID

2023-08-24 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20022 )

Change subject: IMPALA-11535: Skip older events in the event processor based on 
the latestRefreshEventID
..


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20022/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20022/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3229
PS12, Line 3229: alte
> nit: long
For Java tests, we can't delay the event processor, so the event processor 
will process the events once they are generated.
I have added a condition just to verify that lastSyncEventId changes before and 
after.


http://gerrit.cloudera.org:8080/#/c/20022/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3239
PS12, Line 3239: eventsProc
> nit: use assertEquals()
Ack. Sorry for the repetition. I added it in the previous patch but somehow it 
was lost.


http://gerrit.cloudera.org:8080/#/c/20022/12/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/20022/12/tests/custom_cluster/test_events_custom_configs.py@374
PS12, Line 374:   if is_partitoned:
> Can we also test for non-partitioned tables? We can extract a common method
Ack. After adding 4 sets of sub-tests, the number of events being skipped 
became flaky. I tried varying the polling interval but it didn't help much. So 
I'll just check that events_skipped_after is greater than events_skipped_before 
instead of comparing the exact number of skipped events.
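
A sketch of the relaxed check, assuming a metric getter and a workload
callable. Both parameters and the helper name are hypothetical and are
not taken from test_events_custom_configs.py.

```python
def check_events_skipped_increased(get_events_skipped, run_workload):
    """Assert that running the workload skipped at least one more event.

    get_events_skipped: callable returning the current skipped-events count.
    run_workload: callable that triggers the events under test.
    """
    events_skipped_before = get_events_skipped()
    run_workload()
    events_skipped_after = get_events_skipped()
    # Compare relatively instead of against an exact count, since the exact
    # number of skipped events depends on timing.
    assert events_skipped_after > events_skipped_before, (
        "expected more skipped events: before=%d after=%d"
        % (events_skipped_before, events_skipped_after))
    return events_skipped_after - events_skipped_before
```

The trade-off is that the test no longer catches a regression where only
some of the expected events are skipped.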


http://gerrit.cloudera.org:8080/#/c/20022/12/tests/custom_cluster/test_events_custom_configs.py@384
PS12, Line 384:
> Let's verify no reloads happen by comparing the old and new metrics of "tab
Ack



--
To view, visit http://gerrit.cloudera.org:8080/20022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1
Gerrit-Change-Number: 20022
Gerrit-PatchSet: 13
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 24 Aug 2023 20:14:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20022 )

Change subject: IMPALA-11535: Skip older events in the event processor based on 
the latestRefreshEventID
..


Patch Set 13:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@323
PS13, Line 323: d
flake8: E306 expected 1 blank line before a nested definition, found 0


http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@348
PS13, Line 348: e
flake8: E501 line too long (94 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@353
PS13, Line 353: d
flake8: E501 line too long (93 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@360
PS13, Line 360: e
flake8: E501 line too long (94 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@372
PS13, Line 372: d
flake8: E501 line too long (93 > 90 characters)



--
To view, visit http://gerrit.cloudera.org:8080/20022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1
Gerrit-Change-Number: 20022
Gerrit-PatchSet: 13
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 24 Aug 2023 20:16:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID

2023-08-24 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20022

to look at the new patch set (#13).

Change subject: IMPALA-11535: Skip older events in the event processor based on 
the latestRefreshEventID
..

IMPALA-11535: Skip older events in the event processor based on the
latestRefreshEventID

Summary: If the table has been manually refreshed, all of its events that
happened before the manual REFRESH can be skipped. This happens when
catalogd is lagging behind in processing events. When processing an
event, we can check whether a manual REFRESH was executed after
its eventTime. In that case, we don't need to process the event to
refresh anything. This helps catalogd catch up on HMS events quickly.

Implementation details: Updated the lastRefreshEventId on the table or
partition whenever there is table or partition level refresh/load.
By comparing the lastRefreshEventId to current event id in the event
processor the older events can be skipped.

Set enable_skipping_older_events to true to enable this optimization.

Testing:
- Unit end-to-end test and unit test to test the functionality.

Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
M tests/util/event_processor_utils.py
12 files changed, 234 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/20022/13
--
To view, visit http://gerrit.cloudera.org:8080/20022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1
Gerrit-Change-Number: 20022
Gerrit-PatchSet: 13
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19699 )

Change subject: IMPALA-10798: Initial support for reading JSON files
..


Patch Set 29: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/19699
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
Gerrit-Change-Number: 19699
Gerrit-PatchSet: 29
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 24 Aug 2023 18:35:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12395: Override scan cardinality for optimized count star

2023-08-24 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20406 )

Change subject: IMPALA-12395: Override scan cardinality for optimized count star
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id5ce967657208057d50bd80adadac29ebb51cbc5
Gerrit-Change-Number: 20406
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 24 Aug 2023 17:45:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12387: PartialUpdates is misleading for LOCAL filter

2023-08-24 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20397 )

Change subject: IMPALA-12387: PartialUpdates is misleading for LOCAL filter
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20397
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56078a458799671246ff90b831e5ecebd04a78e8
Gerrit-Change-Number: 20397
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 24 Aug 2023 17:40:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

2023-08-24 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#15) to the change originally created 
by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
..

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement a data source for querying via JDBC.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed.
  - Only binary predicates with operators =, !=, <=, >=,
    <, > are pushed to the RDBMS.
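
The predicate restriction can be pictured with the following sketch. It
is not the JdbcDataSource code; the tuple representation of a predicate
and the function name are assumptions made for illustration.

```python
# Operators listed in the commit message as eligible for pushdown.
PUSHABLE_OPS = {"=", "!=", "<=", ">=", "<", ">"}


def split_predicates(predicates):
    """Split (column, op, value) predicates into pushed and remaining.

    Predicates with a supported binary operator would be sent to the
    RDBMS; the rest would have to be evaluated after the rows are fetched.
    """
    pushed, remaining = [], []
    for pred in predicates:
        _column, op, _value = pred
        (pushed if op in PUSHABLE_OPS else remaining).append(pred)
    return pushed, remaining


preds = [("id", "=", 1), ("name", "LIKE", "a%"), ("x", ">=", 3)]
print(split_predicates(preds))
```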

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that any existing data source table will be rebuilt):
1. Make sure that the database driver package has been added to the
classpath and the minicluster has been started.

2. Copy the data source library into HDFS.
${IMPALA_HOME}/testdata/bin/copy-data-sources.sh

3. Create an `alltypes` table in the postgres database.
${IMPALA_HOME}/testdata/bin/load-data-sources.sh

4. Create the data source table (alltypes_jdbc_datasource).
${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\
  ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql

Testing:
 - Added unit-test for Postgres.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
---
M bin/rat_exclude_files.txt
M 
fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
A java/ext-data-source/jdbc/pom.xml
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java
A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
A 
java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java
A java/ext-data-source/jdbc/src/test/resources/log4j.properties
A java/ext-data-source/jdbc/src/test/resources/test_script.sql
M java/ext-data-source/pom.xml
M testdata/bin/copy-data-sources.sh
M testdata/bin/create-data-source-table.sql
M testdata/bin/create-load-data.sh
A testdata/bin/load-data-sources.sh
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
30 files changed, 2,084 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/15
--
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 15
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-5081: Add codegen opt level query option

2023-08-24 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20399 )

Change subject: IMPALA-5081: Add codegen_opt_level query option
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h@166
PS2, Line 166:   void Init() { memset((uint8_t*)this, 0, sizeof(CodeGenCacheEntry)); }
> Not relevant to this change but wouldn't it be better to call Reset() here
I don't see any problem with changing to that. I'll let Yida weigh in though.


http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen.cc@1448
PS2, Line 1448:
> Nit: unneeded space.
This is multiplication and preserves the prior formatting. Pretty sure code 
formatter would complain if I removed the space.


http://gerrit.cloudera.org:8080/#/c/20399/2/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/20399/2/common/thrift/ImpalaService.thrift@845
PS2, Line 845: O1, Os, O2, or O3
> O0 is also a possibility.
Ack


http://gerrit.cloudera.org:8080/#/c/20399/2/common/thrift/ImpalaService.thrift@846
PS2, Line 846: Defaults to O2.
> If we ever change the default value we'll probably forget this comment. The
This is a pattern we use a lot of other places here. I'm amenable to this 
argument however.



--
To view, visit http://gerrit.cloudera.org:8080/20399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd
Gerrit-Change-Number: 20399
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 24 Aug 2023 16:43:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Michael Smith (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20379

to look at the new patch set (#7).

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

IMPALA-12383: Fix SingleNodePlanner aggregation limits

When IMPALA-2581 was implemented, it assumed all aggregation nodes would
have a pre-aggregation step that limits could be pushed to. That's not
the case when using SingleNodePlanner, such as when num_nodes=1. As a
result, the following query would incorrectly return 16 rows, not 10:

  set num_nodes=1;
  select distinct l_orderkey from tpch.lineitem limit 10;

This fix identifies all aggregation nodes that use pre-aggregation so we
use fast_limit_check in only those cases.

Testing:
- added a test case where we assert number of rows returned by an
  aggregation node (rather than an exchange or top-n).
- restores definition of ALL_CLUSTER_SIZES and makes it simpler to
  enable for individual test suites. Filed IMPALA-12394 to generally
  re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES
  for aggregation tests.
- passed an exhaustive test run.

Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
M tests/query_test/test_aggregation.py
6 files changed, 47 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/7
--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-24 Thread Michael Smith (Code Review)
Michael Smith has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20396 )

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..

IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

Currently, DictEncoder uses the default hash function for
TimestampValue, which means it is hashing the entire
TimestampValue struct. This can be inconsistent, because
TimestampValue contains some padding that may not be zero
in some cases. For TimestampValues that are part of a Tuple,
the padding is zero, so this is mainly present in test cases.

This was discovered when fixing a Clang Tidy performance-for-range-copy
warning by iterating with a const reference rather than
making a copy of the value. DictTest.TestTimestamps became
flaky with that change, because the hash was no longer
consistent. The copy must have had consistent content for
the padding through the iteration, but the const reference
did not.

This adds a template specialization of the Hash function
for TimestampValue. The specialization uses TimestampValue::Hash(),
which hashes only the non-padding pieces of the struct. This
also includes the change to dict-test.cc that uncovered the
issue. This fix is mostly to unblock IMPALA-12390.
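
The padding problem described above can be illustrated with a small
sketch. This is not Impala code: FakeTimestamp, make_timestamp and both
hash helpers are hypothetical names, and the layout (an 8-byte field
followed by a 4-byte field, padded to 16 bytes) is an assumption that
holds on typical 64-bit platforms. It shows that hashing the raw bytes
of a struct depends on whatever happens to be in the padding, while
hashing only the fields does not.

```python
import ctypes
import zlib


class FakeTimestamp(ctypes.Structure):
    # Hypothetical stand-in for a timestamp struct: 12 bytes of data that
    # the platform ABI typically pads to 16 bytes.
    _fields_ = [
        ("time_of_day_ns", ctypes.c_int64),
        ("date", ctypes.c_int32),
    ]


def make_timestamp(time_of_day_ns, date, fill_byte):
    """Build a struct on top of memory pre-filled with fill_byte."""
    buf = bytearray([fill_byte]) * ctypes.sizeof(FakeTimestamp)
    ts = FakeTimestamp.from_buffer(buf)
    # Only the field bytes are overwritten; padding keeps fill_byte.
    ts.time_of_day_ns = time_of_day_ns
    ts.date = date
    return ts


def hash_whole_struct(ts):
    # Hashes every byte of the struct, padding included.
    return zlib.crc32(bytes(ts))


def hash_fields_only(ts):
    # Hashes only the meaningful members.
    return hash((ts.time_of_day_ns, ts.date))


# Same logical value, different garbage in the padding bytes.
a = make_timestamp(1_000, 19_593, fill_byte=0x00)
b = make_timestamp(1_000, 19_593, fill_byte=0xFF)

print(hash_whole_struct(a) == hash_whole_struct(b))
print(hash_fields_only(a) == hash_fields_only(b))
```

The second comparison is the behavior a dictionary encoder needs: equal
values must always land on the same hash.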

Testing:
 - Ran dict-test in a loop for a few hundred iterations
 - Hand tested inserting many timestamps into a Parquet table
   with dictionary encoding and verified that the performance didn't
   change.

Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Reviewed-on: http://gerrit.cloudera.org:8080/20396
Tested-by: Impala Public Jenkins 
Reviewed-by: Daniel Becker 
Reviewed-by: Michael Smith 
---
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
2 files changed, 8 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Daniel Becker: Looks good to me, but someone else must approve
  Michael Smith: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-24 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20396 )

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20396/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20396/1//COMMIT_MSG@14
PS1, Line 14: the padding is zero, so this is mainly present in test cases.
> I was putting together an initial change for IMPALA-12390, but my GVO run f
Done



--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Thu, 24 Aug 2023 16:30:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9631/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 16:33:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-24 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20396 )

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Thu, 24 Aug 2023 16:26:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12366: Use 2GB as the default for thrift rpc max message size

2023-08-24 Thread Michael Smith (Code Review)
Michael Smith has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20394 )

Change subject: IMPALA-12366: Use 2GB as the default for 
thrift_rpc_max_message_size
..

IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size

Thrift 0.16 implemented a limit on the max message size. In IMPALA-11669,
we added the thrift_rpc_max_message_size parameter and set the default
size to 1GB. Some existing clusters have needed to tune this parameter
higher because their workloads use message sizes larger than 1GB (e.g.
for metadata updates).

Historically, Impala has been able to send and receive 2GB messages,
so this changes the default value for thrift_rpc_max_message_size
to 2GB (INT_MAX). This can be reduced in future when Impala can guarantee
that messages work properly when split up into smaller batches.
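
As a rough illustration of the sizes involved (a sketch, not Impala
code): the snippet below assumes "1GB" means 1024**3 bytes and that
INT_MAX is the 32-bit signed maximum. The helper name fits_in_limit is
hypothetical.

```python
GIB = 1024 ** 3

OLD_DEFAULT = 1 * GIB        # assumed byte value of the previous 1GB default
NEW_DEFAULT = 2 ** 31 - 1    # INT_MAX for a 32-bit signed integer


def fits_in_limit(message_bytes, limit_bytes):
    """Return True if a message of this size is within the limit."""
    return message_bytes <= limit_bytes


# A hypothetical 1.5 GiB metadata update.
big_update = 3 * GIB // 2

print(fits_in_limit(big_update, OLD_DEFAULT))
print(fits_in_limit(big_update, NEW_DEFAULT))
print(2 * GIB - NEW_DEFAULT)
```

Note that INT_MAX is one byte short of a full 2 GiB, so "2GB" here is
an approximation of the largest value a signed 32-bit size can hold.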

TestGracefulShutdown::test_shutdown_idle started failing with this
change, because it is producing a different error message for one
of the negative tests. ClientRequestState::ExecShutdownRequest()
appends some extra explanation when it sees a "Network error" KRPC error,
and the test expects that extra explanation. This modifies
ClientRequestState::ExecShutdownRequest() to provide the extra explanation
for the new error ("Timed out") as well.
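
The error-handling change can be pictured roughly as follows. This is a
Python sketch of the idea only, not the C++ in client-request-state.cc;
the function name, the marker tuple and the hint text are all
hypothetical placeholders.

```python
# Error categories that get the extra explanation. "Timed out" is the
# one this change adds next to "Network error".
EXPLAINED_ERROR_MARKERS = ("Network error", "Timed out")

# Placeholder text; the real explanation string is not quoted in the
# commit message.
HYPOTHETICAL_HINT = "(hypothetical hint: check the target address and port)"


def explain_shutdown_error(error_message):
    """Append the hint when the error matches a known category."""
    if any(marker in error_message for marker in EXPLAINED_ERROR_MARKERS):
        return f"{error_message} {HYPOTHETICAL_HINT}"
    return error_message


print(explain_shutdown_error("Timed out: connection attempt failed"))
print(explain_shutdown_error("Invalid argument"))
```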

Testing:
 - Ran GVO

Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc
Reviewed-on: http://gerrit.cloudera.org:8080/20394
Tested-by: Impala Public Jenkins 
Reviewed-by: Riza Suminto 
Reviewed-by: Michael Smith 
---
M be/src/rpc/thrift-util.cc
M be/src/service/client-request-state.cc
2 files changed, 10 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Riza Suminto: Looks good to me, approved
  Michael Smith: Looks good to me, but someone else must approve

--
To view, visit http://gerrit.cloudera.org:8080/20394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc
Gerrit-Change-Number: 20394
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12366: Use 2GB as the default for thrift rpc max message size

2023-08-24 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20394 )

Change subject: IMPALA-12366: Use 2GB as the default for 
thrift_rpc_max_message_size
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc
Gerrit-Change-Number: 20394
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 24 Aug 2023 16:25:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12366: Use 2GB as the default for thrift rpc max message size

2023-08-24 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20394 )

Change subject: IMPALA-12366: Use 2GB as the default for 
thrift_rpc_max_message_size
..


Patch Set 2: Code-Review+2

Looks good to me. Thank you for watching over this.


--
To view, visit http://gerrit.cloudera.org:8080/20394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc
Gerrit-Change-Number: 20394
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 24 Aug 2023 16:20:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11996: Scanner change for Iceberg metadtata querying

2023-08-24 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20010 )

Change subject: IMPALA-11996: Scanner change for Iceberg metadtata querying
..


Patch Set 5:

(64 comments)

Thanks for the patch, Tamas! It really does seem like a lot of work.

I took a first look and I think I found some mem leaks around 
IcebergMetadataScanNode.

I still have to digest the code in IcebergMetadataTableScanner, though.
Honestly, to me it seems pretty ugly to have JNI calls within C++ for literally
everything. I naively thought that we could somehow let the Java part do the
Java stuff, with the C++ part only asking for the next set of results in
some format, like thrift.
Even if that's not possible, I think we can give some subtasks to the Java part,
like "please create me the object for the metadata table", and then we can hide
the majority of the Java class/variable/method/type references in the C++ code.
Can't we somehow keep the Java references minimal and, let's say, maintain the
iterator that traverses the results, but then ask the Java part to get us the
actual results, giving it the iterator? Could results be passed in thrift or
some buffer format between the 2 worlds? Once we got them, we could move the
values into the row_batch.

I'm curious what others think about this, though.

http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG@7
PS5, Line 7: metadtata
typo


http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG@17
PS5, Line 17: se
typo


http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG@17
PS5, Line 17: struct column types
it's not just struct but nested types in general


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h
File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@20
PS5, Line 20: #include "exec/iceberg-metadata/iceberg-metadata-table-scanner.h"
Would it help to remove this include if we had a forward declaration of 
IcebergMetadataTableScanner in this header file?


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@22
PS5, Line 22: #include "runtime/runtime-state.h"
: #include "util/jni-util.h"
same as above


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@41
PS5, Line 41: /// ScanNode ancestor -> ExecNode
I don't think this comment is necessary


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@42
PS5, Line 42: class IcebergMetadataScanNode : public ScanNode {
Don't you need a virtual destructor for this class?


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@49
PS5, Line 49: Iceberg TableScan
What is an Iceberg 'TableScan'? I haven't found any reference in the cc file.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@52
PS5, Line 52:   /// Get next rowbatch from the table scanner
this comment doesn't add much


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@55
PS5, Line 55:   /// Close the Iceberg TableScan
This comment doesn't add much


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@58
PS5, Line 58:   Status GetCatalogTable(JNIEnv* env, jobject* jtable);
private?


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@60
PS5, Line 60:  protected:
Are there any derived classes from this one? I haven't found any. What's the 
reason having protected members?


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@61
PS5, Line 61: tuple_desc_
nit: I think we use ' char around variable names in comments. like 'tuple_desc_'


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@67
PS5, Line 67: metadtata
typo


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@69
PS5, Line 69:   const string* metadata_table_name_;
does this have to be a pointer? isn't regular string enough?


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@73
PS5, Line 73: scoped_ptr
unique_ptr ?


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc
File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@36
PS5, Line 36: table_name_(new 

[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13833/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 8
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 24 Aug 2023 14:54:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-24 Thread Peter Rozsa (Code Review)
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20133

to look at the new patch set (#8).

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder
expression set can be calculated.

The current implementation is an all-or-nothing filter: if 'planFiles()'
(Iceberg API) returns no residual expression, then all Impala
predicates can be skipped; if there is any residual expression, every
Impala predicate is pushed down to the Impala scanner.

Residual expressions are the remaining filter expressions after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.

After this change, the Iceberg residual expression handling is improved
by locating the simple conjuncts in the residual expression and mapping
them back to Impala conjuncts. For example, if the list of Impala
conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"'
and 'col_i' happens to be a partition column in the Iceberg table
definition and Iceberg table scan can eliminate the expression, the
residual expression will be 'col_s = "a"'. This expression can be mapped
back as an Impala predicate, and any other expression can be removed
from the effective Impala conjunct list, and pushed down to the scanner,
skipping the unnecessary filtering of 'col_i'.

If there's no residual expression, the behavior is the same as before:
all predicate pushdown is skipped.
If Impala is unable to match every residual expression to an Impala
conjunct, then all the conjuncts are pushed down to the Impala scanner.
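
The selection rules in the paragraphs above can be sketched as follows.
This is an illustrative Python model, not the Java in
IcebergScanPlanner: predicates are represented as plain strings, the
function name is hypothetical, and matching is simple equality, whereas
the real change maps Iceberg expressions back to Impala expressions.

```python
def conjuncts_for_impala_scanner(impala_conjuncts, residual_conjuncts,
                                 subsetting_enabled=True):
    """Pick which Impala conjuncts still need to reach the scanner."""
    # No residual: Iceberg fully evaluated the filter, push nothing.
    if not residual_conjuncts:
        return []
    # Option turned off: old all-or-nothing behavior, push everything.
    if not subsetting_enabled:
        return list(impala_conjuncts)
    known = set(impala_conjuncts)
    # Any residual that cannot be mapped back forces the safe fallback.
    if any(residual not in known for residual in residual_conjuncts):
        return list(impala_conjuncts)
    residuals = set(residual_conjuncts)
    # Keep only the conjuncts Iceberg could not eliminate.
    return [c for c in impala_conjuncts if c in residuals]


impala = ['col_i != 100', 'col_s = "a"']
print(conjuncts_for_impala_scanner(impala, ['col_s = "a"']))
print(conjuncts_for_impala_scanner(impala, []))
print(conjuncts_for_impala_scanner(impala, ['col_x > 5']))
```

The first call mirrors the example in the commit message: 'col_i' is
handled by the Iceberg table scan, so only the 'col_s' predicate is
kept for the scanner.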

This change offers the advantage of not pushing down already evaluated
filters to the Impala scanner nodes, resulting in enhanced scanning
performance. Additionally, if the filter expression affects columns that
are unnecessary for the final result and can be filtered out during
Iceberg's table scan, it leads to a reduced row size, thereby optimizing
data retrieval and improving overall query efficiency.

This solution is limited to cases where Impala's expression list
contains only conjuncts. Compound expressions are not supported because
partial elimination of compounds would involve expression rewrites in
the Impala expression.

A new query option is added: iceberg_predicate_pushdown_subsetting. The
query option's default value is true. It can be turned off by setting it
to false.

Performance of the predicate location is measured on two edge cases:
 - 1000 expressions, 999 skipped: on average 2 ms
 - 1000 expressions, 1 skipped: on average 25 ms

Tests:
 - planner test cases added for disabled mode
 - existing planner test cases adjusted
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 372 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/8
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 8
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19699 )

Change subject: IMPALA-10798: Initial support for reading JSON files
..


Patch Set 29:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9630/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/19699
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
Gerrit-Change-Number: 19699
Gerrit-PatchSet: 29
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 24 Aug 2023 14:16:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 22: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 13:06:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
for the regression.
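
For reference, the quantities these functions compute can be sketched
with the textbook ordinary-least-squares formulas. This is a Python
illustration, not the code in aggregate-functions-ir.cc. Two things are
assumptions here: the argument order (dependent variable first, as in
the usual SQL convention) and the handling of degenerate inputs (None
when x is constant or there is no data, 1.0 for r2 when only y is
constant). The commit message does not state how Impala handles either.

```python
def _centered_sums(ys, xs):
    """Return (n, mean_x, mean_y, sxx, syy, sxy) for paired samples."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0, 0.0, 0.0, 0.0, 0.0, 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return n, mean_x, mean_y, sxx, syy, sxy


def regr_slope(ys, xs):
    """Slope of the least-squares line y = slope * x + intercept."""
    n, _, _, sxx, _, sxy = _centered_sums(ys, xs)
    if n == 0 or sxx == 0:
        return None
    return sxy / sxx


def regr_intercept(ys, xs):
    """y-intercept of the least-squares line."""
    n, mean_x, mean_y, sxx, _, sxy = _centered_sums(ys, xs)
    if n == 0 or sxx == 0:
        return None
    return mean_y - (sxy / sxx) * mean_x


def regr_r2(ys, xs):
    """Coefficient of determination (squared correlation)."""
    n, _, _, sxx, syy, sxy = _centered_sums(ys, xs)
    if n == 0 or sxx == 0:
        return None
    if syy == 0:
        return 1.0
    return (sxy * sxy) / (sxx * syy)


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # exactly y = 2x + 1
print(regr_slope(ys, xs), regr_intercept(ys, xs), regr_r2(ys, xs))
```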

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Reviewed-on: http://gerrit.cloudera.org:8080/19569
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 988 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13832/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 7
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 24 Aug 2023 12:34:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-24 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder
expression set can be calculated.

The current implementation is an all-or-nothing filter: if 'planFiles()'
(Iceberg API) returns no residual expression, then all Impala
predicates can be skipped; if there is any residual expression, every
Impala predicate is pushed down to the Impala scanner.

Residual expressions are the remaining filter expressions after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.

After this change, the Iceberg residual expression handling is improved
by locating the simple conjuncts in the residual expression and mapping
them back to Impala conjuncts. For example, if the list of Impala
conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"'
and 'col_i' happens to be a partition column in the Iceberg table
definition and Iceberg table scan can eliminate the expression, the
residual expression will be 'col_s = "a"'. This expression can be mapped
back as an Impala predicate, and any other expression can be removed
from the effective Impala conjunct list, and pushed down to the scanner,
skipping the unnecessary filtering of 'col_i'.

If there's no residual expression, the behavior is the same as before:
all predicate pushdown is skipped.
If Impala is unable to match every residual expression to an Impala
conjunct, then all the conjuncts are pushed down to the Impala scanner.
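
The matching rule described above can be sketched roughly as follows.
This is only an illustrative Python model, not the code in
IcebergScanPlanner.java: predicates are represented as plain strings
that match only when equal, and the function name 'subset_conjuncts' is
hypothetical.

```python
def subset_conjuncts(impala_conjuncts, residual_conjuncts):
    """Return the Impala conjuncts that still need to reach the scanner.

    Hypothetical model: each predicate is a plain string, and two
    predicates match only if the strings are equal.
    """
    # No residual expression: Iceberg already evaluated everything,
    # so nothing has to be pushed down to the Impala scanner.
    if not residual_conjuncts:
        return []

    impala_set = set(impala_conjuncts)
    # If any residual conjunct cannot be mapped back, fall back to
    # pushing down every Impala conjunct.
    if any(r not in impala_set for r in residual_conjuncts):
        return list(impala_conjuncts)

    residual_set = set(residual_conjuncts)
    # Keep only the conjuncts that Iceberg left unevaluated.
    return [c for c in impala_conjuncts if c in residual_set]
```

With the example from the text, the conjunct list ['col_i != 100',
'col_s = "a"'] and the residual ['col_s = "a"'] leave only
'col_s = "a"' for the scanner.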

This change offers the advantage of not pushing down already evaluated
filters to the Impala scanner nodes, resulting in enhanced scanning
performance. Additionally, if the filter expression affects columns that
are unnecessary for the final result and can be filtered out during
Iceberg's table scan, it leads to a reduced row size, thereby optimizing
data retrieval and improving overall query efficiency.

This solution is limited to cases where Impala's expression list
contains only conjuncts. Compound expressions are not supported because
partial elimination of compounds would involve expression rewrites in
the Impala expression.

A new query option is added: iceberg_predicate_pushdown_subsetting. The
query option's default value is true. It can be turned off by setting it
to false.

Performance of the predicate location is measured on two edge cases:
 - 1000 expressions, 999 skipped: on average 2 ms
 - 1000 expressions, 1 skipped: on average 25 ms

Tests:
 - planner test cases added for disabled mode
 - existing planner test cases adjusted
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 373 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/7
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 7
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-5081: Add codegen_opt_level query option

2023-08-24 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20399 )

Change subject: IMPALA-5081: Add codegen_opt_level query option
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h@166
PS2, Line 166:   void Init() { memset((uint8_t*)this, 0, 
sizeof(CodeGenCacheEntry)); }
Not relevant to this change but wouldn't it be better to call Reset() here 
instead of memset()? Afaik 'nullptr' is not guaranteed to be represented by the 
0 value.


http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-test.cc
File be/src/codegen/llvm-codegen-test.cc:

http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-test.cc@594
PS2, Line 594: NULL
Nit: could be 'nullptr'.


http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen.cc@1448
PS2, Line 1448:
Nit: unneeded space.


http://gerrit.cloudera.org:8080/#/c/20399/2/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/20399/2/common/thrift/ImpalaService.thrift@845
PS2, Line 845: O1, Os, O2, or O3
O0 is also a possibility.


http://gerrit.cloudera.org:8080/#/c/20399/2/common/thrift/ImpalaService.thrift@846
PS2, Line 846: Defaults to O2.
If we ever change the default value we'll probably forget this comment. The 
default value can be seen in common/thrift/Query.thrift, so we don't need to 
write it here.



--
To view, visit http://gerrit.cloudera.org:8080/20399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd
Gerrit-Change-Number: 20399
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 24 Aug 2023 12:03:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19699 )

Change subject: IMPALA-10798: Initial support for reading JSON files
..


Patch Set 29:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13831/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19699
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
Gerrit-Change-Number: 19699
Gerrit-PatchSet: 29
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 24 Aug 2023 11:36:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19699 )

Change subject: IMPALA-10798: Initial support for reading JSON files
..


Patch Set 28:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13830/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19699
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
Gerrit-Change-Number: 19699
Gerrit-PatchSet: 28
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 24 Aug 2023 11:29:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files

2023-08-24 Thread Zihao Ye (Code Review)
Zihao Ye has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19699 )

Change subject: IMPALA-10798: Initial support for reading JSON files
..


Patch Set 29:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/19699/26//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19699/26//COMMIT_MSG@7
PS26, Line 7: Initial support for reading JSON fi
> Let's change the title to something like "Initial support for reading JSON 
Done


http://gerrit.cloudera.org:8080/#/c/19699/23/tests/data_errors/test_data_errors.py
File tests/data_errors/test_data_errors.py:

http://gerrit.cloudera.org:8080/#/c/19699/23/tests/data_errors/test_data_errors.py@128
PS23, Line 128: self.run_test_case('DataErrorsTest/hdfs-scan-node-errors', 
vector)
> Can we add a similar test for json?
Done


http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_cancellation.py
File tests/query_test/test_cancellation.py:

http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_cancellation.py@113
PS23, Line 113: 'text'
> Let's add json here
Done


http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_chars.py
File tests/query_test/test_chars.py:

http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_chars.py@37
PS23, Line 37: ptions
> Let's test json here
Done


http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_chars.py@68
PS23, Line 68:
> Let's test json here as well
Done


http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_date_queries.py
File tests/query_test/test_date_queries.py:

http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_date_queries.py@45
PS23, Line 45:
> Let's add json here. Please also update the above comment. DATE type is als
Done



--
To view, visit http://gerrit.cloudera.org:8080/19699
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
Gerrit-Change-Number: 19699
Gerrit-PatchSet: 29
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 24 Aug 2023 11:10:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files

2023-08-24 Thread Zihao Ye (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19699

to look at the new patch set (#29).

Change subject: IMPALA-10798: Initial support for reading JSON files
..

IMPALA-10798: Initial support for reading JSON files

Prototype of HdfsJsonScanner, implemented on top of rapidjson, which
supports scanning data from splittable JSON files.

The scanning of JSON data is mainly completed by two parts working
together. The first part is the JsonParser responsible for parsing the
JSON object, which is implemented based on the SAX-style API of
rapidjson. It reads data from the char stream, parses it, and calls the
corresponding callback function when encountering the corresponding JSON
element. See the comments of the JsonParser class for more details.

The other part is the HdfsJsonScanner, which inherits from HdfsScanner
and provides callback functions for the JsonParser. The callback
functions are responsible for providing data buffers to the Parser and
converting and materializing the Parser's parsing results into RowBatch.
It should be noted that the parser returns numeric values as strings to
the scanner. The scanner uses the TextConverter class to convert the
strings to the desired types, similar to how the HdfsTextScanner works.
This is an advantage compared to using the numeric values provided by
rapidjson directly, as it eliminates concerns about inconsistencies in
converting decimals (e.g. losing precision).
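
The benefit of handing numbers to the converter as text can be
illustrated with a small Python sketch. It uses the standard 'json'
module in place of rapidjson and 'Decimal' in place of TextConverter;
the helper names 'parse_row' and 'to_decimal' are hypothetical and only
stand in for the two stages described above.

```python
import json
from decimal import Decimal

LINE = '{"price": 12345678901234567890.123456789}'

def parse_row(line):
    # parse_float/parse_int receive the raw numeric text, so returning it
    # unchanged keeps every digit until the target column type is known.
    return json.loads(line, parse_float=str, parse_int=str)

def to_decimal(text):
    # Stand-in for the type-specific conversion step of the scanner.
    return Decimal(text)

row = parse_row(LINE)
exact = to_decimal(row["price"])

# For comparison: letting the parser build a binary double first
# rounds the value before the decimal conversion ever sees it.
via_float = Decimal(json.loads(LINE)["price"])

print(exact)               # 12345678901234567890.123456789
print(exact == via_float)  # False
```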

Limitations
 - Multiline json objects are not fully supported yet. It is ok when
   each file has only one scan range. However, when a file has multiple
   scan ranges, there is a small probability of incomplete scanning of
   multiline JSON objects that span ScanRange boundaries (in such cases,
   parsing errors may be reported). For more details, please refer to
   the comments in the 'multiline_json.test'.
 - Compressed JSON files are not supported yet.
 - Complex types are not supported yet.

Tests
 - Most of the existing end-to-end tests can run on JSON format.
 - Add TestQueriesJsonTables in test_queries.py for testing multiline,
   malformed, and overflow in JSON.

Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
---
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-scan-node-base.cc
A be/src/exec/json/CMakeLists.txt
A be/src/exec/json/hdfs-json-scanner.cc
A be/src/exec/json/hdfs-json-scanner.h
A be/src/exec/json/json-parser-test.cc
A be/src/exec/json/json-parser.cc
A be/src/exec/json/json-parser.h
M be/src/exec/text-converter.inline.h
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-dependent-tables.sql
A testdata/data/chars-formats.json
A testdata/data/json_test/complex.json
A testdata/data/json_test/malformed.json
A testdata/data/json_test/multiline.json
A testdata/data/json_test/overflow.json
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
A testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-json-scan-node-errors.test
A testdata/workloads/functional-query/queries/QueryTest/complex_json.test
A testdata/workloads/functional-query/queries/QueryTest/malformed_json.test
A testdata/workloads/functional-query/queries/QueryTest/multiline_json.test
A testdata/workloads/functional-query/queries/QueryTest/overflow_json.test
M testdata/workloads/tpcds/tpcds_core.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_core.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/test_dimensions.py
M tests/data_errors/test_data_errors.py
M tests/metadata/test_hms_integration.py
M tests/query_test/test_cancellation.py
M tests/query_test/test_chars.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_queries.py
M tests/query_test/test_scanners.py
M tests/query_test/test_scanners_fuzz.py
M tests/query_test/test_tpch_queries.py
50 files changed, 1,719 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/19699/29
--
To view, visit http://gerrit.cloudera.org:8080/19699
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset

[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files

2023-08-24 Thread Zihao Ye (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19699

to look at the new patch set (#28).

Change subject: IMPALA-10798: Initial support for reading JSON files
..

IMPALA-10798: Initial support for reading JSON files

Prototype of HdfsJsonScanner, implemented on top of rapidjson, which
supports scanning data from splittable JSON files.

The scanning of JSON data is mainly completed by two parts working
together. The first part is the JsonParser responsible for parsing the
JSON object, which is implemented based on the SAX-style API of
rapidjson. It reads data from the char stream, parses it, and calls the
corresponding callback function when encountering the corresponding JSON
element. See the comments of the JsonParser class for more details.

The other part is the HdfsJsonScanner, which inherits from HdfsScanner
and provides callback functions for the JsonParser. The callback
functions are responsible for providing data buffers to the Parser and
converting and materializing the Parser's parsing results into RowBatch.
It should be noted that the parser returns numeric values as strings to
the scanner. The scanner uses the TextConverter class to convert the
strings to the desired types, similar to how the HdfsTextScanner works.
This is an advantage compared to using the numeric values provided by
rapidjson directly, as it eliminates concerns about inconsistencies in
converting decimals (e.g. losing precision).

Limitations
 - Multiline json objects are not fully supported yet. It is ok when
   each file has only one scan range. However, when a file has multiple
   scan ranges, there is a small probability of incomplete scanning of
   multiline JSON objects that span ScanRange boundaries (in such cases,
   parsing errors may be reported). For more details, please refer to
   the comments in the 'multiline_json.test'.
 - Compressed JSON files are not supported yet.
 - Complex types are not supported yet.

Tests
 - Most of the existing end-to-end tests can run on JSON format.
 - Add TestQueriesJsonTables in test_queries.py for testing multiline,
   malformed, and overflow in JSON.

Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
---
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-scan-node-base.cc
A be/src/exec/json/CMakeLists.txt
A be/src/exec/json/hdfs-json-scanner.cc
A be/src/exec/json/hdfs-json-scanner.h
A be/src/exec/json/json-parser-test.cc
A be/src/exec/json/json-parser.cc
A be/src/exec/json/json-parser.h
M be/src/exec/text-converter.inline.h
M bin/rat_exclude_files.txt
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-dependent-tables.sql
A testdata/data/chars-formats.json
A testdata/data/json_test/complex.json
A testdata/data/json_test/malformed.json
A testdata/data/json_test/multiline.json
A testdata/data/json_test/overflow.json
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
A testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-json-scan-node-errors.test
A testdata/workloads/functional-query/queries/QueryTest/complex_json.test
A testdata/workloads/functional-query/queries/QueryTest/malformed_json.test
A testdata/workloads/functional-query/queries/QueryTest/multiline_json.test
A testdata/workloads/functional-query/queries/QueryTest/overflow_json.test
M testdata/workloads/tpcds/tpcds_core.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_core.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/test_dimensions.py
M tests/data_errors/test_data_errors.py
M tests/metadata/test_hms_integration.py
M tests/query_test/test_cancellation.py
M tests/query_test/test_chars.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_queries.py
M tests/query_test/test_scanners.py
M tests/query_test/test_scanners_fuzz.py
M tests/query_test/test_tpch_queries.py
50 files changed, 1,716 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/19699/28
--
To view, visit http://gerrit.cloudera.org:8080/19699
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset

[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19699 )

Change subject: IMPALA-10798: Initial support for reading JSON files
..


Patch Set 28:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19699/28/tests/data_errors/test_data_errors.py
File tests/data_errors/test_data_errors.py:

http://gerrit.cloudera.org:8080/#/c/19699/28/tests/data_errors/test_data_errors.py@162
PS28, Line 162: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/19699/28/tests/query_test/test_chars.py
File tests/query_test/test_chars.py:

http://gerrit.cloudera.org:8080/#/c/19699/28/tests/query_test/test_chars.py@39
PS28, Line 39: a
flake8: W504 line break after binary operator


http://gerrit.cloudera.org:8080/#/c/19699/28/tests/query_test/test_chars.py@83
PS28, Line 83: a
flake8: W504 line break after binary operator



--
To view, visit http://gerrit.cloudera.org:8080/19699
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
Gerrit-Change-Number: 19699
Gerrit-PatchSet: 28
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 24 Aug 2023 11:05:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2()

2023-08-24 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 21: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 08:54:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10120: Add required fields for TGetInfoResp when error.

2023-08-24 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20344 )

Change subject: IMPALA-10120: Add required fields for TGetInfoResp when error.
..


Patch Set 2:

(1 comment)

Thanks for digging into this! Could you add a test so it won't break in the 
future? E.g. running the following command

beeline -u "jdbc:hive2://localhost:21050/default;auth=noSasl" -e queries

Queries can be "SHOW TABLES" plus some SELECT/CREATE/INSERT/DROP statements.
We can have a test like tests/shell/test_shell_commandline.py, e.g. 
test_beeline.py, using code similar to
https://github.com/apache/impala/blob/4b62812995ce380f2dca038bac017432c6c5d14f/tests/common/impala_test_suite.py#L1030-L1045
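
A rough sketch of what such a helper might look like is below. The
function names 'build_beeline_cmd' and 'run_beeline' are hypothetical,
and passing one '-e' option per statement is an assumption about how
beeline would be invoked here; this is not the code from
impala_test_suite.py.

```python
import subprocess

def build_beeline_cmd(queries, host="localhost", port=21050):
    """Build the beeline argv for the given statements (hypothetical helper)."""
    url = "jdbc:hive2://{0}:{1}/default;auth=noSasl".format(host, port)
    cmd = ["beeline", "-u", url]
    for query in queries:
        # Assumption: each statement is passed with its own -e option.
        cmd += ["-e", query]
    return cmd

def run_beeline(queries):
    # Needs beeline on PATH and a running impalad; returns the exit code
    # together with the captured stdout and stderr.
    proc = subprocess.run(build_beeline_cmd(queries),
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE,
                          universal_newlines=True)
    return proc.returncode, proc.stdout, proc.stderr
```

A test could then call run_beeline(["SHOW TABLES"]) and assert on the
exit code and output.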

http://gerrit.cloudera.org:8080/#/c/20344/2/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/20344/2/be/src/service/impala-hs2-server.cc@479
PS2, Line 479: return_val.infoValue.__set_stringValue("");
Do we need this for all usages of HS2_RETURN_ERROR() ?



--
To view, visit http://gerrit.cloudera.org:8080/20344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib42bb82735fb4a8e6911b6a19adb8bd84973300b
Gerrit-Change-Number: 20344
Gerrit-PatchSet: 2
Gerrit-Owner: Xiang Yang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 08:52:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2()

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9629/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 08:55:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2()

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 22: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 08:55:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2()

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 21:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13829/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 08:29:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2()

2023-08-24 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#21). ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
for the regression.
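
For reference, the three quantities can be computed with the textbook
least-squares formulas, as in the Python sketch below. The function name
'regr_stats', the (y, x) argument order, and the handling of missing
values and degenerate inputs are assumptions based on the usual SQL
definition of these functions; they are not taken from
aggregate-functions-ir.cc.

```python
def regr_stats(pairs):
    """Return (slope, intercept, r2) for (y, x) pairs, or Nones if undefined."""
    # Assumed NULL handling: a pair is ignored if either value is missing.
    pts = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(pts)
    if n == 0:
        return None, None, None
    mean_y = sum(y for y, _ in pts) / n
    mean_x = sum(x for _, x in pts) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for _, x in pts)
    syy = sum((y - mean_y) ** 2 for y, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in pts)
    if sxx == 0:
        # All x values are equal: no regression line can be fitted.
        return None, None, None
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared is the squared correlation; a constant y is a perfect fit.
    r2 = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2

# Points on the line y = 2x + 1.
print(regr_stats([(3, 1), (5, 2), (7, 3)]))  # (2.0, 1.0, 1.0)
```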

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 988 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/21
--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20394 )

Change subject: IMPALA-12366: Use 2GB as the default for 
thrift_rpc_max_message_size
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13828/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc
Gerrit-Change-Number: 20394
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 24 Aug 2023 08:06:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2()

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/21/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/21/be/src/exprs/aggregate-functions-ir.cc@298
PS21, Line 298: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
line too long (151 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 08:05:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size

2023-08-24 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20394 )


Change subject: IMPALA-12366: Use 2GB as the default for 
thrift_rpc_max_message_size
..

IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size

Thrift 0.16 implemented a limit on the max message size. In IMPALA-11669,
we added the thrift_rpc_max_message_size parameter and set the default
size to 1GB. Some existing clusters have needed to tune this parameter
higher because their workloads use message sizes larger than 1GB (e.g.
for metadata updates).

Historically, Impala has been able to send and receive 2GB messages,
so this changes the default value for thrift_rpc_max_message_size
to 2GB (INT_MAX). This can be reduced in the future when Impala can
guarantee that messages work properly when split up into smaller batches.
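
As a quick check of the numbers involved, the short Python sketch below
assumes that "1GB" means 2^30 bytes and that INT_MAX is the largest
signed 32-bit integer; the constant names are made up for illustration.

```python
GIB = 1 << 30              # 1 GiB in bytes (assumed meaning of "1GB")
OLD_DEFAULT = GIB          # previous default described above
INT_MAX = (1 << 31) - 1    # largest signed 32-bit integer

print(INT_MAX)             # 2147483647
print(2 * GIB - INT_MAX)   # 1
```

Under these assumptions the new default is one byte short of a full
2 GiB.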

TestGracefulShutdown::test_shutdown_idle started failing with this
change, because it is producing a different error message for one
of the negative tests. ClientRequestState::ExecShutdownRequest()
appends some extra explanation when it sees a "Network error" KRPC error,
and the test expects that extra explanation. This modifies
ClientRequestState::ExecShutdownRequest() to provide the extra explanation
for the new error ("Timed out") as well.

Testing:
 - Ran GVO

Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc
---
M be/src/rpc/thrift-util.cc
M be/src/service/client-request-state.cc
2 files changed, 10 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20394/2
--
To view, visit http://gerrit.cloudera.org:8080/20394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc
Gerrit-Change-Number: 20394
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins