[Impala-ASF-CR] IMPALA-10086: SqlCastException when comparing char with varchar
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18001 ) Change subject: IMPALA-10086: SqlCastException when comparing char with varchar .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18001 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3cd331ae1b6afb778a88080efd38900694539a89 Gerrit-Change-Number: 18001 Gerrit-PatchSet: 2 Gerrit-Owner: Bruno Pusztahazi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 25 Aug 2023 04:09:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10086: SqlCastException when comparing char with varchar
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18001 ) Change subject: IMPALA-10086: SqlCastException when comparing char with varchar .. Patch Set 1: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9632/ -- To view, visit http://gerrit.cloudera.org:8080/18001 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3cd331ae1b6afb778a88080efd38900694539a89 Gerrit-Change-Number: 18001 Gerrit-PatchSet: 1 Gerrit-Owner: Bruno Pusztahazi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 25 Aug 2023 03:36:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19699 ) Change subject: IMPALA-10798: Initial support for reading JSON files .. Patch Set 30: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13840/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569 Gerrit-Change-Number: 19699 Gerrit-PatchSet: 30 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Fri, 25 Aug 2023 02:55:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19699

to look at the new patch set (#30).

Change subject: IMPALA-10798: Initial support for reading JSON files
..

IMPALA-10798: Initial support for reading JSON files

Prototype of HdfsJsonScanner implemented based on rapidjson, which supports scanning data from splitting json files.

The scanning of JSON data is mainly completed by two parts working together. The first part is the JsonParser responsible for parsing the JSON object, which is implemented based on the SAX-style API of rapidjson. It reads data from the char stream, parses it, and calls the corresponding callback function when encountering the corresponding JSON element. See the comments of the JsonParser class for more details.

The other part is the HdfsJsonScanner, which inherits from HdfsScanner and provides callback functions for the JsonParser. The callback functions are responsible for providing data buffers to the Parser and converting and materializing the Parser's parsing results into RowBatch.

It should be noted that the parser returns numeric values as strings to the scanner. The scanner uses the TextConverter class to convert the strings to the desired types, similar to how the HdfsTextScanner works. This is an advantage compared to using number value provided by rapidjson directly, as it eliminates concerns about inconsistencies in converting decimals (e.g. losing precision).

Added a startup flag, enable_json_scanner, to be able to disable this feature if we hit critical bugs in production.

Limitations
- Multiline json objects are not fully supported yet. It is ok when each file has only one scan range. However, when a file has multiple scan ranges, there is a small probability of incomplete scanning of multiline JSON objects that span ScanRange boundaries (in such cases, parsing errors may be reported). For more details, please refer to the comments in the 'multiline_json.test'.
- Compressed JSON files are not supported yet.
- Complex types are not supported yet.

Tests
- Most of the existing end-to-end tests can run on JSON format.
- Add TestQueriesJsonTables in test_queries.py for testing multiline, malformed, and overflow in JSON.

Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
---
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-scan-node-base.cc
A be/src/exec/json/CMakeLists.txt
A be/src/exec/json/hdfs-json-scanner.cc
A be/src/exec/json/hdfs-json-scanner.h
A be/src/exec/json/json-parser-test.cc
A be/src/exec/json/json-parser.cc
A be/src/exec/json/json-parser.h
M be/src/exec/text-converter.inline.h
M be/src/util/backend-gflag-util.cc
M bin/rat_exclude_files.txt
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-dependent-tables.sql
A testdata/data/chars-formats.json
A testdata/data/json_test/complex.json
A testdata/data/json_test/malformed.json
A testdata/data/json_test/multiline.json
A testdata/data/json_test/overflow.json
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
A testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-json-scan-node-errors.test
A testdata/workloads/functional-query/queries/QueryTest/complex_json.test
A testdata/workloads/functional-query/queries/QueryTest/disable-json-scanner.test
A testdata/workloads/functional-query/queries/QueryTest/malformed_json.test
A testdata/workloads/functional-query/queries/QueryTest/multiline_json.test
A testdata/workloads/functional-query/queries/QueryTest/overflow_json.test
M testdata/workloads/tpcds/tpcds_core.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_core.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/test_dimensions.py
M tests/custom_cluster/test_disable_features.py
M tests/data_errors/test_data_errors.py
M tests/metadata/test_hms_integration.py
M tests/query_test/test_cancellation.py
M tests/query_test/test_chars.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_queries.py
M tests/query_test/test_scanners.py
M tests/query_test/test_scanners_fuzz.py
M
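The IMPALA-10798 commit message above says the parser hands numeric values to the scanner as strings so that type conversion happens in one place and decimals do not lose precision. The Python sketch below only illustrates that idea and is not Impala code: `convert_field` and its column-type names are hypothetical stand-ins for the role TextConverter plays.

```python
from decimal import Decimal


def convert_field(raw: str, col_type: str):
    """Convert the raw token text to the column's type (hypothetical helper)."""
    if col_type == "int":
        return int(raw)
    if col_type == "double":
        return float(raw)
    if col_type == "decimal":
        # Built from the original text, so no binary rounding is introduced.
        return Decimal(raw)
    return raw  # treat everything else as a string


raw_token = "0.1000000000000000000000000001"

# What the scanner would see if the parser had already produced a binary double.
via_double = Decimal(float(raw_token))

# What the scanner sees when it converts the raw text itself.
via_string = convert_field(raw_token, "decimal")

print(via_double == via_string)  # the two values differ
```

Going through a double first discards the trailing digits, which is the kind of inconsistency the commit message says the string hand-off avoids.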
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

Patch Set 7:

(2 comments)

Looks good, and I think we are converging toward +2.

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@777
PS4, Line 777: return useIntermediateTuple_ || endsMultiPhase_;
nit. May add a comment: useIntermediateTuple_ is set to true for any non-merge node and endsMultiPhase_ is true for a merge node.

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
> Unless I misunderstood your example, we can't sort on partition keys
Okay. The test on useIntermediateTuple_ resolves my concern :-).

I think I am okay with the change for the distributed plans. In a serial plan, per https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L475, the aggregation node may lose the capability to bail out early. Can we mend it somehow as follows?

    return isSingleClassAgg() && hasLimit() && hasGrouping() && (is_serial_plan || isMultiPhase()) && !multiAggInfo_.hasAggregateExprs() && getConjuncts().isEmpty();

--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Michael Smith
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Fri, 25 Aug 2023 00:35:10 +
Gerrit-HasComments: Yes
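To make the condition proposed in the review comment above easier to reason about, here is a small Python model of it. This is only a sketch: the real check is Java code in AggregationNode, and `AggNodeState`, its field names and `limit_allows_early_exit` are hypothetical names that mirror the terms of the quoted expression. `is_serial_plan` is the reviewer's placeholder, not an existing field.

```python
from dataclasses import dataclass


@dataclass
class AggNodeState:
    """Hypothetical snapshot of the properties the proposed check reads."""
    single_class_agg: bool
    has_limit: bool
    has_grouping: bool
    multi_phase: bool
    has_aggregate_exprs: bool
    has_conjuncts: bool


def limit_allows_early_exit(node: AggNodeState, is_serial_plan: bool) -> bool:
    # Mirrors the reviewer's expression term by term.
    return (node.single_class_agg
            and node.has_limit
            and node.has_grouping
            and (is_serial_plan or node.multi_phase)
            and not node.has_aggregate_exprs
            and not node.has_conjuncts)


# A single-phase "select distinct ... limit N" style node.
single_phase = AggNodeState(True, True, True, False, False, False)

print(limit_allows_early_exit(single_phase, is_serial_plan=True))
print(limit_allows_early_exit(single_phase, is_serial_plan=False))
```

Under this model a single-phase node qualifies only when the plan is serial, which is the case the reviewer is concerned about.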
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

Patch Set 7:

(1 comment)

> Patch Set 7:
>
> (1 comment)

Unless I misunderstood your example, we can't sort on partition keys

> create table foo (s string) partitioned by (a int, b int) sort by (a, b);
Query: create table foo (s string) partitioned by (a int, b int) sort by (a, b)
ERROR: AnalysisException: SORT BY column list must not contain partition column: 'a'

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
Unless I misunderstood your example, we can't sort on partition keys

> default> create table foo (s string) partitioned by (a int, b int) sort by
> (a, b);
Query: create table foo (s string) partitioned by (a int, b int) sort by (a, b)
ERROR: AnalysisException: SORT BY column list must not contain partition column: 'a'

If you go through and examine every AggregationNode we create in DistributedPlanner, they either setIntermediateTuple or setEndsMultiPhase, so this should have no functional difference with DistributedPlanner. The only case this should have any impact is when SingleNodePlanner creates a single AggregationNode, and there's nothing to push down to.

When I look at examples for a similar case

> create table foo (c int, d int) partitioned by (a int, b int) sort by (c, d);

where I've created 3 partitions and run

> select distinct a, b from foo limit 2

the plans are unchanged with and without this patch

Operator             #Hosts  #Inst  Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------
F02:ROOT             1       1      323.181us  323.181us                     4.01 MB   4.00 MB
04:EXCHANGE          1       1      12.331us   12.331us   2      2           24.00 KB  16.00 KB       UNPARTITIONED
F01:EXCHANGE SENDER  3       3      34.995us   43.647us                      14.25 KB  48.00 KB
03:AGGREGATE         3       3      407.997us  513.288us  3      2           2.08 MB   10.00 MB       FINALIZE
02:EXCHANGE          3       3      10.555us   14.924us   3      2           16.00 KB  16.00 KB       HASH(a,b)
F00:EXCHANGE SENDER  3       3      51.210us   60.198us                      46.69 KB  144.00 KB
01:AGGREGATE         3       3      157.104us  183.497us  3      2           2.03 MB   10.00 MB       STREAMING
00:SCAN HDFS         3       3      1.612ms    1.962ms    3      2           32.00 KB  32.00 MB       default.foo

Operator             #Hosts  #Inst  Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------
F02:ROOT             1       1      64.470us   64.470us                      4.01 MB   4.00 MB
04:EXCHANGE          1       1      9.816us    9.816us    2      2           24.00 KB  16.00 KB       UNPARTITIONED
F01:EXCHANGE SENDER  3       3      22.339us   23.764us                      14.28 KB  48.00 KB
03:AGGREGATE         3       3      290.786us  323.453us  3      2           2.08 MB   10.00 MB       FINALIZE
02:EXCHANGE          3       3      16.727us   31.736us   3      2           24.00 KB  16.00 KB       HASH(a,b)
F00:EXCHANGE SENDER  3       3      62.719us   69.838us                      46.69 KB  144.00 KB
01:AGGREGATE         3       3      179.794us  196.761us  3      2           2.03 MB   10.00 MB       STREAMING
00:SCAN HDFS         3       3      40.368ms   42.799ms   3      2           32.00 KB  32.00 MB       default.foo

Same for SingleNodePlanner. Same results with

> select distinct c, d from foo limit 2

--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Michael Smith
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Fri, 25 Aug 2023 00:10:16 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5081: Add codegen opt level query option
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20399 ) Change subject: IMPALA-5081: Add codegen_opt_level query option .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h@166 PS2, Line 166: bool Empty() { return engine_pointer == nullptr; } > Not relevant to this change but wouldn't it be better to call Reset() here Thanks Daniel for pointing this out. It makes sense to me. I think it was because the struct previously contained a shared_ptr, which needed a hack to initialize, but it has since been changed to a pointer. It should be okay if all the codegen caching tests pass. -- To view, visit http://gerrit.cloudera.org:8080/20399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd Gerrit-Change-Number: 20399 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 25 Aug 2023 00:09:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12387: PartialUpdates is misleading for LOCAL filter
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20397 ) Change subject: IMPALA-12387: PartialUpdates is misleading for LOCAL filter .. Patch Set 1: Code-Review+2 (1 comment) Carrying Wenzhe's +1. http://gerrit.cloudera.org:8080/#/c/20397/1/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test: http://gerrit.cloudera.org:8080/#/c/20397/1/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@181 PS1, Line 181: LOCAL > For global filter aggregating in coordinator, I think column "Pending (Expe Sorry, I mean the size of the IN-list filter, i.e. how many items are in the list. For MinMax filters, I think we can also get the min/max values. It's an improvement that we can make in the future. -- To view, visit http://gerrit.cloudera.org:8080/20397 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56078a458799671246ff90b831e5ecebd04a78e8 Gerrit-Change-Number: 20397 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 25 Aug 2023 00:04:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5081: Add codegen opt level query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20399 ) Change subject: IMPALA-5081: Add codegen_opt_level query option .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13839/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd Gerrit-Change-Number: 20399 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 25 Aug 2023 00:08:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19699 ) Change subject: IMPALA-10798: Initial support for reading JSON files .. Patch Set 29: Code-Review+1 Overall LGTM. Let's add a startup flag, enable_json_scanner (just like enable_orc_scanner when we first added the orc-scanner), to be able to disable this feature if we hit critical bugs in production. -- To view, visit http://gerrit.cloudera.org:8080/19699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569 Gerrit-Change-Number: 19699 Gerrit-PatchSet: 29 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Fri, 25 Aug 2023 00:02:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10086: SqlCastException when comparing char with varchar
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18001 ) Change subject: IMPALA-10086: SqlCastException when comparing char with varchar .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9633/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18001 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3cd331ae1b6afb778a88080efd38900694539a89 Gerrit-Change-Number: 18001 Gerrit-PatchSet: 2 Gerrit-Owner: Bruno Pusztahazi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 24 Aug 2023 23:53:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10086: SqlCastException when comparing char with varchar
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18001 ) Change subject: IMPALA-10086: SqlCastException when comparing char with varchar .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9632/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18001 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3cd331ae1b6afb778a88080efd38900694539a89 Gerrit-Change-Number: 18001 Gerrit-PatchSet: 1 Gerrit-Owner: Bruno Pusztahazi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 24 Aug 2023 23:51:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5081: Add codegen opt level query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20399 ) Change subject: IMPALA-5081: Add codegen_opt_level query option .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/20399/3/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20399/3/be/src/codegen/llvm-codegen.cc@1340 PS3, Line 1340: COUNTER_SET(num_opt_functions_, counter.GetCount(InstructionCounter::TOTAL_FUNCTIONS)); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/20399/3/be/src/codegen/llvm-codegen.cc@1343 PS3, Line 1343: COUNTER_SET(num_opt_functions_, counter.GetCount(InstructionCounter::TOTAL_FUNCTIONS)); line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/20399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd Gerrit-Change-Number: 20399 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 24 Aug 2023 23:49:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5081: Add codegen opt level query option
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20399 ) Change subject: IMPALA-5081: Add codegen_opt_level query option .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-test.cc File be/src/codegen/llvm-codegen-test.cc: http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-test.cc@594 PS2, Line 594: > Nit: could be 'nullptr'. Done -- To view, visit http://gerrit.cloudera.org:8080/20399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd Gerrit-Change-Number: 20399 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 24 Aug 2023 23:41:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5081: Add codegen opt level query option
Hello Daniel Becker, Yida Wu, Noemi Pap-Takacs, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20399 to look at the new patch set (#3). Change subject: IMPALA-5081: Add codegen_opt_level query option .. IMPALA-5081: Add codegen_opt_level query option Adds the 'codegen_opt_level' query option to select LLVM optimization level for generated code. Retains the prior behavior - O2 - as default. If optimization level is changed for an entry already in cache, the cache entry will be used unless the new optimization level is higher than the cached level. Adds unit tests that levels besides O0 inline and optimize a test function. Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd --- M be/src/codegen/CMakeLists.txt M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-cache.cc M be/src/codegen/llvm-codegen-cache.h M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A testdata/llvm/test-opt.cc 12 files changed, 300 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/20399/3 -- To view, visit http://gerrit.cloudera.org:8080/20399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd Gerrit-Change-Number: 20399 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Yida Wu
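The IMPALA-5081 commit message above states the cache rule in one sentence: a cached entry is reused unless the requested optimization level is higher than the level it was compiled at. The Python sketch below restates that rule. It is not the Impala implementation: `OptLevel` and `can_reuse_cached` are hypothetical names, and the set of levels O0 to O3 is an assumption, since the message itself only names O0 and the O2 default.

```python
from enum import IntEnum


class OptLevel(IntEnum):
    """Assumed ordering of optimization levels; the real option may offer others."""
    O0 = 0
    O1 = 1
    O2 = 2  # described in the commit message as the default
    O3 = 3


def can_reuse_cached(cached: OptLevel, requested: OptLevel) -> bool:
    """Reuse the cached code unless a strictly higher level is requested."""
    return requested <= cached


print(can_reuse_cached(OptLevel.O2, OptLevel.O1))  # lower request: reuse
print(can_reuse_cached(OptLevel.O2, OptLevel.O3))  # higher request: recompile
```

One consequence of the rule as stated is that a query asking for a lower level can still run code that was optimized at a higher one.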
[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20355 ) Change subject: IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13838/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20355 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f Gerrit-Change-Number: 20355 Gerrit-PatchSet: 4 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 24 Aug 2023 23:44:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline
Surya Hebbar has posted comments on this change. ( http://gerrit.cloudera.org:8080/20355 ) Change subject: IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline .. Patch Set 4: It might be useful to provide the ability to vertically resize the popup charts for better responsiveness. Before introducing further such features and complexities, I was planning to integrate a JS testing framework to ensure things are working properly. -- To view, visit http://gerrit.cloudera.org:8080/20355 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f Gerrit-Change-Number: 20355 Gerrit-PatchSet: 4 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 24 Aug 2023 23:21:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20022 ) Change subject: IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13837/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1 Gerrit-Change-Number: 20022 Gerrit-PatchSet: 14 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 24 Aug 2023 23:21:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline
Surya Hebbar has posted comments on this change. ( http://gerrit.cloudera.org:8080/20355 ) Change subject: IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline .. Patch Set 4: I have mentioned the comments and problems arising from displaying aggregated fragment metrics and periodic metrics together. Instead of further increasing the visual complexity of the periodic metrics, it might be better, as was suggested to me, to align the gridline with tooltips across multiple popup charts, which can be closed or opened for better accessibility. Also, comparing multiple fragments' memory metrics and thread usage together in the same chart might prove helpful. -- To view, visit http://gerrit.cloudera.org:8080/20355 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f Gerrit-Change-Number: 20355 Gerrit-PatchSet: 4 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 24 Aug 2023 23:20:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline
Surya Hebbar has posted comments on this change. ( http://gerrit.cloudera.org:8080/20355 ) Change subject: IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline .. Patch Set 4: For small values of 'periodic_counter_update_period_ms', the cursors across the timeline were not displaying tooltips in accordance with hovering due to small floating point differences. As there was no API from the c3js library available, I looked through the library's source code and have fixed this issue. -- To view, visit http://gerrit.cloudera.org:8080/20355 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f Gerrit-Change-Number: 20355 Gerrit-PatchSet: 4 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 24 Aug 2023 23:20:12 + Gerrit-HasComments: No
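The comment above attributes the missing tooltips to small floating-point differences between the hovered position and the sample timestamps. The message does not describe the actual fix made against c3js, so the Python sketch below only illustrates the general problem and one common remedy, a tolerance-based comparison. `find_sample_index` and its tolerance values are hypothetical.

```python
import math


def find_sample_index(timestamps, x, rel_tol=1e-9, abs_tol=1e-6):
    """Return the index of the sample whose timestamp matches x within a tolerance."""
    for i, t in enumerate(timestamps):
        if math.isclose(t, x, rel_tol=rel_tol, abs_tol=abs_tol):
            return i
    return None


# Timestamps built by repeated scaling of a small period accumulate rounding error.
step = 0.1
timestamps = [i * step for i in range(4)]

print(0.3 in timestamps)                   # exact comparison misses the sample
print(find_sample_index(timestamps, 0.3))  # tolerant comparison finds it
```

An exact equality test fails here because `3 * 0.1` is not representable as exactly `0.3`, which is the kind of mismatch that becomes more likely as the update period gets smaller.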
[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline
Surya Hebbar has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/20355 )

Change subject: IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline
..

IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline

The fragment's plan nodes are enlarged with an animated transition on hovering over the query timeline's fragment diagram. On clicking the plan nodes, the total thread and memory usage of the parent fragment are displayed, after accumulating the memory and thread usage of all child nodes.

A grid-line is displayed along with a tooltip on hovering over the fragment diagram, containing the instantaneous time at that position. This grid-line also triggers tooltips and gridlines in other charts.

The thread usage is shown on the additional Y-axis. A warning is displayed on clicking a fragment with fewer samples available.

The RESOURCE_TRACE_RATIO query option provides the utilization values to be traced within the RuntimeProfile. It contains samples of disk and network usage on each host. These time series counters are available within the profile under the following names.
- HostDiskWriteThroughput
- HostDiskReadThroughput
- HostNetworkRx
- HostNetworkTx

The additional Y-axis within the utilization chart is used to represent the average of these metrics.

The memory units in tooltips and ticks on the coordinate axes are displayed in human-readable form such as KB, MB, GB and PB for convenience.

Both of the charts contain controls to close the chart. Timeticks are autoscaled during the fragment diagram's horizontal zoom. In addition to the scrollbar, hovering on the edges of the window allows horizontal scrolling.

Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f
---
M www/query_timeline.tmpl
M www/scripts/util.js
2 files changed, 580 insertions(+), 154 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/20355/4
--
To view, visit http://gerrit.cloudera.org:8080/20355
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f
Gerrit-Change-Number: 20355
Gerrit-PatchSet: 4
Gerrit-Owner: Surya Hebbar
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Surya Hebbar
Gerrit-Reviewer: Wenzhe Zhou
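The IMPALA-12364 commit message above mentions showing memory values in human-readable units such as KB, MB, GB and PB. The patch does this in JavaScript (www/scripts/util.js); the Python sketch below only illustrates the conversion. `format_bytes` is a hypothetical name, and the 1024 step and the inclusion of TB between GB and PB are assumptions not stated in the message.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def format_bytes(num_bytes: float, precision: int = 2) -> str:
    """Scale a byte count down by 1024 until it fits the largest suitable unit."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop once the value is small enough, or when no larger unit remains.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.{precision}f} {unit}"
        value /= 1024


print(format_bytes(1536))
print(format_bytes(4 * 1024 ** 2))
```

Formatting both tooltip values and axis ticks through one helper like this keeps the two displays consistent.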
[Impala-ASF-CR] IMPALA-12231: Bump GBN to get HMS thrift API changes
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20420 ) Change subject: IMPALA-12231: Bump GBN to get HMS thrift API changes .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/13836/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/20420 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I117873b628aed3e24280f9fcd79643f918c8d5f3 Gerrit-Change-Number: 20420 Gerrit-PatchSet: 1 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 24 Aug 2023 23:07:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12387: PartialUpdates is misleading for LOCAL filter
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20397 ) Change subject: IMPALA-12387: PartialUpdates is misleading for LOCAL filter .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/20397/1/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test: http://gerrit.cloudera.org:8080/#/c/20397/1/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@181 PS1, Line 181: LOCAL > This is ok to me. Just curious, is it doable to get the stats if the fragme For global filters aggregated in the coordinator, I think the column "Pending (Expected)" is quite representative of how many total contributors a filter has. For more granular tracing, we'll need to add a new field to do the counting, at least in PublishFilterParamsPB. Not sure if it's worth the hack, since it will only be interesting for filters that quickly become ALL_TRUE / ALL_FALSE. -- To view, visit http://gerrit.cloudera.org:8080/20397 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56078a458799671246ff90b831e5ecebd04a78e8 Gerrit-Change-Number: 20397 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 24 Aug 2023 22:59:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20022 ) Change subject: IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID .. Patch Set 14: (5 comments) http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@323 PS13, Line 323: > flake8: E306 expected 1 blank line before a nested definition, found 0 Ack http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@348 PS13, Line 348: > flake8: E501 line too long (94 > 90 characters) Ack http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@353 PS13, Line 353: > flake8: E501 line too long (93 > 90 characters) Ack http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@360 PS13, Line 360: > flake8: E501 line too long (94 > 90 characters) Ack http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@372 PS13, Line 372: > flake8: E501 line too long (93 > 90 characters) Ack -- To view, visit http://gerrit.cloudera.org:8080/20022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1 Gerrit-Change-Number: 20022 Gerrit-PatchSet: 14 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 24 Aug 2023 22:53:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID
Hello Quanlong Huang, Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20022 to look at the new patch set (#14). Change subject: IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID .. IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID Summary: If a table has been manually refreshed, all of its events that happened before the manual REFRESH can be skipped. This situation arises when catalogd is lagging behind in processing events. When processing an event, we can check whether a manual REFRESH was executed after its eventTime. In that case, we don't need to process the event to refresh anything. This helps catalogd catch up on HMS events quickly. Implementation details: Updated the lastRefreshEventId on the table or partition whenever there is a table or partition level refresh/load. By comparing the lastRefreshEventId to the current event id in the event processor, the older events can be skipped. Set enable_skipping_older_events to true to enable this optimization. Testing: - End-to-end test and unit test to verify the functionality. 
Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M tests/custom_cluster/test_events_custom_configs.py 11 files changed, 238 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/20022/14 -- To view, visit http://gerrit.cloudera.org:8080/20022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1 Gerrit-Change-Number: 20022 Gerrit-PatchSet: 14 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Zoltan Borok-Nagy
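The skipping rule described in the IMPALA-11535 commit message can be sketched as follows. This is an illustrative Python model, not the Java code in MetastoreEventsProcessor.java; the TableState class, the function names, and the strict "less than" comparison at the boundary are assumptions. Only the lastRefreshEventId comparison and the enable_skipping_older_events flag come from the message itself.

```python
from dataclasses import dataclass


@dataclass
class TableState:
    """Hypothetical stand-in for a catalog table tracking its last refresh."""
    name: str
    last_refresh_event_id: int = -1


def should_skip_event(table, event_id, enable_skipping_older_events):
    """Return True if the event predates the table's last manual refresh.

    Assumption: an event whose id equals last_refresh_event_id is still
    processed; the real boundary handling is not stated in the thread.
    """
    if not enable_skipping_older_events:
        return False
    return event_id < table.last_refresh_event_id


def process_events(table, event_ids, enable_skipping_older_events=True):
    """Split incoming event ids into processed and skipped lists."""
    processed, skipped = [], []
    for event_id in event_ids:
        if should_skip_event(table, event_id, enable_skipping_older_events):
            skipped.append(event_id)
        else:
            processed.append(event_id)
    return processed, skipped


if __name__ == "__main__":
    table = TableState("t", last_refresh_event_id=100)
    print(process_events(table, [98, 99, 100, 101]))
```

With the flag turned off, every event is processed, which matches the message's description of the optimization as opt-in.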
[Impala-ASF-CR] IMPALA-12231: Bump GBN to get HMS thrift API changes
Sai Hemanth Gantasala has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20420 ) Change subject: IMPALA-12231: Bump GBN to get HMS thrift API changes .. IMPALA-12231: Bump GBN to get HMS thrift API changes We need two Hive changes, HIVE-27319 and HIVE-27337, for catalogd to work with the latest HMS server and to fix IMPALA-11768 and IMPALA-11939 respectively. Bump CDP_BUILD_NUMBER (GBN) to 44206393. Bump various CDP version numbers to be based on 7.2.18.0-273. TESTING: Exhaustive tests ran clean. Added a couple of tests for IMPALA-11939 and IMPALA-11768. Change-Id: I117873b628aed3e24280f9fcd79643f918c8d5f3 --- M bin/impala-config.sh M tests/custom_cluster/test_events_custom_configs.py 2 files changed, 35 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/20420/1 -- To view, visit http://gerrit.cloudera.org:8080/20420 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I117873b628aed3e24280f9fcd79643f918c8d5f3 Gerrit-Change-Number: 20420 Gerrit-PatchSet: 1 Gerrit-Owner: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12400: Test expected executors used for planning when no executor groups are healthy
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20419 ) Change subject: IMPALA-12400: Test expected executors used for planning when no executor groups are healthy .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13835/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib71ca0a5402c74d07ee875878f092d6d3827c6b7 Gerrit-Change-Number: 20419 Gerrit-PatchSet: 1 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 24 Aug 2023 21:48:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12400: Test expected executors used for planning when no executor groups are healthy
Abhishek Rawat has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20419 ) Change subject: IMPALA-12400: Test expected executors used for planning when no executor groups are healthy .. IMPALA-12400: Test expected executors used for planning when no executor groups are healthy Added a custom cluster test for the number of executors used for planning when no executor groups are healthy. The planner should use the number of executors from 'num_expected_executors' or 'expected_executor_group_sets' when executor groups aren't healthy. Change-Id: Ib71ca0a5402c74d07ee875878f092d6d3827c6b7 --- M tests/custom_cluster/test_executor_groups.py 1 file changed, 69 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/20419/1 -- To view, visit http://gerrit.cloudera.org:8080/20419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib71ca0a5402c74d07ee875878f092d6d3827c6b7 Gerrit-Change-Number: 20419 Gerrit-PatchSet: 1 Gerrit-Owner: Abhishek Rawat
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786 PS4, Line 786: && isMultiPhase() > isMultiPhase encompasses all nodes that are part of a chain of aggregation I see. Can you run a simple performance test to see whether this would negatively affect queries for which a very small subset of non-merge aggregate nodes can provide the answer? For example, suppose table T is partitioned on columns a, b into 10 partitions and sorted on a, b. The query is select distinct a, b from T limit 2. Normally, such a query can finish as soon as the two smallest subsets of rows (on a, b) are read in. From reading the code here, my understanding is that with the change we cannot complete early until all read nodes (from the 10 partitions) have done their work, and we can complete early only once the very top merge node is active. True? -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 20:52:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 20:46:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20022 ) Change subject: IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13834/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1 Gerrit-Change-Number: 20022 Gerrit-PatchSet: 13 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 24 Aug 2023 20:37:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20022 ) Change subject: IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID .. Patch Set 13: (4 comments) http://gerrit.cloudera.org:8080/#/c/20022/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20022/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3229 PS12, Line 3229: alte > nit: long For Java tests, we can't delay the event processor, so the event processor will process the events as soon as they are generated. I have added a condition just to verify that lastSyncEventId changes before and after. http://gerrit.cloudera.org:8080/#/c/20022/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3239 PS12, Line 3239: eventsProc > nit: use assertEquals() Ack. Sorry for the repetition. I added it in the previous patch but somehow it was lost. http://gerrit.cloudera.org:8080/#/c/20022/12/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/20022/12/tests/custom_cluster/test_events_custom_configs.py@374 PS12, Line 374: if is_partitoned: > Can we also test for non-partitioned tables? We can extract a common method Ack. After adding 4 sets of subtests, the number of events skipped became flaky. I tried varying the polling interval, but it didn't help much. So I'll just check that events_skipped_after is greater than events_skipped_before instead of comparing the exact number of skipped events. 
http://gerrit.cloudera.org:8080/#/c/20022/12/tests/custom_cluster/test_events_custom_configs.py@384 PS12, Line 384: > Let's verify no reloads happen by comparing the old and new metrics of "tab Ack -- To view, visit http://gerrit.cloudera.org:8080/20022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1 Gerrit-Change-Number: 20022 Gerrit-PatchSet: 13 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 24 Aug 2023 20:14:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20022 ) Change subject: IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID .. Patch Set 13: (5 comments) http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@323 PS13, Line 323: d flake8: E306 expected 1 blank line before a nested definition, found 0 http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@348 PS13, Line 348: e flake8: E501 line too long (94 > 90 characters) http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@353 PS13, Line 353: d flake8: E501 line too long (93 > 90 characters) http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@360 PS13, Line 360: e flake8: E501 line too long (94 > 90 characters) http://gerrit.cloudera.org:8080/#/c/20022/13/tests/custom_cluster/test_events_custom_configs.py@372 PS13, Line 372: d flake8: E501 line too long (93 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/20022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1 Gerrit-Change-Number: 20022 Gerrit-PatchSet: 13 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 24 Aug 2023 20:16:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID
Hello Quanlong Huang, Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20022 to look at the new patch set (#13). Change subject: IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID .. IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID Summary: If a table has been manually refreshed, all of its events that happened before the manual REFRESH can be skipped. This situation arises when catalogd is lagging behind in processing events. When processing an event, we can check whether a manual REFRESH was executed after its eventTime. In that case, we don't need to process the event to refresh anything. This helps catalogd catch up on HMS events quickly. Implementation details: Updated the lastRefreshEventId on the table or partition whenever there is a table or partition level refresh/load. By comparing the lastRefreshEventId to the current event id in the event processor, the older events can be skipped. Set enable_skipping_older_events to true to enable this optimization. Testing: - End-to-end test and unit test to verify the functionality. 
Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M tests/custom_cluster/test_events_custom_configs.py M tests/util/event_processor_utils.py 12 files changed, 234 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/20022/13 -- To view, visit http://gerrit.cloudera.org:8080/20022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1 Gerrit-Change-Number: 20022 Gerrit-PatchSet: 13 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19699 ) Change subject: IMPALA-10798: Initial support for reading JSON files .. Patch Set 29: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/19699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569 Gerrit-Change-Number: 19699 Gerrit-PatchSet: 29 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 24 Aug 2023 18:35:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12395: Override scan cardinality for optimized count star
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/20406 ) Change subject: IMPALA-12395: Override scan cardinality for optimized count star .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id5ce967657208057d50bd80adadac29ebb51cbc5 Gerrit-Change-Number: 20406 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 24 Aug 2023 17:45:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12387: PartialUpdates is misleading for LOCAL filter
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/20397 ) Change subject: IMPALA-12387: PartialUpdates is misleading for LOCAL filter .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20397 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56078a458799671246ff90b831e5ecebd04a78e8 Gerrit-Change-Number: 20397 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 24 Aug 2023 17:40:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables
Wenzhe Zhou has uploaded a new patch set (#15) to the change originally created by Fucun Chu. ( http://gerrit.cloudera.org:8080/17842 ) Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables .. IMPALA-5741: Initial support for reading tiny RDBMS tables This patch uses the "external data source" mechanism in Impala to implement a data source for querying JDBC. It has some limitations due to the restrictions of "external data source": - It is not distributed. - It only supports pushing binary predicates with operators =, !=, <=, >=, <, > to the RDBMS. Source files under jdbc/conf, jdbc/dao and jdbc/exception are replicated from Hive JDBC Storage Handler. In order to query the RDBMS tables, the following steps should be followed (note that the existing data source table will be rebuilt): 1. Make sure that the database driver package has been added to the classpath and the minicluster has been started. 2. Copy the data source library into HDFS. ${IMPALA_HOME}/testdata/bin/copy-data-sources.sh 3. Create an `alltypes` table in the postgres database. ${IMPALA_HOME}/testdata/bin/load-data-sources.sh 4. Create the data source table (alltypes_jdbc_datasource). ${IMPALA_HOME}/bin/impala-shell.sh -i ${IMPALAD} -f\ ${IMPALA_HOME}/testdata/bin/create-data-source-table.sql Testing: - Added a unit test for Postgres. - Ran core tests successfully. 
Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 --- M bin/rat_exclude_files.txt M fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java M fe/src/test/java/org/apache/impala/service/FrontendTest.java A java/ext-data-source/jdbc/pom.xml A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/README.md A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfig.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DB2DatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessorFactory.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MsSqlDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/MySqlDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/OracleDatabaseAccessor.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/PostgresDatabaseAccessor.java A 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/exception/JdbcDatabaseAccessException.java A java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java A java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java A java/ext-data-source/jdbc/src/test/resources/log4j.properties A java/ext-data-source/jdbc/src/test/resources/test_script.sql M java/ext-data-source/pom.xml M testdata/bin/copy-data-sources.sh M testdata/bin/create-data-source-table.sql M testdata/bin/create-load-data.sh A testdata/bin/load-data-sources.sh M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test 30 files changed, 2,084 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/17842/15 -- To view, visit http://gerrit.cloudera.org:8080/17842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 Gerrit-Change-Number: 17842 Gerrit-PatchSet: 15 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
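The predicate restriction stated in the IMPALA-5741 commit message (only binary predicates with =, !=, <=, >=, <, > are pushed to the RDBMS) can be illustrated with a rough sketch. This is not the logic of QueryConditionUtil.java, which is not shown in this thread; the tuple representation of predicates, the function name build_where_clause, and the "?" placeholder style are assumptions.

```python
# Operators the commit message lists as eligible for pushdown.
SUPPORTED_OPS = {"=", "!=", "<=", ">=", "<", ">"}


def build_where_clause(predicates):
    """Split predicates into a pushed-down WHERE clause and a residual list.

    `predicates` is a list of (column, operator, value) tuples, a
    hypothetical representation chosen for this sketch. Values are bound
    through "?" placeholders rather than interpolated into the SQL text.
    """
    clauses, params, residual = [], [], []
    for column, op, value in predicates:
        if op in SUPPORTED_OPS:
            clauses.append(f"{column} {op} ?")
            params.append(value)
        else:
            # Unsupported predicates stay behind for the caller to evaluate.
            residual.append((column, op, value))
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return where, params, residual


if __name__ == "__main__":
    preds = [("id", ">=", 10), ("name", "LIKE", "a%"), ("flag", "=", True)]
    print(build_where_clause(preds))
```

Keeping the residual predicates separate reflects the usual division of labour in pushdown schemes: whatever the remote database cannot evaluate must still be applied after the rows come back.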
[Impala-ASF-CR] IMPALA-5081: Add codegen opt level query option
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20399 ) Change subject: IMPALA-5081: Add codegen_opt_level query option .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h@166 PS2, Line 166: void Init() { memset((uint8_t*)this, 0, sizeof(CodeGenCacheEntry)); } > Not relevant to this change but wouldn't it be better to call Reset() here I don't see any problem with changing to that. I'll let Yida weigh in though. http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen.cc@1448 PS2, Line 1448: > Nit: unneeded space. This is multiplication and preserves the prior formatting. Pretty sure code formatter would complain if I removed the space. http://gerrit.cloudera.org:8080/#/c/20399/2/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/20399/2/common/thrift/ImpalaService.thrift@845 PS2, Line 845: O1, Os, O2, or O3 > O0 is also a possibility. Ack http://gerrit.cloudera.org:8080/#/c/20399/2/common/thrift/ImpalaService.thrift@846 PS2, Line 846: Defaults to O2. > If we ever change the default value we'll probably forget this comment. The This is a pattern we use a lot of other places here. I'm amenable to this argument however. 
-- To view, visit http://gerrit.cloudera.org:8080/20399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd Gerrit-Change-Number: 20399 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 24 Aug 2023 16:43:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20379 to look at the new patch set (#7). Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. IMPALA-12383: Fix SingleNodePlanner aggregation limits When IMPALA-2581 was implemented, it assumed all aggregation nodes would have a pre-aggregation step that limits could be pushed to. That's not the case when using SingleNodePlanner, such as when num_nodes=1. As a result, the following query would incorrectly return 16 rows, not 10: set num_nodes=1; select distinct l_orderkey from tpch.lineitem limit 10; This fix identifies all aggregation nodes that use pre-aggregation so we use fast_limit_check in only those cases. Testing: - added a test case where we assert number of rows returned by an aggregation node (rather than an exchange or top-n). - restores definition of ALL_CLUSTER_SIZES and makes it simpler to enable for individual test suites. Filed IMPALA-12394 to generally re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES for aggregation tests. - passed an exhaustive test run. 
Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M tests/common/impala_test_suite.py M tests/common/test_dimensions.py M tests/query_test/test_aggregation.py 6 files changed, 47 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/7 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder
Michael Smith has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20396 ) Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder .. IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder Currently, DictEncoder uses the default hash function for TimestampValue, which means it is hashing the entire TimestampValue struct. This can be inconsistent, because TimestampValue contains some padding that may not be zero in some cases. For TimestampValues that are part of a Tuple, the padding is zero, so this is mainly present in test cases. This was discovered when fixing a Clang Tidy performance-for-range-copy warning by iterating with a const reference rather than making a copy of the value. DictTest.TestTimestamps became flaky with that change, because the hash was no longer consistent. The copy must have had consistent content for the padding through the iteration, but the const reference did not. This adds a template specialization of the Hash function for TimestampValue. The specialization uses TimestampValue::Hash(), which hashes only the non-padding pieces of the struct. This also includes the change to dict-test.cc that uncovered the issue. This fix is mostly to unblock IMPALA-12390. Testing: - Ran dict-test in a loop for a few hundred iterations - Hand tested inserting many timestamps into a Parquet table with dictionary encoding and verified that the performance didn't change. 
Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9 Reviewed-on: http://gerrit.cloudera.org:8080/20396 Tested-by: Impala Public Jenkins Reviewed-by: Daniel Becker Reviewed-by: Michael Smith --- M be/src/util/dict-encoding.h M be/src/util/dict-test.cc 2 files changed, 8 insertions(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Daniel Becker: Looks good to me, but someone else must approve Michael Smith: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/20396 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9 Gerrit-Change-Number: 20396 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith
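The padding problem described in the commit message above can be reproduced outside Impala. What follows is a minimal sketch, not the actual DictEncoder code: `Stamp` is a hypothetical stand-in for TimestampValue (an 8-byte field followed by a 4-byte field), and CRC32 stands in for whatever hash function Impala really uses. It shows that hashing the raw bytes of the struct depends on the padding, while hashing only the named fields does not.

```python
import ctypes
import struct
import zlib


class Stamp(ctypes.Structure):
    # Hypothetical stand-in for TimestampValue: an 8-byte field followed by a
    # 4-byte field, which typical 64-bit ABIs pad out to 16 bytes.
    _fields_ = [("time_of_day", ctypes.c_int64), ("date", ctypes.c_int32)]


def make_stamp(time_of_day, date, fill):
    """Build a Stamp on top of memory pre-filled with `fill`, so the padding
    bytes keep that value while the named fields are overwritten."""
    raw = bytearray([fill]) * ctypes.sizeof(Stamp)
    stamp = Stamp.from_buffer(raw)
    stamp.time_of_day = time_of_day
    stamp.date = date
    return stamp


def hash_whole_struct(stamp):
    # Hashes every byte of the struct, padding included.
    return zlib.crc32(bytes(stamp))


def hash_fields(stamp):
    # Hashes only the two named fields, packed without padding.
    return zlib.crc32(struct.pack("<qi", stamp.time_of_day, stamp.date))


a = make_stamp(1, 2, 0x00)
b = make_stamp(1, 2, 0xFF)

# Same logical value, so the field-wise hashes agree.
print(hash_fields(a) == hash_fields(b))
# The whole-struct hashes disagree whenever the struct contains padding.
print(hash_whole_struct(a) == hash_whole_struct(b))
```

The field-wise variant corresponds to the idea of specializing the hash for TimestampValue so that only the non-padding pieces are hashed.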
[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20396 ) Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/20396/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20396/1//COMMIT_MSG@14 PS1, Line 14: the padding is zero, so this is mainly present in test cases. > I was putting together an initial change for IMPALA-12390, but my GVO run f Done -- To view, visit http://gerrit.cloudera.org:8080/20396 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9 Gerrit-Change-Number: 20396 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Thu, 24 Aug 2023 16:30:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9631/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 16:33:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20396 ) Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20396 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9 Gerrit-Change-Number: 20396 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Thu, 24 Aug 2023 16:26:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12366: Use 2GB as the default for thrift rpc max message size
Michael Smith has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20394 ) Change subject: IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size .. IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size Thrift 0.16 implemented a limit on the max message size. In IMPALA-11669, we added the thrift_rpc_max_message_size parameter and set the default size to 1GB. Some existing clusters have needed to tune this parameter higher because their workloads use message sizes larger than 1GB (e.g. for metadata updates). Historically, Impala has been able to send and receive 2GB messages, so this changes the default value for thrift_rpc_max_message_size to 2GB (INT_MAX). This can be reduced in future when Impala can guarantee that messages work properly when split up into smaller batches. TestGracefulShutdown::test_shutdown_idle started failing with this change, because it is producing a different error message for one of the negative tests. ClientRequestState::ExecShutdownRequest() appends some extra explanation when it sees a "Network error" KRPC error, and the test expects that extra explanation. This modifies ClientRequestState::ExecShutdownRequest() to provide the extra explanation for the new error ("Timed out") as well. 
Testing: - Ran GVO Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc Reviewed-on: http://gerrit.cloudera.org:8080/20394 Tested-by: Impala Public Jenkins Reviewed-by: Riza Suminto Reviewed-by: Michael Smith --- M be/src/rpc/thrift-util.cc M be/src/service/client-request-state.cc 2 files changed, 10 insertions(+), 5 deletions(-) Approvals: Impala Public Jenkins: Verified Riza Suminto: Looks good to me, approved Michael Smith: Looks good to me, but someone else must approve -- To view, visit http://gerrit.cloudera.org:8080/20394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc Gerrit-Change-Number: 20394 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12366: Use 2GB as the default for thrift rpc max message size
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20394 ) Change subject: IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc Gerrit-Change-Number: 20394 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 24 Aug 2023 16:25:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12366: Use 2GB as the default for thrift rpc max message size
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20394 ) Change subject: IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size .. Patch Set 2: Code-Review+2 Looks good to me. Thank you for watching over this. -- To view, visit http://gerrit.cloudera.org:8080/20394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc Gerrit-Change-Number: 20394 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 24 Aug 2023 16:20:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11996: Scanner change for Iceberg metadtata querying
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20010 ) Change subject: IMPALA-11996: Scanner change for Iceberg metadtata querying .. Patch Set 5: (64 comments) Thanks for the patch, Tamas! It really does seem like a lot of work. I took a first look and I think I found some memory leaks around IcebergMetadataScanNode. I still have to digest the code in IcebergMetadataTableScanner, though. Honestly, to me it seems pretty ugly to have JNI calls within C++ for literally everything. I naively thought that we could somehow let the Java part do the Java stuff and have the C++ part only ask for the next set of results in some format, like thrift. Even if that's not possible, I think we can give some subtasks to the Java part, like "please create me the object for the metadata table", and then we can hide the majority of the Java class/variable/method/type references in the C++ code. Can't we somehow keep the Java references minimal and, let's say, maintain the iterator that traverses the results, but then ask the Java part to get us the actual results, giving it the iterator? Could results be passed in thrift or some buffer format between the two worlds? Once we have them, we could move the values into the row_batch. I'm curious what others think about this, though. 
http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG@7 PS5, Line 7: metadtata typo http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG@17 PS5, Line 17: se typo http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG@17 PS5, Line 17: struct column types it's not just struct but nested types in general http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h: http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@20 PS5, Line 20: #include "exec/iceberg-metadata/iceberg-metadata-table-scanner.h" Would it help to remove this include if we had a forward declaration of IcebergMetadataTableScanner in this header file? http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@22 PS5, Line 22: #include "runtime/runtime-state.h" : #include "util/jni-util.h" same as above http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@41 PS5, Line 41: /// ScanNode ancestor -> ExecNode I don't think this comment is necessary http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@42 PS5, Line 42: class IcebergMetadataScanNode : public ScanNode { Don't you need a virtual destructor for this class? http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@49 PS5, Line 49: Iceberg TableScan What is an Iceberg 'TableScan'? I haven't found any reference in the cc file. 
http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@52 PS5, Line 52: /// Get next rowbatch from the table scanner this comment doesn't add much http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@55 PS5, Line 55: /// Close the Iceberg TableScan This comment doesn't add much http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@58 PS5, Line 58: Status GetCatalogTable(JNIEnv* env, jobject* jtable); private? http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@60 PS5, Line 60: protected: Are there any derived classes from this one? I haven't found any. What's the reason having protected members? http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@61 PS5, Line 61: tuple_desc_ nit: I think we use ' char around variable names in comments. like 'tuple_desc_' http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@67 PS5, Line 67: metadtata typo http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@69 PS5, Line 69: const string* metadata_table_name_; does this have to be a pointer? isn't regular string enough? http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@73 PS5, Line 73: scoped_ptr unique_ptr ? http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@36 PS5, Line 36: table_name_(new
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13833/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 8 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 24 Aug 2023 14:54:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Hello Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20133 to look at the new patch set (#8). Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. IMPALA-12089: Be able to skip pushing down a subset of the predicates This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. The current implementation is an all-or-nothing filter: if 'planFiles()' (Iceberg API) returns no residual expression, all Impala predicates can be skipped; if there is any residual expression, every Impala predicate is pushed down to the Impala scanner. Residual expressions are the filter expressions that remain after the pushdown of predicates into the Iceberg table scan. By locating the remainder expression, we can reduce the number of predicates that will be pushed down to the Impala scanner. After this change, the Iceberg residual expression handling is improved by locating the simple conjuncts in the residual expression and mapping them back to Impala conjuncts. For example, if the list of Impala conjuncts consists of two predicates, 'col_i != 100' and 'col_s = "a"', and 'col_i' happens to be a partition column in the Iceberg table definition so that the Iceberg table scan can eliminate the expression, the residual expression will be 'col_s = "a"'. This expression can be mapped back to an Impala predicate; every other expression can be removed from the effective Impala conjunct list, so only the mapped predicate is pushed down to the scanner, skipping the unnecessary filtering of 'col_i'. If there's no residual expression, the behavior is the same as before: all predicate pushdown is skipped. If Impala is unable to match all residual expressions to Impala conjuncts, then all the conjuncts are pushed down to the Impala scanner. 
This change offers the advantage of not pushing down already evaluated filters to the Impala scanner nodes, resulting in enhanced scanning performance. Additionally, if the filter expression affects columns that are unnecessary for the final result and can be filtered out during Iceberg's table scan, it leads to a reduced row size, thereby optimizing data retrieval and improving overall query efficiency. This solution is limited to cases where Impala's expression list contains only conjuncts; compound expressions are not supported, because partial elimination of compounds would involve expression rewrites in the Impala expression. A new query option is added: iceberg_predicate_pushdown_subsetting. The query option's default value is true. It can be turned off by setting it to false. Performance of the predicate location is measured on two edge cases: - 1000 expressions, 999 skipped: on average 2 ms - 1000 expressions, 1 skipped: on average 25 ms Tests: - planner test cases added for disabled mode - existing planner test cases adjusted - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 372 insertions(+), 72 deletions(-) git pull 
ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/8 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 8 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
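The selection rule described in the IMPALA-12089 commit message can be sketched roughly as follows. This is only an illustration under simplifying assumptions: predicates are plain strings, string equality stands in for the real matching of Iceberg residual expressions to Impala conjuncts, and `select_pushdown` is a hypothetical name, not a function from the patch.

```python
def select_pushdown(conjuncts, residuals, subsetting_enabled=True):
    """Return the Impala conjuncts that still need to reach the scanner.

    conjuncts: Impala predicates, as strings (simplification).
    residuals: residual expressions reported by Iceberg planning, as strings.
    subsetting_enabled: mirrors the iceberg_predicate_pushdown_subsetting
        query option described in the commit message.
    """
    # No residual expression: Iceberg already evaluated everything,
    # so nothing has to be pushed down.
    if not residuals:
        return []
    # Option turned off: fall back to the earlier all-or-nothing behavior.
    if not subsetting_enabled:
        return list(conjuncts)
    wanted = set(residuals)
    matched = [c for c in conjuncts if c in wanted]
    # If any residual cannot be mapped back to a conjunct, be conservative
    # and push down every conjunct.
    if set(matched) != wanted:
        return list(conjuncts)
    return matched


conjuncts = ['col_i != 100', 'col_s = "a"']
print(select_pushdown(conjuncts, ['col_s = "a"']))
print(select_pushdown(conjuncts, []))
print(select_pushdown(conjuncts, ['col_x > 1']))
```

With the example from the commit message, only 'col_s = "a"' remains for the scanner, because the partition predicate on 'col_i' was already handled by the Iceberg table scan.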
[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19699 ) Change subject: IMPALA-10798: Initial support for reading JSON files .. Patch Set 29: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9630/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/19699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569 Gerrit-Change-Number: 19699 Gerrit-PatchSet: 29 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 24 Aug 2023 14:16:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 22: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 13:06:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr slope(), regr intercept() and regr r2()
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Reviewed-on: http://gerrit.cloudera.org:8080/19569 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 988 insertions(+), 8 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 23 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
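For readers unfamiliar with these functions, the quantities they return can be illustrated with the textbook ordinary-least-squares formulas. The sketch below is not the Impala implementation. It assumes the SQL-standard argument order (dependent value y first, independent value x second) and the usual convention of returning no value when x has zero variance; neither detail is stated in the message above.

```python
def _moments(ys, xs):
    """Means and centered sums of squares/products for paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, syy, sxy


def regr_slope(ys, xs):
    """Slope of the least-squares line y = slope * x + intercept."""
    if not xs:
        return None
    _, _, sxx, _, sxy = _moments(ys, xs)
    return None if sxx == 0 else sxy / sxx


def regr_intercept(ys, xs):
    """y-intercept of the least-squares line."""
    slope = regr_slope(ys, xs)
    if slope is None:
        return None
    mean_x, mean_y, _, _, _ = _moments(ys, xs)
    return mean_y - slope * mean_x


def regr_r2(ys, xs):
    """Coefficient of determination (R-squared) of the fit."""
    if not xs:
        return None
    _, _, sxx, syy, sxy = _moments(ys, xs)
    if sxx == 0:
        return None   # assumed convention: undefined for constant x
    if syy == 0:
        return 1.0    # assumed convention: constant y is a perfect fit
    return (sxy * sxy) / (sxx * syy)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # exactly y = 2x + 1
print(regr_slope(ys, xs), regr_intercept(ys, xs), regr_r2(ys, xs))
```

For points that lie exactly on a line, the slope and intercept recover the line and R-squared is 1; noisier data gives an R-squared between 0 and 1.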
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13832/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 7 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 24 Aug 2023 12:34:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Peter Rozsa has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. IMPALA-12089: Be able to skip pushing down a subset of the predicates This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. The current implementation is an all-or-nothing filter: if 'planFiles()' (Iceberg API) returns no residual expression, all Impala predicates can be skipped; if there is any residual expression, every Impala predicate is pushed down to the Impala scanner. Residual expressions are the filter expressions that remain after the pushdown of predicates into the Iceberg table scan. By locating the remainder expression, we can reduce the number of predicates that will be pushed down to the Impala scanner. After this change, the Iceberg residual expression handling is improved by locating the simple conjuncts in the residual expression and mapping them back to Impala conjuncts. For example, if the list of Impala conjuncts consists of two predicates, 'col_i != 100' and 'col_s = "a"', and 'col_i' happens to be a partition column in the Iceberg table definition so that the Iceberg table scan can eliminate the expression, the residual expression will be 'col_s = "a"'. This expression can be mapped back to an Impala predicate; every other expression can be removed from the effective Impala conjunct list, so only the mapped predicate is pushed down to the scanner, skipping the unnecessary filtering of 'col_i'. If there's no residual expression, the behavior is the same as before: all predicate pushdown is skipped. If Impala is unable to match all residual expressions to Impala conjuncts, then all the conjuncts are pushed down to the Impala scanner. 
This change offers the advantage of not pushing down already evaluated filters to the Impala scanner nodes, resulting in enhanced scanning performance. Additionally, if the filter expression affects columns that are unnecessary for the final result and can be filtered out during Iceberg's table scan, it leads to a reduced row size, thereby optimizing data retrieval and improving overall query efficiency. This solution is limited to cases where Impala's expression list contains only conjuncts; compound expressions are not supported, because partial elimination of compounds would involve expression rewrites in the Impala expression. A new query option is added: iceberg_predicate_pushdown_subsetting. The query option's default value is true. It can be turned off by setting it to false. Performance of the predicate location is measured on two edge cases: - 1000 expressions, 999 skipped: on average 2 ms - 1000 expressions, 1 skipped: on average 25 ms Tests: - planner test cases added for disabled mode - existing planner test cases adjusted - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 373 insertions(+), 72 deletions(-) git pull 
ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/7 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 7 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-5081: Add codegen opt level query option
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20399 ) Change subject: IMPALA-5081: Add codegen_opt_level query option .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h@166 PS2, Line 166: void Init() { memset((uint8_t*)this, 0, sizeof(CodeGenCacheEntry)); } Not relevant to this change but wouldn't it be better to call Reset() here instead of memset()? Afaik 'nullptr' is not guaranteed to be represented by the 0 value. http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-test.cc File be/src/codegen/llvm-codegen-test.cc: http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-test.cc@594 PS2, Line 594: NULL Nit: could be 'nullptr'. http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen.cc@1448 PS2, Line 1448: Nit: unneeded space. http://gerrit.cloudera.org:8080/#/c/20399/2/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/20399/2/common/thrift/ImpalaService.thrift@845 PS2, Line 845: O1, Os, O2, or O3 O0 is also a possibility. http://gerrit.cloudera.org:8080/#/c/20399/2/common/thrift/ImpalaService.thrift@846 PS2, Line 846: Defaults to O2. If we ever change the default value we'll probably forget this comment. The default value can be seen in common/thrift/Query.thrift, so we don't need to write it here. 
-- To view, visit http://gerrit.cloudera.org:8080/20399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd Gerrit-Change-Number: 20399 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 24 Aug 2023 12:03:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19699 ) Change subject: IMPALA-10798: Initial support for reading JSON files .. Patch Set 29: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13831/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569 Gerrit-Change-Number: 19699 Gerrit-PatchSet: 29 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 24 Aug 2023 11:36:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19699 ) Change subject: IMPALA-10798: Initial support for reading JSON files .. Patch Set 28: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13830/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569 Gerrit-Change-Number: 19699 Gerrit-PatchSet: 28 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 24 Aug 2023 11:29:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files
Zihao Ye has posted comments on this change. ( http://gerrit.cloudera.org:8080/19699 ) Change subject: IMPALA-10798: Initial support for reading JSON files .. Patch Set 29: (6 comments) http://gerrit.cloudera.org:8080/#/c/19699/26//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19699/26//COMMIT_MSG@7 PS26, Line 7: Initial support for reading JSON fi > Let's change the title to something like "Initial support for reading JSON Done http://gerrit.cloudera.org:8080/#/c/19699/23/tests/data_errors/test_data_errors.py File tests/data_errors/test_data_errors.py: http://gerrit.cloudera.org:8080/#/c/19699/23/tests/data_errors/test_data_errors.py@128 PS23, Line 128: self.run_test_case('DataErrorsTest/hdfs-scan-node-errors', vector) > Can we add a similar test for json? Done http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_cancellation.py File tests/query_test/test_cancellation.py: http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_cancellation.py@113 PS23, Line 113: 'text' > Let's add json here Done http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_chars.py File tests/query_test/test_chars.py: http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_chars.py@37 PS23, Line 37: ptions > Let's test json here Done http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_chars.py@68 PS23, Line 68: > Let's test json here as well Done http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_date_queries.py File tests/query_test/test_date_queries.py: http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_date_queries.py@45 PS23, Line 45: > Let's add json here. Please also update the above comment. 
DATE type is als Done -- To view, visit http://gerrit.cloudera.org:8080/19699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569 Gerrit-Change-Number: 19699 Gerrit-PatchSet: 29 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 24 Aug 2023 11:10:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files
Hello Quanlong Huang, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19699 to look at the new patch set (#29). Change subject: IMPALA-10798: Initial support for reading JSON files .. IMPALA-10798: Initial support for reading JSON files Prototype of HdfsJsonScanner implemented based on rapidjson, which supports scanning data from splittable JSON files. The scanning of JSON data is mainly completed by two parts working together. The first part is the JsonParser responsible for parsing the JSON object, which is implemented based on the SAX-style API of rapidjson. It reads data from the char stream, parses it, and calls the matching callback function for each JSON element it encounters. See the comments of the JsonParser class for more details. The other part is the HdfsJsonScanner, which inherits from HdfsScanner and provides callback functions for the JsonParser. The callback functions are responsible for providing data buffers to the Parser and converting and materializing the Parser's parsing results into RowBatch. It should be noted that the parser returns numeric values as strings to the scanner. The scanner uses the TextConverter class to convert the strings to the desired types, similar to how the HdfsTextScanner works. This is an advantage compared to using the numeric value provided by rapidjson directly, as it eliminates concerns about inconsistencies in converting decimals (e.g. losing precision). Limitations - Multiline JSON objects are not fully supported yet. It is ok when each file has only one scan range. However, when a file has multiple scan ranges, there is a small probability of incomplete scanning of multiline JSON objects that span ScanRange boundaries (in such cases, parsing errors may be reported). For more details, please refer to the comments in the 'multiline_json.test'. - Compressed JSON files are not supported yet. - Complex types are not supported yet. 
Tests - Most of the existing end-to-end tests can run on JSON format. - Add TestQueriesJsonTables in test_queries.py for testing multiline, malformed, and overflow in JSON. Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569 --- M be/CMakeLists.txt M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-scan-node-base.cc A be/src/exec/json/CMakeLists.txt A be/src/exec/json/hdfs-json-scanner.cc A be/src/exec/json/hdfs-json-scanner.h A be/src/exec/json/json-parser-test.cc A be/src/exec/json/json-parser.cc A be/src/exec/json/json-parser.h M be/src/exec/text-converter.inline.h M bin/rat_exclude_files.txt M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/bin/create-load-data.sh M testdata/bin/generate-schema-statements.py M testdata/bin/load-dependent-tables.sql A testdata/data/chars-formats.json A testdata/data/json_test/complex.json A testdata/data/json_test/malformed.json A testdata/data/json_test/multiline.json A testdata/data/json_test/overflow.json M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/functional-query_core.csv M testdata/workloads/functional-query/functional-query_dimensions.csv M testdata/workloads/functional-query/functional-query_exhaustive.csv M testdata/workloads/functional-query/functional-query_pairwise.csv A testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-json-scan-node-errors.test A testdata/workloads/functional-query/queries/QueryTest/complex_json.test A testdata/workloads/functional-query/queries/QueryTest/malformed_json.test A testdata/workloads/functional-query/queries/QueryTest/multiline_json.test A testdata/workloads/functional-query/queries/QueryTest/overflow_json.test M testdata/workloads/tpcds/tpcds_core.csv M testdata/workloads/tpcds/tpcds_exhaustive.csv M testdata/workloads/tpcds/tpcds_pairwise.csv M 
testdata/workloads/tpch/tpch_core.csv M testdata/workloads/tpch/tpch_dimensions.csv M testdata/workloads/tpch/tpch_exhaustive.csv M testdata/workloads/tpch/tpch_pairwise.csv M tests/common/test_dimensions.py M tests/data_errors/test_data_errors.py M tests/metadata/test_hms_integration.py M tests/query_test/test_cancellation.py M tests/query_test/test_chars.py M tests/query_test/test_date_queries.py M tests/query_test/test_decimal_queries.py M tests/query_test/test_queries.py M tests/query_test/test_scanners.py M tests/query_test/test_scanners_fuzz.py M tests/query_test/test_tpch_queries.py 50 files changed, 1,719 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/19699/29 -- To view, visit http://gerrit.cloudera.org:8080/19699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset
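The point in the commit message above about handing numeric values to the scanner as strings can be illustrated with a minimal Python sketch. It is not the HdfsJsonScanner or rapidjson code: it assumes one JSON object per line, uses the standard `json` module's `parse_float`/`parse_int` hooks in place of the SAX callbacks, and `scan_json_lines` and `column_types` are hypothetical names chosen for illustration.

```python
import json
from decimal import Decimal


def scan_json_lines(text, column_types):
    """Parse one JSON object per line and materialize rows as tuples.

    column_types is a list of (column_name, converter) pairs; this plays the
    role of the text-to-type conversion step and is a hypothetical stand-in.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # parse_float/parse_int receive the raw numeric text, so keeping it
        # as str defers conversion to the per-column converter below.
        obj = json.loads(line, parse_float=str, parse_int=str)
        row = []
        for name, convert in column_types:
            raw = obj.get(name)  # missing fields become NULL (None)
            row.append(None if raw is None else convert(raw))
        rows.append(tuple(row))
    return rows


rows = scan_json_lines(
    '{"id": 1, "price": 0.1000000000000000055}\n{"id": 2}',
    [("id", int), ("price", Decimal)],
)
```

Because the decimal text never passes through a binary float, the converter sees every digit that was in the file, which is the precision concern the commit message mentions.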
[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files
Hello Quanlong Huang, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19699 to look at the new patch set (#28). Change subject: IMPALA-10798: Initial support for reading JSON files .. IMPALA-10798: Initial support for reading JSON files Prototype of HdfsJsonScanner implemented based on rapidjson, which supports scanning data from splittable JSON files. The scanning of JSON data is mainly completed by two parts working together. The first part is the JsonParser responsible for parsing the JSON object, which is implemented based on the SAX-style API of rapidjson. It reads data from the char stream, parses it, and calls the matching callback function for each JSON element it encounters. See the comments of the JsonParser class for more details. The other part is the HdfsJsonScanner, which inherits from HdfsScanner and provides callback functions for the JsonParser. The callback functions are responsible for providing data buffers to the Parser and converting and materializing the Parser's parsing results into RowBatch. It should be noted that the parser returns numeric values as strings to the scanner. The scanner uses the TextConverter class to convert the strings to the desired types, similar to how the HdfsTextScanner works. This is an advantage compared to using the numeric value provided by rapidjson directly, as it eliminates concerns about inconsistencies in converting decimals (e.g. losing precision). Limitations - Multiline JSON objects are not fully supported yet. It is ok when each file has only one scan range. However, when a file has multiple scan ranges, there is a small probability of incomplete scanning of multiline JSON objects that span ScanRange boundaries (in such cases, parsing errors may be reported). For more details, please refer to the comments in the 'multiline_json.test'. - Compressed JSON files are not supported yet. - Complex types are not supported yet. 
Tests - Most of the existing end-to-end tests can run on JSON format. - Add TestQueriesJsonTables in test_queries.py for testing multiline, malformed, and overflow in JSON. Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569 --- M be/CMakeLists.txt M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-scan-node-base.cc A be/src/exec/json/CMakeLists.txt A be/src/exec/json/hdfs-json-scanner.cc A be/src/exec/json/hdfs-json-scanner.h A be/src/exec/json/json-parser-test.cc A be/src/exec/json/json-parser.cc A be/src/exec/json/json-parser.h M be/src/exec/text-converter.inline.h M bin/rat_exclude_files.txt M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/bin/create-load-data.sh M testdata/bin/generate-schema-statements.py M testdata/bin/load-dependent-tables.sql A testdata/data/chars-formats.json A testdata/data/json_test/complex.json A testdata/data/json_test/malformed.json A testdata/data/json_test/multiline.json A testdata/data/json_test/overflow.json M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/functional-query_core.csv M testdata/workloads/functional-query/functional-query_dimensions.csv M testdata/workloads/functional-query/functional-query_exhaustive.csv M testdata/workloads/functional-query/functional-query_pairwise.csv A testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-json-scan-node-errors.test A testdata/workloads/functional-query/queries/QueryTest/complex_json.test A testdata/workloads/functional-query/queries/QueryTest/malformed_json.test A testdata/workloads/functional-query/queries/QueryTest/multiline_json.test A testdata/workloads/functional-query/queries/QueryTest/overflow_json.test M testdata/workloads/tpcds/tpcds_core.csv M testdata/workloads/tpcds/tpcds_exhaustive.csv M testdata/workloads/tpcds/tpcds_pairwise.csv M 
testdata/workloads/tpch/tpch_core.csv M testdata/workloads/tpch/tpch_dimensions.csv M testdata/workloads/tpch/tpch_exhaustive.csv M testdata/workloads/tpch/tpch_pairwise.csv M tests/common/test_dimensions.py M tests/data_errors/test_data_errors.py M tests/metadata/test_hms_integration.py M tests/query_test/test_cancellation.py M tests/query_test/test_chars.py M tests/query_test/test_date_queries.py M tests/query_test/test_decimal_queries.py M tests/query_test/test_queries.py M tests/query_test/test_scanners.py M tests/query_test/test_scanners_fuzz.py M tests/query_test/test_tpch_queries.py 50 files changed, 1,716 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/19699/28 -- To view, visit http://gerrit.cloudera.org:8080/19699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset
[Impala-ASF-CR] IMPALA-10798: Initial support for reading JSON files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19699 ) Change subject: IMPALA-10798: Initial support for reading JSON files .. Patch Set 28: (3 comments) http://gerrit.cloudera.org:8080/#/c/19699/28/tests/data_errors/test_data_errors.py File tests/data_errors/test_data_errors.py: http://gerrit.cloudera.org:8080/#/c/19699/28/tests/data_errors/test_data_errors.py@162 PS28, Line 162: \ flake8: E502 the backslash is redundant between brackets http://gerrit.cloudera.org:8080/#/c/19699/28/tests/query_test/test_chars.py File tests/query_test/test_chars.py: http://gerrit.cloudera.org:8080/#/c/19699/28/tests/query_test/test_chars.py@39 PS28, Line 39: a flake8: W504 line break after binary operator http://gerrit.cloudera.org:8080/#/c/19699/28/tests/query_test/test_chars.py@83 PS28, Line 83: a flake8: W504 line break after binary operator -- To view, visit http://gerrit.cloudera.org:8080/19699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569 Gerrit-Change-Number: 19699 Gerrit-PatchSet: 28 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 24 Aug 2023 11:05:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 21: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 21 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 08:54:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10120: Add required fields for TGetInfoResp when error.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20344 ) Change subject: IMPALA-10120: Add required fields for TGetInfoResp when error. .. Patch Set 2: (1 comment) Thanks for digging into this! Could you add a test so it won't break in the future? E.g. running the following command beeline -u "jdbc:hive2://localhost:21050/default;auth=noSasl" -e queries Queries can be "SHOW TABLES" plus some SELECT/CREATE/INSERT/DROP statements. We can have a test like tests/shell/test_shell_commandline.py, e.g. test_beeline.py, using code similar to https://github.com/apache/impala/blob/4b62812995ce380f2dca038bac017432c6c5d14f/tests/common/impala_test_suite.py#L1030-L1045 http://gerrit.cloudera.org:8080/#/c/20344/2/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/20344/2/be/src/service/impala-hs2-server.cc@479 PS2, Line 479: return_val.infoValue.__set_stringValue(""); Do we need this for all usages of HS2_RETURN_ERROR()? -- To view, visit http://gerrit.cloudera.org:8080/20344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib42bb82735fb4a8e6911b6a19adb8bd84973300b Gerrit-Change-Number: 20344 Gerrit-PatchSet: 2 Gerrit-Owner: Xiang Yang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 08:52:44 + Gerrit-HasComments: Yes
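The beeline invocation suggested in the review comment above could be assembled by a small helper along these lines. This is only a sketch: `build_beeline_cmd` is a hypothetical name, not an existing Impala test utility, and joining several statements into a single `-e` argument is an assumption about how the test would pass its queries.

```python
def build_beeline_cmd(queries, host="localhost", port=21050, database="default"):
    """Return the argv list for the beeline command quoted in the review.

    Hypothetical helper: it only builds the command and does not run it.
    """
    url = "jdbc:hive2://{0}:{1}/{2};auth=noSasl".format(host, port, database)
    # Assumption: all statements are passed together in one -e argument.
    return ["beeline", "-u", url, "-e", "; ".join(queries)]


cmd = build_beeline_cmd(["SHOW TABLES", "SELECT 1"])
```

A test such as the proposed test_beeline.py could hand this list to `subprocess.check_output` and assert on the output, assuming beeline is available on the test machine.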
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 22: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9629/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 08:55:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 22: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 08:55:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 21: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13829/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 21 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 08:29:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2()
pranav.lo...@cloudera.com has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. They can be used both as aggregate and analytic functions. regr_slope() takes two arguments of numeric type and returns the slope of the line. regr_intercept() takes two arguments of numeric type and returns the y-intercept of the regression line. regr_r2() takes two arguments of numeric type and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. Testing: The functions are extensively tested and cross-checked with Hive. The tests can be found in aggregation.test. Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 988 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/19569/21 -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 21 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
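For readers unfamiliar with the three functions described in the commit message above, the following Python sketch computes the same quantities from the textbook ordinary-least-squares formulas. It is not the code in aggregate-functions-ir.cc. `regr_stats` is a hypothetical name, the (y, x) argument order and the skipping of pairs containing NULL follow the usual SQL convention, and the edge cases (no result when x has zero variance, R-squared of 1 when only y is constant) are assumptions based on the Oracle semantics rather than confirmed Impala behaviour.

```python
def regr_stats(pairs):
    """Return (slope, intercept, r2) for an iterable of (y, x) pairs.

    Pairs where either value is None are ignored. Returns (None, None, None)
    when no line can be fitted (no rows, or x has zero variance).
    """
    clean = [(float(y), float(x)) for y, x in pairs
             if y is not None and x is not None]
    n = len(clean)
    if n == 0:
        return None, None, None
    mean_y = sum(y for y, _ in clean) / n
    mean_x = sum(x for _, x in clean) / n
    # Sums of squared deviations and cross deviations.
    sxx = sum((x - mean_x) ** 2 for _, x in clean)
    syy = sum((y - mean_y) ** 2 for y, _ in clean)
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in clean)
    if sxx == 0:
        return None, None, None  # vertical line: slope is undefined
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Assumed convention: a constant y is fitted perfectly, so r2 is 1.
    r2 = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# Points on the line y = 2x + 1, plus one pair that is skipped.
slope, intercept, r2 = regr_stats([(3, 1), (None, 5), (5, 2), (7, 3)])
```

A production aggregate would normally keep running sums so that it can be updated and merged incrementally; the two-pass form here is chosen only for readability.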
[Impala-ASF-CR] IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20394 ) Change subject: IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13828/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc Gerrit-Change-Number: 20394 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 24 Aug 2023 08:06:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 ) Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() .. Patch Set 21: (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/21/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/21/be/src/exprs/aggregate-functions-ir.cc@298 PS21, Line 298: // https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9 line too long (151 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 21 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 08:05:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20394 ) Change subject: IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size .. IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size Thrift 0.16 implemented a limit on the max message size. In IMPALA-11669, we added the thrift_rpc_max_message_size parameter and set the default size to 1GB. Some existing clusters have needed to tune this parameter higher because their workloads use message sizes larger than 1GB (e.g. for metadata updates). Historically, Impala has been able to send and receive 2GB messages, so this changes the default value for thrift_rpc_max_message_size to 2GB (INT_MAX). This can be reduced in the future when Impala can guarantee that messages work properly when split up into smaller batches. TestGracefulShutdown::test_shutdown_idle started failing with this change, because it now produces a different error message for one of the negative tests. ClientRequestState::ExecShutdownRequest() appends some extra explanation when it sees a "Network error" KRPC error, and the test expects that extra explanation. This modifies ClientRequestState::ExecShutdownRequest() to provide the extra explanation for the new error ("Timed out") as well. Testing: - Ran GVO Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc --- M be/src/rpc/thrift-util.cc M be/src/service/client-request-state.cc 2 files changed, 10 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20394/2 -- To view, visit http://gerrit.cloudera.org:8080/20394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc Gerrit-Change-Number: 20394 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins
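The two ideas in the commit message above, the size arithmetic behind "2GB (INT_MAX)" and the extra explanation appended for both error texts, can be sketched in Python as follows. This does not reproduce be/src/rpc/thrift-util.cc or ClientRequestState::ExecShutdownRequest(): every name below is hypothetical, and the hint text is a placeholder rather than the message Impala actually emits.

```python
GIB = 1 << 30
OLD_DEFAULT_MAX_MESSAGE_SIZE = GIB            # 1GB default from IMPALA-11669
NEW_DEFAULT_MAX_MESSAGE_SIZE = (1 << 31) - 1  # INT_MAX, one byte under 2GiB

# Error substrings that should receive the extra explanation: the original
# "Network error" plus the "Timed out" text seen after this change.
EXPLAINED_ERRORS = ("Network error", "Timed out")
PLACEHOLDER_HINT = " (placeholder: extra explanation for the user)"


def exceeds_limit(message_size, limit):
    """True if a message of message_size bytes is over the configured limit."""
    return message_size > limit


def annotate_shutdown_error(error_message):
    """Append the extra explanation when the error text matches a known case."""
    if any(marker in error_message for marker in EXPLAINED_ERRORS):
        return error_message + PLACEHOLDER_HINT
    return error_message


metadata_update_size = GIB + GIB // 2  # a 1.5GiB message, for illustration
```

With these values, the 1.5GiB message is rejected under the old default and accepted under the new one, which matches the tuning problem the commit message describes.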