[Impala-ASF-CR] IMPALA-10967 Load data should handle AWS NLB-type timeout
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17955 ) Change subject: IMPALA-10967 Load data should handle AWS NLB-type timeout .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9645/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c2437e9894510204303ec07710cad60102c8821 Gerrit-Change-Number: 17955 Gerrit-PatchSet: 4 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Sat, 23 Oct 2021 01:19:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10967 Load data should handle AWS NLB-type timeout
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17955 ) Change subject: IMPALA-10967 Load data should handle AWS NLB-type timeout .. Patch Set 4: Rebase. -- To view, visit http://gerrit.cloudera.org:8080/17955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c2437e9894510204303ec07710cad60102c8821 Gerrit-Change-Number: 17955 Gerrit-PatchSet: 4 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Sat, 23 Oct 2021 00:57:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10967 Load data should handle AWS NLB-type timeout
Qifan Chen has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17955 ) Change subject: IMPALA-10967 Load data should handle AWS NLB-type timeout .. IMPALA-10967 Load data should handle AWS NLB-type timeout This patch addresses an Impala client hang caused by the AWS network load balancer (NLB) timeout, which is fixed at 350s. When a long data loading operation is executing and the timeout fires, AWS silently drops the connection and the Impala client hangs. The fix keeps the current TCLIService protocol between the client and the Impala server, and uses a separate thread to run the data loading and metadata refresh operations. This thread is waited for in a wait thread that also runs asynchronously, so executing the entire operation does not block the Impala client. The client can check the status of the operation through repeated GetOperationStatus() calls. Testing: 1. Added a new test in load.test to verify that the asynchronous execution in the BE keeps the session alive; 2. Ran core tests successfully. Change-Id: I8c2437e9894510204303ec07710cad60102c8821 --- M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M testdata/workloads/functional-query/queries/QueryTest/load.test 3 files changed, 88 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/17955/4 -- To view, visit http://gerrit.cloudera.org:8080/17955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c2437e9894510204303ec07710cad60102c8821 Gerrit-Change-Number: 17955 Gerrit-PatchSet: 4 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen
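The commit message describes three pieces: an exec thread that does the long-running work, a wait thread that joins it, and a client that polls for status. The C++ below is a minimal sketch of that pattern. It is not Impala's ClientRequestState code. AsyncOperation, OpState and PollUntilDone() are hypothetical names, and GetStatus() only stands in for what a GetOperationStatus() RPC would report.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical operation states; a stand-in for what GetOperationStatus() reports.
enum class OpState { kRunning, kFinished, kError };

// Hypothetical sketch, not Impala's ClientRequestState: 'work' runs on an exec
// thread, and a wait thread joins it and publishes the final state. As a result
// the constructor (think: the ExecuteStatement() handler) returns right away.
class AsyncOperation {
 public:
  explicit AsyncOperation(std::function<bool()> work) : work_(std::move(work)) {
    exec_thread_ = std::thread([this] { ok_ = work_(); });
    wait_thread_ = std::thread([this] {
      exec_thread_.join();  // blocks this helper thread, never the client
      state_.store(ok_ ? OpState::kFinished : OpState::kError);
    });
  }
  AsyncOperation(const AsyncOperation&) = delete;
  AsyncOperation& operator=(const AsyncOperation&) = delete;
  // The wait thread has already joined the exec thread, so only it needs joining.
  ~AsyncOperation() { wait_thread_.join(); }

  // Cheap, non-blocking check (the analogue of a single GetOperationStatus() call).
  OpState GetStatus() const { return state_.load(); }

 private:
  std::function<bool()> work_;
  bool ok_ = false;  // written by the exec thread, read only after join()
  std::atomic<OpState> state_{OpState::kRunning};
  std::thread exec_thread_;
  std::thread wait_thread_;
};

// Client side: many short polls instead of one long blocking call. No single
// request then stays idle long enough to hit a load balancer timeout.
OpState PollUntilDone(const AsyncOperation& op, std::chrono::milliseconds interval) {
  OpState state = op.GetStatus();
  while (state == OpState::kRunning) {
    std::this_thread::sleep_for(interval);
    state = op.GetStatus();
  }
  return state;
}
```

The client keeps making short round trips while the work proceeds in the background. This is the behavior the patch relies on to stay under the 350s NLB limit.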
[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17872 ) Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever .. IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever This patch addresses an Impala client hang caused by the AWS network load balancer (NLB) timeout, which is fixed at 350s. When a long DDL operation is executing and the timeout fires, AWS silently drops the connection and the Impala client hangs. The fix keeps the current TCLIService protocol between the client and the Impala server. It applies to the following Impala clients, which issue the Thrift RPC ExecuteStatement() to the Impala backend followed by repeated calls to GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax). 1. HS2 2. Beeswax 3. Impyla 4. HUE In the fix, the backend method ClientRequestState::ExecDdlRequest() can start a new thread in 'async_exec_thread_' for ExecDdlRequestImpl(), which executes most DDLs asynchronously. This thread is waited for in the wait thread 'wait_thread_'. Since the wait thread also runs asynchronously, executing the DDL does not block the Impala client. The client can therefore keep checking the execution status via GetOperationStatus() without any single call waiting for a long time (e.g., more than 350s). As an optimization, the asynchronous mode is not applied to certain DDLs that carry a very low risk of long execution: 1. operations that do not access the catalog service; 2. COMPUTE STATS, since the stats computation queries already run asynchronously. External behavior change: 1. A new field named "DDL execution mode:" is added to the summary section of the runtime profile, next to "DDL Type". Its value is either 'asynchronous' or 'synchronous'. 2. A new query option 'enable_async_ddl_execution', defaulting to true, is added. Setting it to false turns off the patch.
Limitations: This patch does not handle a potential AWS NLB-type timeout for LOAD DATA (IMPALA-10967). Testing: 1. Added new async DDL unit tests with HS2, HS2-HTTP, Beeswax and JDBC clients. 2. Ran core tests successfully. Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Reviewed-on: http://gerrit.cloudera.org:8080/17872 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test M tests/common/impala_test_suite.py M tests/metadata/test_ddl.py 9 files changed, 386 insertions(+), 26 deletions(-) Approvals: Joe McDonnell: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/17872 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Gerrit-Change-Number: 17872 Gerrit-PatchSet: 38 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen
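The commit message lists when a DDL skips the asynchronous path: the option is off, the operation does not access the catalog service, or it is COMPUTE STATS. The C++ below is a minimal sketch of that decision. DdlKind, DdlRequest, ShouldRunAsync() and DdlExecutionMode() are hypothetical names. The 'accesses_catalog' flag is an assumption about what is known at planning time. Only the option name and the two profile values come from the commit message.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified DDL categories; Impala's real DDL type enum is larger.
enum class DdlKind { kCreateTable, kComputeStats, kOther };

struct DdlRequest {
  DdlKind kind;
  bool accesses_catalog;  // assumption: known before execution starts
};

// Sketch of the mode selection described in the commit message. The parameter
// name mirrors the 'enable_async_ddl_execution' query option; the rest is illustrative.
bool ShouldRunAsync(const DdlRequest& req, bool enable_async_ddl_execution) {
  if (!enable_async_ddl_execution) return false;  // option off: synchronous path
  if (!req.accesses_catalog) return false;        // no catalog access: low risk
  if (req.kind == DdlKind::kComputeStats) return false;  // child queries already async
  return true;
}

// Value for the "DDL execution mode:" field in the profile summary.
std::string DdlExecutionMode(bool async) {
  return async ? "asynchronous" : "synchronous";
}
```

Keeping the decision in one predicate makes the profile field easy to fill in consistently: whatever ShouldRunAsync() returns is what "DDL execution mode:" reports.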
[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 ) Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever .. Patch Set 37: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17872 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Gerrit-Change-Number: 17872 Gerrit-PatchSet: 37 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Fri, 22 Oct 2021 23:42:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 ) Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. Patch Set 20: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 20 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Fri, 22 Oct 2021 20:29:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 ) Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. Patch Set 20: (20 comments) http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG@7 PS12, Line 7: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing : DDL operations via catalog HMS endpoints > ping @Vihang: I already have this on my to-do list. Will write a detailed commit message. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2672 PS12, Line 2672: lse; > I am not sure I fully understand. If this change is not needed for the patc I understand and, as discussed over the call, I will remove this method. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@288 PS19, Line 288: syncToLatestEventFactory_ > Based on my understanding this field here is unnecessary and is complicatin @Vihang: We do need to pass the event factory object to syncToLatestEventId() in metastoreEventProcessor since it is a static method. However, I agree that we should not create a new event factory and should instead modify isSelfEvent() to accommodate the sync to latest event id flag. One issue I see: if event processing is disabled, MetastoreEventProcessor is not initialized and there would be no way to access the eventFactory object from it.
A few ways to solve this: 1. Decouple MetastoreEventProcessor and EventFactory creation. In JniCatalog, we can create a common event factory that would be set in EventProcessor as well as used elsewhere. In doing so, we need to make sure that the factory is thread safe. 2. Make MetastoreEventFactory a singleton and use it everywhere. This would avoid the JniCatalog route. I had tried this approach in the past and encountered some test failures. I didn't investigate the failures in depth, but they appeared to be race condition issues. I can take another shot at it if the approach seems cleaner. Let me know your thoughts. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@316 PS19, Line 316: > this is unused and can be removed. Ack http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java File fe/src/main/java/org/apache/impala/catalog/Db.java: http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115 PS6, Line 115: > All the places where I see this variable getting accessed, I see that it is As discussed over the call, we will keep it volatile so that EventProcessor can check whether it needs to process an event without acquiring the read lock on the db/table. http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java File fe/src/main/java/org/apache/impala/catalog/Db.java: http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java@134 PS19, Line 134: LOG.debug("createEventId_ for db: {} set to: {}", getName(), createEventId_); : if (lastSyncedEventId_ < eventId) { : setLastSyncedEventId(eventId); : } > Pls remove if not needed anymore. Sure. I had added this check earlier but then saw some test failures related to DDLs from Impala shell. For now, I will add a TODO comment and we can address it in a follow-up JIRA.
http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java@153 PS19, Line 153: /** :* Creates a Db object with no tables based on the given TDatabase thrift struct. :*/ : public static Db fromTDatabase(TDatabase db) { : ret > pls remove if not needed. Same as previous comment http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java@186 PS12, Line 186: > Thanks for adding the comment. But similar to the Db.java's volatile keywor As discussed, we will keep the variable as volatile so that event processor can read it (to check if event should be skipped or not) w
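The thread above keeps lastSyncedEventId_ as a Java volatile field. The event processor can then read it to decide whether to skip an event without taking the db/table read lock. C++ has no such guarantee from `volatile`; the closest analogue is std::atomic. The sketch below is a hypothetical C++ illustration, not Impala's Java code. CatalogObject, SetLastSyncedEventId() and ShouldSkipEvent() are made-up names. The skip rule ("an event at or below the last synced id is already reflected") is an assumption for illustration. The monotonic update mirrors the quoted `if (lastSyncedEventId_ < eventId)` check.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a Db/Table catalog object (illustration only).
class CatalogObject {
 public:
  // DDL path: updates happen under the object's lock and only move forward,
  // mirroring the quoted 'if (lastSyncedEventId_ < eventId)' check.
  void SetLastSyncedEventId(int64_t event_id) {
    std::lock_guard<std::mutex> guard(lock_);
    if (last_synced_event_id_.load() < event_id) last_synced_event_id_.store(event_id);
  }

  // Event processor path: a lock-free read, the C++ analogue of reading a
  // Java volatile field. Assumed rule: events already synced are skipped.
  bool ShouldSkipEvent(int64_t event_id) const {
    return event_id <= last_synced_event_id_.load();
  }

  int64_t last_synced_event_id() const { return last_synced_event_id_.load(); }

 private:
  std::mutex lock_;
  std::atomic<int64_t> last_synced_event_id_{-1};  // -1: nothing synced yet
};
```

Writers still serialize on the lock, so the read-compare-write in SetLastSyncedEventId() cannot race with another writer. Readers only need the atomic load.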
[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 ) Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever .. Patch Set 37: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9644/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17872 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Gerrit-Change-Number: 17872 Gerrit-PatchSet: 37 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Fri, 22 Oct 2021 17:42:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
Qifan Chen has uploaded a new patch set (#37). ( http://gerrit.cloudera.org:8080/17872 ) Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever .. IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever This patch addresses an Impala client hang caused by the AWS network load balancer (NLB) timeout, which is fixed at 350s. When a long DDL operation is executing and the timeout fires, AWS silently drops the connection and the Impala client hangs. The fix keeps the current TCLIService protocol between the client and the Impala server. It applies to the following Impala clients, which issue the Thrift RPC ExecuteStatement() to the Impala backend followed by repeated calls to GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax). 1. HS2 2. Beeswax 3. Impyla 4. HUE In the fix, the backend method ClientRequestState::ExecDdlRequest() can start a new thread in 'async_exec_thread_' for ExecDdlRequestImpl(), which executes most DDLs asynchronously. This thread is waited for in the wait thread 'wait_thread_'. Since the wait thread also runs asynchronously, executing the DDL does not block the Impala client. The client can therefore keep checking the execution status via GetOperationStatus() without any single call waiting for a long time (e.g., more than 350s). As an optimization, the asynchronous mode is not applied to certain DDLs that carry a very low risk of long execution: 1. operations that do not access the catalog service; 2. COMPUTE STATS, since the stats computation queries already run asynchronously. External behavior change: 1. A new field named "DDL execution mode:" is added to the summary section of the runtime profile, next to "DDL Type". Its value is either 'asynchronous' or 'synchronous'. 2. A new query option 'enable_async_ddl_execution', defaulting to true, is added. Setting it to false turns off the patch.
Limitations: This patch does not handle a potential AWS NLB-type timeout for LOAD DATA (IMPALA-10967). Testing: 1. Added new async DDL unit tests with HS2, HS2-HTTP, Beeswax and JDBC clients. 2. Ran core tests successfully. Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e --- M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test M tests/common/impala_test_suite.py M tests/metadata/test_ddl.py 9 files changed, 386 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/37 -- To view, visit http://gerrit.cloudera.org:8080/17872 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Gerrit-Change-Number: 17872 Gerrit-PatchSet: 37 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen
[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 ) Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever .. Patch Set 37: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7557/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17872 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Gerrit-Change-Number: 17872 Gerrit-PatchSet: 37 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Fri, 22 Oct 2021 17:21:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10777: Enable min/max filtering for Iceberg partitions
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17960 ) Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@1323 PS1, Line 1323: if (scan_node_->hdfs_table()->IsIcebergTable()) return false; > In this patch we set is_bound_by_partition_columns for the computed partitio I got it. Thanks for the explanation. So in this case, maybe we can add a new field, is_data_in_data_file, to TRuntimeFilterTargetDesc. At line 678 in this file, we can then do the test as follows:

    if (IsBoundByPartitionColumn(idx) && !IsDataInDataFile(idx)) {
      continue;
    }

    // Specification of a runtime filter target.
    struct TRuntimeFilterTargetDesc {
      // Target node id
      1: Types.TPlanNodeId node_id

      // Expr on which the filter is applied
      2: required Exprs.TExpr target_expr

      // Indicates if 'target_expr' is bound only by partition columns
      3: required bool is_bound_by_partition_columns

      // Slot ids on which 'target_expr' is bound on
      4: required list target_expr_slotids

      // Indicates if this target is on the same fragment as the join that
      // produced the runtime filter
      5: required bool is_local_target

      // If the target node is a Kudu scan node, the name, in the case it appears in Kudu, and
      // type of the targeted column.
      6: optional string kudu_col_name
      7: optional Types.TColumnType kudu_col_type;

      // The low and high value as seen in the column stats of the targeted column.
      8: optional Data.TColumnValue low_value
      9: optional Data.TColumnValue high_value

      // Indicates if the low and high value in column stats are present
      10: optional bool is_min_max_value_present
    }

http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@678 PS2, Line 678: && IsSimplePartitionedTable() > For simple partitioned tables we don't want to evaluate the filters at the Makes sense. Done -- To view, visit http://gerrit.cloudera.org:8080/17960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac Gerrit-Change-Number: 17960 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Oct 2021 16:45:12 + Gerrit-HasComments: Yes
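The proposed check skips row-group filtering only for columns that are bound by partition columns and not stored in the data files. Computed (Iceberg-style) partition columns are stored in the files, so they are still evaluated. The C++ below is a minimal sketch of that selection logic. FilterTarget and RowGroupFilterTargets() are hypothetical. 'is_data_in_data_file' is the field proposed in the review, not an existing member of TRuntimeFilterTargetDesc.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-target flags; 'is_data_in_data_file' is the field proposed
// in the review, not an existing TRuntimeFilterTargetDesc member.
struct FilterTarget {
  bool is_bound_by_partition_columns;
  bool is_data_in_data_file;
};

// Returns the indexes of targets whose min/max filters can be evaluated
// against Parquet row-group statistics.
std::vector<int> RowGroupFilterTargets(const std::vector<FilterTarget>& targets) {
  std::vector<int> result;
  for (int idx = 0; idx < static_cast<int>(targets.size()); ++idx) {
    const FilterTarget& t = targets[idx];
    // Simple partition columns live only in the directory layout, so the
    // Parquet files have no stats for them: skip, as the 'continue' does.
    if (t.is_bound_by_partition_columns && !t.is_data_in_data_file) continue;
    result.push_back(idx);
  }
  return result;
}
```

With this flag, simple and computed partition columns no longer need separate table-level checks such as IsSimplePartitionedTable(). Each target says for itself whether its values exist in the data files.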
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2057 PS7, Line 2057: protected SelfEventContext getSelfEventContext() { > @Sourabh Makes sense. Thanks for the pointers. http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2080 PS7, Line 2080: NotificationEvent event) { > As long as we have called addWriteEventInfo, this would be empty list even Got it! -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 8 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Fri, 22 Oct 2021 16:20:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10777: Enable min/max filtering for Iceberg partitions
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17960 ) Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@1323 PS1, Line 1323: if (scan_node_->hdfs_table()->IsIcebergTable()) return false; > Ideally, is_computed_column can be a new boolean field to TColumn, together In this patch we set is_bound_by_partition_columns for the computed partition columns. http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@678 PS2, Line 678: && IsSimplePartitionedTable() > Since a computed partition column is a partition column, I wonder if there For simple partitioned tables we don't want to evaluate the filters at the row group level, because those partition columns are not stored in the Parquet files. Hence we 'continue' here. For computed partitions, however, we do want to evaluate the filters, since those columns are stored in the data files. -- To view, visit http://gerrit.cloudera.org:8080/17960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac Gerrit-Change-Number: 17960 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Oct 2021 15:25:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 ) Change subject: IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table. .. Patch Set 10: (10 comments) Looks great! On testing, I wonder if we can add a counter for the number of rows (or amount of data) not surviving the materialization. This would be useful to safeguard the feature and demonstrate its usefulness. http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch-test.cc File be/src/exec/scratch-tuple-batch-test.cc: http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch-test.cc@67 PS10, Line 67: // set every 16th row as selected. I wonder if we can add two more tests for the following situations. 1. Clustered: over 1024 values, 200 consecutive are true and the rest are false; 2. Interleaved: over 1024 values, randomly set 10%, 30%, and 70% to true. http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h File be/src/exec/scratch-tuple-batch.h: http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@29 PS10, Line 29: ScratchMicroBatch May need a constructor to properly initialize these fields. http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@171 PS10, Line 171: /// Consecutive bits set are used to create ranges. Ranges that differ by less than nit (or micro batches). http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@172 PS10, Line 172: E.g., for ranges 1-8, 11-20, 35-100 derived : /// from 'selected_rows' and 'skip_length' as 10, first two ranges would be merged : /// into 1-20 as they differ by 3 (11 - 8) which is less than 10 ('skip_length'). This is wonderful. http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@176 PS10, Line 176: atleast nit. http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@178 PS10, Line 178: range nit.
batch_idx may be a better name in this method. http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@203 PS10, Line 203: DCHECK(start != -1) << "Atleast one of the 'scratch_batch_->selected_rows'" : << "should be true"; nit. An alternative is the following, which is more robust. if (start == -1) { return 0; } http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/service/query-options.h@50 PS10, Line 50: PARQUET_MATERIALIZATION_THRESHOLD nit: PARQUET_LATE_MATERIALIZATION_THRESHOLD? http://gerrit.cloudera.org:8080/#/c/17860/10/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/17860/10/common/thrift/ImpalaService.thrift@701 PS10, Line 701: // of columns in parquet nit. -1 to turn off the feature. http://gerrit.cloudera.org:8080/#/c/17860/10/common/thrift/Query.thrift File common/thrift/Query.thrift: http://gerrit.cloudera.org:8080/#/c/17860/10/common/thrift/Query.thrift@554 PS10, Line 554: // of columns in parquet nit. -1 to turn off the feature. -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 10 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Oct 2021 14:43:30 + Gerrit-HasComments: Yes
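The comment quoted from scratch-tuple-batch.h describes the range rule. Consecutive selected rows form ranges, and neighboring ranges whose gap is less than 'skip_length' are merged. The C++ below is a small sketch of that rule. It is not the patch's implementation. Range and SelectedRanges() are hypothetical names, and ranges are inclusive row indexes. An empty selection returns an empty vector, in the spirit of the reviewer's suggestion to return early rather than DCHECK-fail when no row is selected.

```cpp
#include <cassert>
#include <vector>

// Inclusive [start, end] range of row indexes within a scratch batch.
struct Range {
  int start;
  int end;
};

// Builds ranges of consecutive selected rows, merging a new range into the
// previous one when the gap (new.start - prev.end) is less than 'skip_length'.
std::vector<Range> SelectedRanges(const std::vector<bool>& selected, int skip_length) {
  std::vector<Range> ranges;
  const int n = static_cast<int>(selected.size());
  int i = 0;
  while (i < n) {
    if (!selected[i]) {
      ++i;
      continue;
    }
    const int start = i;
    while (i < n && selected[i]) ++i;  // extend over consecutive set bits
    const Range r{start, i - 1};
    if (!ranges.empty() && r.start - ranges.back().end < skip_length) {
      ranges.back().end = r.end;  // small gap: merge into the previous range
    } else {
      ranges.push_back(r);
    }
  }
  return ranges;  // empty if no row is selected
}
```

With the quoted example (rows 1-8, 11-20 and 35-100 selected, 'skip_length' 10), the function returns 1-20 and 35-100. The first gap is 11 - 8 = 3, which is less than 10, while 35 - 20 = 15 is not. The clustered and interleaved patterns suggested for the tests can be fed to the same function.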
[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 ) Change subject: IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table. .. Patch Set 10: (2 comments) http://gerrit.cloudera.org:8080/#/c/17860/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17860/10//COMMIT_MSG@19 PS10, Line 19: that nit: than http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/parquet/hdfs-parquet-scanner.cc@2223 PS10, Line 2223: c. Could you please explain where we filter out the rows in the merged micro batches? -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 10 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Oct 2021 14:30:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 ) Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9643/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 20 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Fri, 22 Oct 2021 14:27:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 ) Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. Patch Set 20: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7556/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 20 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Fri, 22 Oct 2021 14:16:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 ) Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. Patch Set 20: (2 comments) http://gerrit.cloudera.org:8080/#/c/17859/20/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/17859/20/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2402 PS20, Line 2402: batchEvents = eventFactory.createBatchEvents(mockEvents, eventsProcessor_.getMetrics()); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17859/20/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java File fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java: http://gerrit.cloudera.org:8080/#/c/17859/20/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@85 PS20, Line 85: private static boolean flagEnableCatalogCache ,flagInvalidateCache, flagSyncToLatestEventId; line too long (96 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 20 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Fri, 22 Oct 2021 14:06:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Hello Vihang Karajgaonkar, kis...@cloudera.com, Yu-Wen Lai, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17859 to look at the new patch set (#20). Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/EventFactory.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/test/java/org/apache/impala/catalog/AlterDatabaseTest.java A fe/src/test/java/org/apache/impala/catalog/MetastoreApiTestUtils.java M fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java M fe/src/test/java/org/apache/impala/catalog/metastore/AbstractCatalogMetastoreTest.java A fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M tests/custom_cluster/test_metastore_service.py 26 files changed, 3,362 insertions(+), 282 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17859/20 -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 20 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
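The idea named in the IMPALA-10926 subject, bringing a cached db/table up to the latest HMS event id, can be illustrated with a minimal C++ sketch. It assumes that the cache records the newest event id it already reflects and that the events for one table arrive in ascending event-id order. HmsEvent, CachedTable and SyncToLatestEventId are hypothetical names for illustration only, not code from this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical HMS notification event; only the event id matters here.
struct HmsEvent {
  int64_t event_id;
};

// Hypothetical cached db/table entry that remembers the newest HMS event id
// whose effects it already reflects.
struct CachedTable {
  int64_t last_synced_event_id = 0;
  int applied_events = 0;  // stand-in for real metadata updates
};

// Applies only the events newer than the table's last synced event id, then
// advances that id. 'events' must be sorted by ascending event id.
void SyncToLatestEventId(CachedTable& table, const std::vector<HmsEvent>& events) {
  int64_t prev_id = 0;
  for (const HmsEvent& e : events) {
    assert(e.event_id > prev_id && "events must be sorted by event id");
    prev_id = e.event_id;
    if (e.event_id <= table.last_synced_event_id) continue;  // already reflected
    ++table.applied_events;  // a real catalog would apply the change here
    table.last_synced_event_id = e.event_id;
  }
}
```

Skipping every event at or below the recorded id keeps a repeated sync idempotent: calling the function again with events the table has already seen changes nothing.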
[Impala-ASF-CR] IMPALA-10777: Enable min/max filtering for Iceberg partitions
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17960 ) Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@1323 PS1, Line 1323: if (scan_node_->hdfs_table()->IsIcebergTable()) return false; > Yeah, good question how could we make it a bit more elegant. Should I add a Ideally, is_computed_column could be a new boolean field in TColumn, together with an expression describing how the value is computed, such as EXTRACT(DAY FROM timestamp_col). For this patch, it looks like we may not need that at all if we can set is_bound_by_partition_columns properly in the FE, even for a computed column:

    bool IsBoundByPartitionColumn(int plan_id) const {
      int target_ndx = filter_desc().planid_to_target_ndx.at(plan_id);
      return filter_desc().targets[target_ndx].is_bound_by_partition_columns;
    }

http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@678 PS2, Line 678: && IsSimplePartitionedTable() Since a computed partition column is a partition column, I wonder whether this check is needed here.
-- To view, visit http://gerrit.cloudera.org:8080/17960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac Gerrit-Change-Number: 17960 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Oct 2021 13:41:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. Patch Set 8: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7555/ -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 8 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Fri, 22 Oct 2021 13:33:22 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP]: Initial commit to sync db/table to latest hms event id after executing ddls from Impala shell
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17904 to look at the new patch set (#3). Change subject: [WIP]: Initial commit to sync db/table to latest hms event id after executing ddls from Impala shell .. [WIP]: Initial commit to sync db/table to latest hms event id after executing ddls from Impala shell Change-Id: I8af6b368f70a4cb587d2b961c059d55237b25c6c --- M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java 3 files changed, 353 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17904/3 -- To view, visit http://gerrit.cloudera.org:8080/17904 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8af6b368f70a4cb587d2b961c059d55237b25c6c Gerrit-Change-Number: 17904 Gerrit-PatchSet: 3 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10777: Enable min/max filtering for Iceberg partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17960 ) Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9642/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac Gerrit-Change-Number: 17960 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Oct 2021 10:14:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10777: Enable min/max filtering for Iceberg partitions
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17960 ) Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions .. Patch Set 2: (2 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@1323 PS1, Line 1323: if (scan_node_->hdfs_table()->IsIcebergTable()) return false; > I wonder if FE is in the better position to make the decision. Yeah, good question: how could we make it a bit more elegant? Should I add a field to THdfsTable, e.g. 'has_computed_partitions'? Maybe, as long as computed partitions + Parquet only occur in Iceberg tables, we can keep it as it is. Or do you have a better suggestion? http://gerrit.cloudera.org:8080/#/c/17960/1/fe/src/main/java/org/apache/impala/catalog/FeTable.java File fe/src/main/java/org/apache/impala/catalog/FeTable.java: http://gerrit.cloudera.org:8080/#/c/17960/1/fe/src/main/java/org/apache/impala/catalog/FeTable.java@116 PS1, Line 116: > nit. May rename it as isComputedPartitioColumn() as the computed column con Done -- To view, visit http://gerrit.cloudera.org:8080/17960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac Gerrit-Change-Number: 17960 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Oct 2021 09:53:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10777: Enable min/max filtering for Iceberg partitions
Hello Tamas Mate, Qifan Chen, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17960 to look at the new patch set (#2). Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions .. IMPALA-10777: Enable min/max filtering for Iceberg partitions

This patch enables min/max filters for Iceberg columns that participate in table partitioning. The min/max filters are evaluated at the Parquet row group level. This means that it is still slower than dynamic partition pruning (which doesn't even need to open the files), but much faster than no pruning at all.

Performance

I used the following query to measure performance on a scale factor 10 TPC-DS dataset:

  select i_item_id, sum(ss_ext_sales_price) total_sales
  from store_sales, date_dim, customer_address, item
  where i_item_id in (select i_item_id from item
                      where i_color in ('orchid','chiffon','lace'))
    and ss_item_sk = i_item_sk
    and ss_sold_date_sk = d_date_sk
    and d_year = 2000
    and d_moy = 1
    and ss_addr_sk = ca_address_sk
    and ca_gmt_offset = -8

The above query took the following times to execute:

  Regular Parquet table:                 1.16s
  Iceberg table without min/max filters: 4.39s
  Iceberg table with min/max filters:    1.77s

Testing:
 * added e2e test
 * planner test could not be added because Iceberg tables behave differently during planner tests (due to some hacks that need refactoring)

Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/FeTable.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test 6 files changed, 45 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/17960/2 -- To view, visit http://gerrit.cloudera.org:8080/17960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac Gerrit-Change-Number: 17960 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tamas Mate
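Evaluating a min/max filter at the Parquet row group level, as the IMPALA-10777 commit message describes, comes down to an interval-overlap test: a row group can be skipped when the range of the partition column in that row group's statistics does not intersect the range carried by the runtime filter. The C++ sketch below assumes an integer column and closed ranges; MinMax and CanSkipRowGroup are hypothetical names, not the code in hdfs-parquet-scanner.cc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical closed value range [min, max] for an integer column. It is used
// both for a row group's column statistics and for a runtime min/max filter.
struct MinMax {
  int64_t min;
  int64_t max;
};

// Returns true when no value in the row group can pass the filter, i.e. the
// two closed ranges are disjoint: one ends before the other begins.
bool CanSkipRowGroup(const MinMax& row_group_stats, const MinMax& filter) {
  assert(row_group_stats.min <= row_group_stats.max);
  assert(filter.min <= filter.max);
  return row_group_stats.max < filter.min || row_group_stats.min > filter.max;
}
```

For example, with a filter range of [20000101, 20000131], a row group whose statistics span [20000201, 20000229] is skipped, while one spanning [19991215, 20000105] overlaps the filter and must still be read. The file still has to be opened to read its row group statistics, which is why this is slower than dynamic partition pruning.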
[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 ) Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever .. Patch Set 36: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7554/ -- To view, visit http://gerrit.cloudera.org:8080/17872 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Gerrit-Change-Number: 17872 Gerrit-PatchSet: 36 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Fri, 22 Oct 2021 08:06:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9641/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 8 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Fri, 22 Oct 2021 07:41:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7555/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 8 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Fri, 22 Oct 2021 07:21:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

To enable fine-grained table refreshing, this commit makes three main changes:
1. Maintain validWriteIdList in Catalogd for transactional tables. Write id changes for partitioned tables are tracked via AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Conduct partition-level refreshing for transactional tables on addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config, hms_event_incremental_refresh_transactional_table, which switches the fine-grained table refreshing on or off.

Performance Tests:
A simple test was performed by running an insert into one partition of a partitioned ACID table (50,000 partitions). Below is the time taken to refresh this table in response to the event.

  Storage  Before   After
  =======  =======  ========
  S3       50 secs  50 msecs
  local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 16 files changed, 928 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/8 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 8 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
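Change 1 of IMPALA-10923, tracking write ids from AllocWriteIdEvents, CommitTxnEvents and AbortTxnEvents, can be pictured as a small piece of state modeled loosely on Hive's ValidWriteIdList (a high watermark plus the sets of open and aborted write ids). This is a simplified sketch, not Impala's MutableValidWriteIdList. TableWriteIdState and its methods are hypothetical, and the sketch keys commits and aborts by write id, whereas the real commit/abort events carry transaction ids that would first have to be mapped to write ids.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, simplified per-table write-id state, modeled loosely on Hive's
// ValidWriteIdList: a high watermark plus the open and aborted write ids.
class TableWriteIdState {
 public:
  // ALLOC_WRITE_ID: a new write id is allocated and stays open until its
  // transaction commits or aborts.
  void OnAllocWriteId(int64_t write_id) {
    if (write_id > high_watermark_) high_watermark_ = write_id;
    open_.insert(write_id);
  }

  // COMMIT_TXN (simplified: keyed by write id instead of transaction id).
  void OnCommit(int64_t write_id) {
    assert(write_id <= high_watermark_ && "write id was never allocated");
    open_.erase(write_id);
  }

  // ABORT_TXN (same simplification): data under this write id stays invisible.
  void OnAbort(int64_t write_id) {
    assert(write_id <= high_watermark_ && "write id was never allocated");
    open_.erase(write_id);
    aborted_.insert(write_id);
  }

  // Data written under 'write_id' is readable only if the id is at or below
  // the high watermark and is neither still open nor aborted.
  bool IsValid(int64_t write_id) const {
    return write_id <= high_watermark_ && open_.count(write_id) == 0 &&
           aborted_.count(write_id) == 0;
  }

 private:
  int64_t high_watermark_ = 0;
  std::set<int64_t> open_;
  std::set<int64_t> aborted_;
};
```

The sketch only covers the visibility bookkeeping; it does not show how the patch turns these events into partition-level refreshes.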