[Impala-ASF-CR] [PROTOTYPE] IMPALA-10811: Additional tests for async DDL timing
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17959 ) Change subject: [PROTOTYPE] IMPALA-10811: Additional tests for async DDL timing .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9632/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic7e6a58806126503f9bad05547872ec389065793 Gerrit-Change-Number: 17959 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 21 Oct 2021 03:08:14 + Gerrit-HasComments: No
[Impala-ASF-CR] [PROTOTYPE] IMPALA-10811: Additional tests for async DDL timing
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17959 Change subject: [PROTOTYPE] IMPALA-10811: Additional tests for async DDL timing .. [PROTOTYPE] IMPALA-10811: Additional tests for async DDL timing This adds additional tests that focus on verifying that the async DDL feature is behaving properly. Change-Id: Ic7e6a58806126503f9bad05547872ec389065793 --- M tests/metadata/test_ddl.py 1 file changed, 117 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17959/1 -- To view, visit http://gerrit.cloudera.org:8080/17959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic7e6a58806126503f9bad05547872ec389065793 Gerrit-Change-Number: 17959 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-10972: Fixup test behavior to work with HIVE-24920
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17956 ) Change subject: IMPALA-10972: Fixup test behavior to work with HIVE-24920 .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7551/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idfe5468a0b9e1025ec7a0ad3cdce4793f35ca7ba Gerrit-Change-Number: 17956 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 21 Oct 2021 02:49:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 ) Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever .. Patch Set 31: (3 comments) Ok, this is getting close. I have a couple small code comments. I wrote some additional tests here: http://gerrit.cloudera.org:8080/17959 You are welcome to incorporate them into your change, or I can merge them separately. http://gerrit.cloudera.org:8080/#/c/17872/31/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/17872/31/be/src/service/client-request-state.cc@695 PS31, Line 695: Async Nit: What do you think about dropping Async from this name? http://gerrit.cloudera.org:8080/#/c/17872/31/be/src/service/client-request-state.cc@702 PS31, Line 702: // Indirectly check if running in thread async_exec_thread_. : if (!exec_dml_async) { When I was reading this code, I got confused by the variable name. The code inside this if is for when we are doing async DML. Should this variable be called "exec_dml_sync"? http://gerrit.cloudera.org:8080/#/c/17872/31/be/src/service/client-request-state.cc@706 PS31, Line 706: DebugActionNoFail( : exec_request_->query_options, "CRS_DELAY_BEFORE_CATALOG_OP_EXEC"); When adding a delay, it usually lines up with some piece of code that can take a long time. This pause is standing on its own a bit away from the actual catalog op that could take a long time, but Impala won't actually pause in this location. Can we put it right next to the catalog op? -- To view, visit http://gerrit.cloudera.org:8080/17872 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Gerrit-Change-Number: 17872 Gerrit-PatchSet: 31 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Thu, 21 Oct 2021 02:47:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10972: Fixup test behavior to work with HIVE-24920
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17956 ) Change subject: IMPALA-10972: Fixup test behavior to work with HIVE-24920 .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9631/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idfe5468a0b9e1025ec7a0ad3cdce4793f35ca7ba Gerrit-Change-Number: 17956 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 20 Oct 2021 23:48:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10972: Fixup test behavior to work with HIVE-24920
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17956 Change subject: IMPALA-10972: Fixup test behavior to work with HIVE-24920 .. IMPALA-10972: Fixup test behavior to work with HIVE-24920 With HIVE-24920, the HMS runs in a mode that prohibits creating a managed directory where the default location already exists. Some Impala test helper functions copied files into a particular location and then created a table without specifying the location in the create statement. This is no longer possible. This changes the helper functions in test/common/file_utils.py to create the table before pulling files in. Tests: - Ran a core job against a Hive with HIVE-24920 and verified that the failures due to changes in behavior are gone. - Ran a core job against the current Hive Change-Id: Idfe5468a0b9e1025ec7a0ad3cdce4793f35ca7ba --- M tests/common/file_utils.py 1 file changed, 16 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/17956/1 -- To view, visit http://gerrit.cloudera.org:8080/17956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Idfe5468a0b9e1025ec7a0ad3cdce4793f35ca7ba Gerrit-Change-Number: 17956 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17919 ) Change subject: IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor .. Patch Set 6: (2 comments) I can bump it up to +2 after the minor suggestions below are resolved. Since the suggested changes are only in commit msg, I don't believe we need to have a full precommit run again. http://gerrit.cloudera.org:8080/#/c/17919/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17919/6//COMMIT_MSG@7 PS6, Line 7: Minor refactoring in alter table ddl op in : CatalogOpExecutor please keep this to one line as per convention. Suggest you to just title is "Refactor alter table ddl operation" or something similar. http://gerrit.cloudera.org:8080/#/c/17919/6//COMMIT_MSG@19 PS6, Line 19: s Can you also mention about the additional bug fix in the addToDeleteEventLog method? -- To view, visit http://gerrit.cloudera.org:8080/17919 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbadcab68b4599ad18b681b1284052a47b74d802 Gerrit-Change-Number: 17919 Gerrit-PatchSet: 6 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 20 Oct 2021 22:55:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 ) Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever .. Patch Set 31: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9630/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17872 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Gerrit-Change-Number: 17872 Gerrit-PatchSet: 31 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Wed, 20 Oct 2021 22:05:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. Patch Set 7: (11 comments) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97 PS1, Line 97: protected final ConcurrentHashMap> txnToWriteIds_ = : new ConcurrentHashMap<>(); > Thanks for the clarification. "the new HMS API getAllWriteEventInfo only re getAllWriteEventInfo just return the data stored in the table TXN_WRITE_NOTIFICATION_LOG. AFAIK, HS2 calls add_write_notification_log that inserts records into TXN_WRITE_NOTIFICATION_LOG only for DML for transactional tables. I tried few queries locally like "drop constraint", and they advance write id but don't add write notification log. I tried to reduce the memory footprint here by saving write ids for transactional partitioned tables only. Besides, this map's size is just proportional to the simultaneous open transactions. Despite I don't have any real data points, we might not have a huge number of simultaneous "open" transactions? http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97 PS1, Line 97: protected final ConcurrentHashMap> txnToWriteIds_ = : new ConcurrentHashMap<>(); > @Yu-Wen: In addition to what Vihang asked, how would we handle the followin @Sourabh Good question. Since I don't see a way to retrieve back the missing write id, we might accept that this write id remains open. When next time a request with writeIdList that has this write id as committed, we will reload the whole table because the writeIdList of the request is considered more recent. In some sense, the table cache is considered stale when the write id is not marked committed. http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2891 PS6, Line 2891: case > do we need a default: clause which throws a exception? Done http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/Table.java@142 PS6, Line 142: volatile > not sure why we need this? @Vihang I call getCreateEventId() in AllocWriteIdEvent without acquiring lock. Is there any chance createEventId will be set after the table is loaded? If not, we don't need this. http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2066 PS6, Line 2066: catalog_.removeWriteIds(txnId_); > This line must be in finally block otherwise we are leaking memory in case Thank you for catching this. http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2057 PS7, Line 2057: commitTxnMessage_.addWriteEventInfo(writeEventInfoList); > Why are we modifying commitTxnMesage? Can't we get all the required info fr @Sourabh I actually imitated the code from hive repl: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/CommitTxnHandler.java#L166. The upside is that I don't have to parse table and partition objects. It is done by CommitTxnMessage. As I can see from the hive code, it seems like this function is used like this by design. http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2080 PS7, Line 2080: Preconditions.checkNotNull(commitTxnMessage_.getPartitions()); > Why are we checking for non null partitions? Wouldn't unpartitioned table h As long as we have called addWriteEventInfo, this would be empty list even for unpartitioned table. So, this is just to check we have added write event info. I can change the check to other variables to avoid confusion. http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 ) Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever .. Patch Set 31: (2 comments) Address all review comments and add the query option enable_async_ddl_execution. http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_ddl.py@903 PS29, Line 903: : # IMPALA-10811: RPC to submit query getting stuck for AWS NLB forever : # Test HS2, Beeswax and HS2-HTTP three clients. : class TestAsyncDDL(TestDdlBase): : @classmethod : def get_workload(self): : return 'functional-query' : : @classmethod : def add_test_dimensions(cls): : super(TestAsyncDDL, cls).add_test_dimensions() : cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension()) : cls.ImpalaTestMatrix.add_dimension(create_exec_option_dimension( : sync_ddl=[0], disable_codegen_options=[False])) : : def test_async_ddl(self, vector, unique_database): > As with the JDBC test, the way I'm thinking about this test is whether this Removed CTAS test from async_ddl and replaced it with the state transition test for three clients here. Retain only alter table recover partition test in async_ddl. Done. http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_metadata_query_statements.py File tests/metadata/test_metadata_query_statements.py: http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_metadata_query_statements.py@153 PS29, Line 153: @pytest.mark.execute_serially # due to data src setup/teardown : @SkipIfCatalogV2.data_sources_unsupported() : def test_show_data_sources(self, vector): : try: : self.__create_data_sources() : self.run_test_case('QueryTest/show-data-sources', vector) : finally: : self.__drop_data_sources() : : def __drop_data_sources(self): : for name in self.TEST_DATA_SR > The 21050 port is the HS2 port. Since we claim that the protocol does not change with the patch, this is a functional test for JDBC client. We have similar tests for HS2, BW and HS2-http. By looking at JDBC driver setup for other vendors, it seems one can set the port in the URL connection string. See one example here https://www.baeldung.com/java-jdbc-url-format. In the invocation of Impala JDBC client, the port number is set to 21050. So it is very likely (without real code to look at) that the Impala JDBC client is capable of talking to impala hs2 server. ('cmd=', '/home/qchen/Impala.07202021/bin/run-jdbc-client.sh -i "localhost:21050" -t NOSASL -q "drop table if exists test_async_ddl_with_JD BC_bc95ba92.foo;"') -- To view, visit http://gerrit.cloudera.org:8080/17872 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Gerrit-Change-Number: 17872 Gerrit-PatchSet: 31 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Wed, 20 Oct 2021 21:44:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
Qifan Chen has uploaded a new patch set (#31). ( http://gerrit.cloudera.org:8080/17872 ) Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever .. IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever This patch addresses Impala client hang due to AWS network load balancer timeout which is fixed at 350s. When some long DDL operations are executing and the timeout happens, AWS silently drops the connection and the Impala client enters the hang state. The fix maintains the current TCLIService protocol between the client and Impala server and is applicable to the following Impala clients which issue thrift RPC ExecuteStatement() followed by repeated call to GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax) to Impala backend. 1. HS2 2. Beeswax 3. Impyla 4. HUE In the fix, the backend method ClientRequestState::ExecDdlRequest() can start a new thread in 'async_exec_thread_' for ExecAsyncDdlRequest() which executes most of the DDLs asynchronously. This thread is waited for in the wait thread 'wait_thread_'. Since the wait thread also runs asynchronously, the execution of the DDLs will not cause a wait on the Impala client. Thus the Impala client can keep checking its execution status via GetOperationStatus() without long waiting, say more than 350s. As an optimization, the above asynchronous mode is not applied to the execution of certain DDLs that run very low risks of long execution. 1. Operations that do not access catalog service; 2. COMPUTE STATS as the stats computation queries already run asynchronously. External behavior changes: 1. A new field with name "DDL execution mode:" is added to the summary section in the runtime profile, next to "DDL Type". This field takes either 'asynchronous' or 'synchronous' as value. 2. A new query option 'enable_async_ddl_execution', default to true, is added. It can be set to false to turn off the patch. Limitations: This patch does not handle potential AWS NLB-type time out for LOAD DATA (IMPALA-10967). Testing: 1. Added new async. DDL unit tests with HS2, HS2-HTTP, Beeswax and JDBC clients 2. Ran core tests successfully Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e --- M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test M tests/common/impala_test_suite.py M tests/metadata/test_ddl.py 9 files changed, 255 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/31 -- To view, visit http://gerrit.cloudera.org:8080/17872 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Gerrit-Change-Number: 17872 Gerrit-PatchSet: 31 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen
[Impala-ASF-CR] IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17919 ) Change subject: IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17919 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbadcab68b4599ad18b681b1284052a47b74d802 Gerrit-Change-Number: 17919 Gerrit-PatchSet: 6 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 20 Oct 2021 17:43:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17919 ) Change subject: IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17919 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbadcab68b4599ad18b681b1284052a47b74d802 Gerrit-Change-Number: 17919 Gerrit-PatchSet: 5 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 20 Oct 2021 17:33:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97 PS1, Line 97: protected final ConcurrentHashMap> txnToWriteIds_ = : new ConcurrentHashMap<>(); > Thanks for the clarification. "the new HMS API getAllWriteEventInfo only re @Yu-Wen: In addition to what Vihang asked, how would we handle the following scenario: Suppose the following HMS events are generated for a ddl on a partition: 1. Open txn 2. Allocate write id 3. perform ddl op 4. Commit txn Now if event processor is started and say it starts processing events from #3. Assuming that write id generated in #2 is not returned in HMS API getAllWriteEventInfo, does this mean that event processor would skip adding writeId from #2 to committed writeIds list? If so, is this the right/desired behavior? http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2057 PS7, Line 2057: commitTxnMessage_.addWriteEventInfo(writeEventInfoList); Why are we modifying commitTxnMesage? Can't we get all the required info from writeEventInfoList? http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2080 PS7, Line 2080: Preconditions.checkNotNull(commitTxnMessage_.getPartitions()); Why are we checking for non null partitions? Wouldn't unpartitioned table have null partitions? http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4160 PS7, Line 4160: List partsFromEvent, boolean loadFromEvent, boolean loadFileMetadata, Looks like if loadFromEvent is true, then in HdfsTable we avoid making HMS api getPartitionsByNames(). I don't think adding loadFromEvent flag is the right way to do this. We should have clear apis in catalogOpExecutor, HdfsTable etc to reload partitions based on partition names and partition objects. http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4233 PS7, Line 4233: public int addCommittedWriteIdsAndReloadPartitionsIfExist(long eventId, String dbName, This api is not making much sense to me here. More so since it is public. Should we move this inside CommitTxnEvent? Some parts like getting table from cache can be moved to a common place in catalogOpExecutor. -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 7 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 20 Oct 2021 17:27:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 ) Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever .. Patch Set 30: After thinking about it, I agree with you that we may want a query option to control this behavior. That will make it easier to ship this without worrying about unknown client behavior. If it would help, I can give a quick idea of how I would implement it. -- To view, visit http://gerrit.cloudera.org:8080/17872 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Gerrit-Change-Number: 17872 Gerrit-PatchSet: 30 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Wed, 20 Oct 2021 16:52:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10955: test time travel is flaky
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17928 ) Change subject: IMPALA-10955: test_time_travel is flaky .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17928 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia270781dfdc4101b0d961db5aa20fc1678cdaaa8 Gerrit-Change-Number: 17928 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Oct 2021 14:45:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10955: test time travel is flaky
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17928 ) Change subject: IMPALA-10955: test_time_travel is flaky .. IMPALA-10955: test_time_travel is flaky test_time_travel is flaky. It's probably because one second isn't always enough for Impala to refresh the Iceberg table and return the results to the client when the system is busy. Instead of one second now we are using 5 seconds. Change-Id: Ia270781dfdc4101b0d961db5aa20fc1678cdaaa8 Reviewed-on: http://gerrit.cloudera.org:8080/17928 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M tests/query_test/test_iceberg.py 1 file changed, 3 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17928 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ia270781dfdc4101b0d961db5aa20fc1678cdaaa8 Gerrit-Change-Number: 17928 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] WIP IMPALA-10569: Determine Iceberg file format from Icebert metadata
Tamas Mate has abandoned this change. ( http://gerrit.cloudera.org:8080/17918 ) Change subject: WIP IMPALA-10569: Determine Iceberg file format from Icebert metadata .. Abandoned The FlatBuffer FD change will be done as part of another change. -- To view, visit http://gerrit.cloudera.org:8080/17918 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I74a583ffe91694a647a37846cd58a99e37d6ce72 Gerrit-Change-Number: 17918 Gerrit-PatchSet: 1 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 ) Change subject: IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table. .. Patch Set 8: (5 comments) Left some minor comments. When adding tests, I think it'd be useful to measure code coverage: buildall.sh -notests -codecoverage # To generate reports: bin/coverage_helper.sh http://gerrit.cloudera.org:8080/#/c/17860/8/be/src/exec/hdfs-columnar-scanner.cc File be/src/exec/hdfs-columnar-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17860/8/be/src/exec/hdfs-columnar-scanner.cc@110 PS8, Line 110: int HdfsColumnarScanner::ProcessScratchBatchCodegenOrInterpret( nit: unintended new line? http://gerrit.cloudera.org:8080/#/c/17860/8/be/src/exec/parquet/parquet-column-readers.h File be/src/exec/parquet/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/17860/8/be/src/exec/parquet/parquet-column-readers.h@509 PS8, Line 509: remaining Please add a comment about 'remaining' http://gerrit.cloudera.org:8080/#/c/17860/8/be/src/exec/parquet/parquet-column-readers.cc File be/src/exec/parquet/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/17860/8/be/src/exec/parquet/parquet-column-readers.cc@1282 PS8, Line 1282: & Output parameters should be pointers. http://gerrit.cloudera.org:8080/#/c/17860/8/be/src/exec/scratch-tuple-batch.h File be/src/exec/scratch-tuple-batch.h: http://gerrit.cloudera.org:8080/#/c/17860/8/be/src/exec/scratch-tuple-batch.h@74 PS8, Line 74: boost::scoped_array selected_rows; I wonder if using vector would be better since it's uses a space-efficient representation, i.e. 1 bool = 1 bit. http://gerrit.cloudera.org:8080/#/c/17860/8/be/src/exec/scratch-tuple-batch.h@177 PS8, Line 177: ScratchMicroBatch* batches nit: usually we put output parameters at the end of the param list -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 8 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Oct 2021 13:22:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17919 ) Change subject: IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor .. Patch Set 6: Please share your feedback on this patch. -- To view, visit http://gerrit.cloudera.org:8080/17919 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbadcab68b4599ad18b681b1284052a47b74d802 Gerrit-Change-Number: 17919 Gerrit-PatchSet: 6 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 20 Oct 2021 11:21:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17919 ) Change subject: IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7550/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17919 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbadcab68b4599ad18b681b1284052a47b74d802 Gerrit-Change-Number: 17919 Gerrit-PatchSet: 6 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 20 Oct 2021 11:19:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17919 ) Change subject: IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7549/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17919 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbadcab68b4599ad18b681b1284052a47b74d802 Gerrit-Change-Number: 17919 Gerrit-PatchSet: 5 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 20 Oct 2021 11:15:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17919 ) Change subject: IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9629/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17919 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbadcab68b4599ad18b681b1284052a47b74d802 Gerrit-Change-Number: 17919 Gerrit-PatchSet: 5 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 20 Oct 2021 10:29:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17919 to look at the new patch set (#5). Change subject: IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor .. IMPALA-10975: Minor refactoring in alter table ddl op in CatalogOpExecutor For almost all alter table DDL operations in catalogOpExecutor, a table is added to catalog update at a common place in the end if reloadMetadata is true. However for few sub ddl operations like ADD and DROP partitions, the table to update catalog is performed locally. This patch is to refactor addTableToCatalogUpdate() and call it at one place for all the sub ddls. After this patch, a table (after alter table ddl) is added to catalog update if its old catalog version does not match the current catalog version. Testing: Relying on existing tests since it is a minor refactoring. Change-Id: Ifbadcab68b4599ad18b681b1284052a47b74d802 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java 1 file changed, 36 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/17919/5 -- To view, visit http://gerrit.cloudera.org:8080/17919 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbadcab68b4599ad18b681b1284052a47b74d802 Gerrit-Change-Number: 17919 Gerrit-PatchSet: 5 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10955: test time travel is flaky
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17928 ) Change subject: IMPALA-10955: test_time_travel is flaky .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7548/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17928 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia270781dfdc4101b0d961db5aa20fc1678cdaaa8 Gerrit-Change-Number: 17928 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Oct 2021 08:36:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10955: test time travel is flaky
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17928 ) Change subject: IMPALA-10955: test_time_travel is flaky .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17928 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia270781dfdc4101b0d961db5aa20fc1678cdaaa8 Gerrit-Change-Number: 17928 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Oct 2021 08:36:34 + Gerrit-HasComments: No