[Impala-ASF-CR] IMPALA-11661: Added new api in MetastoreServiceHandler for find_next_compact2 method
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19140 ) Change subject: IMPALA-11661: Added new api in MetastoreServiceHandler for find_next_compact2 method .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/19140/1/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/19140/1/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2290 PS1, Line 2290: To follow the convention, we should use 4 space indent here. -- To view, visit http://gerrit.cloudera.org:8080/19140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9f1663c16d2649c9c455e6dffde02894819b2761 Gerrit-Change-Number: 19140 Gerrit-PatchSet: 1 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Sat, 15 Oct 2022 18:27:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala
Yu-Wen Lai has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/19052 ) Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala .. IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala In this patch, we use TUpdateCatalogRequest to refresh metadata after 'LOAD DATA' instead of TResetMetadataRequest so that we can reuse the code for 'INSERT' statements. It will fire an insert event just same as what we did for 'INSERT' statements. We also fix the inconsistent indentation in event_processor_utils.py. Testing: - Run existing test_load.py - Added test_load_data_from_impala() in test_event_processing.py Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc --- M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/metadata/test_event_processing.py M tests/util/event_processor_utils.py 7 files changed, 194 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/19052/6 -- To view, visit http://gerrit.cloudera.org:8080/19052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc Gerrit-Change-Number: 19052 Gerrit-PatchSet: 6 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19052 ) Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/19052/4/tests/metadata/test_event_processing.py File tests/metadata/test_event_processing.py: http://gerrit.cloudera.org:8080/#/c/19052/4/tests/metadata/test_event_processing.py@408 PS4, Line 408: into table {1}.{2}".format(staging_dir, unique_database, tbl_nopart)) > I think we need to mark this test using @pytest.mark.execute_serially. Othe Thanks for pointing out this mark! -- To view, visit http://gerrit.cloudera.org:8080/19052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc Gerrit-Change-Number: 19052 Gerrit-PatchSet: 5 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Tue, 04 Oct 2022 00:31:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala
Yu-Wen Lai has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/19052 ) Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala .. IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala In this patch, we use TUpdateCatalogRequest to refresh metadata after 'LOAD DATA' instead of TResetMetadataRequest so that we can reuse the code for 'INSERT' statements. It will fire an insert event just same as what we did for 'INSERT' statements. We also fix the inconsistent indentation in event_processor_utils.py. Testing: - Run existing test_load.py - Added test_load_data_from_impala() in test_event_processing.py Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc --- M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/metadata/test_event_processing.py M tests/util/event_processor_utils.py 7 files changed, 195 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/19052/5 -- To view, visit http://gerrit.cloudera.org:8080/19052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc Gerrit-Change-Number: 19052 Gerrit-PatchSet: 5 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala
Yu-Wen Lai has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/19052 ) Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala .. IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala In this patch, we use TUpdateCatalogRequest to refresh metadata after 'LOAD DATA' instead of TResetMetadataRequest so that we can reuse the code for 'INSERT' statements. It will fire an insert event just same as what we did for 'INSERT' statements. We also fix the inconsistent indentation in event_processor_utils.py. Testing: - Run existing test_load.py - Added test_load_data_from_impala() in test_event_processing.py Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc --- M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/metadata/test_event_processing.py M tests/util/event_processor_utils.py 7 files changed, 193 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/19052/4 -- To view, visit http://gerrit.cloudera.org:8080/19052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc Gerrit-Change-Number: 19052 Gerrit-PatchSet: 4 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19052 ) Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/19052/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19052/1//COMMIT_MSG@16 PS1, Line 16: - Run existing test_load.py > I see. Can we use the hive_client to fetch and verify the INSERT events dir Cool. Let me try that. http://gerrit.cloudera.org:8080/#/c/19052/3/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/19052/3/be/src/service/client-request-state.cc@2047 PS3, Line 2047: > nit: 2 spaces indent here Ack -- To view, visit http://gerrit.cloudera.org:8080/19052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc Gerrit-Change-Number: 19052 Gerrit-PatchSet: 3 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Mon, 03 Oct 2022 16:57:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19052 ) Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala .. Patch Set 3: (3 comments) > Patch Set 1: > > (3 comments) > > This is a pretty nice fix! http://gerrit.cloudera.org:8080/#/c/19052/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19052/1//COMMIT_MSG@16 PS1, Line 16: - Run existing test_load.py > We also need tests to verify the INSERT events. Could you add some tests in I realized that replication cannot be used to verify insert events for external tables, because Hive replication for external tables relies on distcp instead of insert events. Given that LOAD DATA is only applicable to external tables, we need another way to verify the INSERT events. Therefore, I added a test that uses the number of skipped events as an implicit indicator. Let me know if you have a better idea. http://gerrit.cloudera.org:8080/#/c/19052/1/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/19052/1/be/src/service/client-request-state.cc@806 PS1, Line 806: string for unpartitione > nit: Could you add a comment mentioning that the partition_name is an empty Done http://gerrit.cloudera.org:8080/#/c/19052/1/be/src/service/client-request-state.cc@809 PS1, Line 809: catalog_update.__set_sync_ddl(exec_request_->query_options.sync_ddl); : catalog_update.__set_header(GetCatalogServiceRequestHeader()); : catalog_update.target_table = exec_request_->load_data_request.table_name.table_name; : catalog_update.db_name = exec_request_->load_data_request.table_name.db_name; : catalog_update.is_overwrite = exec_request_->load_data_request.overwrite; : : const TNetworkAddress& address = > nit: these duplicate the code in ClientRequestState::ExecLoadDataRequestImp Done -- To view, visit http://gerrit.cloudera.org:8080/19052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc Gerrit-Change-Number: 19052 Gerrit-PatchSet: 3 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Sat, 01 Oct 2022 01:59:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala
Yu-Wen Lai has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/19052 ) Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala .. IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala In this patch, we use TUpdateCatalogRequest to refresh metadata after 'LOAD DATA' instead of TResetMetadataRequest so that we can reuse the code for 'INSERT' statements. It will fire an insert event just same as what we did for 'INSERT' statements. Testing: - Run existing test_load.py - Added test_load_data_from_impala() in test_event_processing.py Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc --- M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/metadata/test_event_processing.py 6 files changed, 129 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/19052/3 -- To view, visit http://gerrit.cloudera.org:8080/19052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc Gerrit-Change-Number: 19052 Gerrit-PatchSet: 3 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-11627: Build Impala with cdw dependencies
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18994 ) Change subject: IMPALA-11627: Build Impala with cdw dependencies .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/18994/6/java/pom.xml File java/pom.xml: http://gerrit.cloudera.org:8080/#/c/18994/6/java/pom.xml@109 PS6, Line 109: nit: wrong indentation? -- To view, visit http://gerrit.cloudera.org:8080/18994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id379030f4b314e139c875584eee438b7416d89a4 Gerrit-Change-Number: 18994 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yu-Wen Lai Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 29 Sep 2022 23:03:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala
Yu-Wen Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19052 Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala .. IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala In this patch, we use TUpdateCatalogRequest to refresh metadata after 'LOAD DATA' instead of TResetMetadataRequest so that we can reuse the code for 'INSERT' statements. It will fire an insert event just same as what we did for 'INSERT' statements. Testing: - Run existing test_load.py Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc --- M be/src/service/client-request-state.cc M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/main/java/org/apache/impala/service/Frontend.java 4 files changed, 67 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/19052/1 -- To view, visit http://gerrit.cloudera.org:8080/19052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc Gerrit-Change-Number: 19052 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-11160: Ignore stale ALTER_PARTITION events on transactional tables
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19020 ) Change subject: IMPALA-11160: Ignore stale ALTER_PARTITION events on transactional tables .. Patch Set 1: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/19020/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19020/1//COMMIT_MSG@25 PS1, Line 25: Tests > The solution looks good, but one thing bugs me: shouldn't the original bug Thanks Quanlong for catching this. I agree with Csaba that we should add more tests around event processing. I just created a follow-up Jira IMPALA-11598. -- To view, visit http://gerrit.cloudera.org:8080/19020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5bb8cfc213093f3bbd0359c7084b277a3bd5264a Gerrit-Change-Number: 19020 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Tue, 20 Sep 2022 17:10:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11540: Add logs for ALTER_TABLE events that trigger slow metadata reload
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18926 ) Change subject: IMPALA-11540: Add logs for ALTER_TABLE events that trigger slow metadata reload .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/18926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf344e6b423f88c9635ca8d61d53385b88ba4dce Gerrit-Change-Number: 18926 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Thu, 08 Sep 2022 15:38:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11540: Add logs for ALTER_TABLE events that trigger slow metadata reload
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18926 ) Change subject: IMPALA-11540: Add logs for ALTER_TABLE events that trigger slow metadata reload .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/18926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf344e6b423f88c9635ca8d61d53385b88ba4dce Gerrit-Change-Number: 18926 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Tue, 30 Aug 2022 12:03:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9670: Fix unloaded views are shown as tables for GET_TABLES requests
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18626 ) Change subject: IMPALA-9670: Fix unloaded views are shown as tables for GET_TABLES requests .. Patch Set 4: Code-Review+1 (2 comments) Thank you for helping me understand the context! http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1903 PS3, Line 1903: String tableName = tblMeta.getTableName().toLowerCase(); > Yeah, we can save the other toLowerCase() calls. As mentioned in our doc: Ack http://gerrit.cloudera.org:8080/#/c/18626/3/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: http://gerrit.cloudera.org:8080/#/c/18626/3/tests/common/impala_test_suite.py@119 PS3, Line 119: IMPALAD_HOSTNAME_LIST[i] + ':' + > We calculate the hs2 ports and hs2-http ports based on the specified beeswa Ack -- To view, visit http://gerrit.cloudera.org:8080/18626 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I528bb20272ebdd66a0118c30efc2b0566f2b0e2f Gerrit-Change-Number: 18626 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Mon, 20 Jun 2022 03:03:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9670: Fix unloaded views are shown as tables for GET_TABLES requests
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18626 ) Change subject: IMPALA-9670: Fix unloaded views are shown as tables for GET_TABLES requests .. Patch Set 3: (4 comments) The patch looks good to me in general! I just left some questions and minor comments. http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@46 PS3, Line 46: import org.apache.hadoop.hive.metastore.TableType; Maybe TableType can be removed as well? http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1903 PS3, Line 1903: String tableName = tblMeta.getTableName().toLowerCase(); Could you confirm that converting tableName to lower case is OK? I ask only because it was not converted to lower case before. If it's OK, we probably don't need to call toLowerCase on the next line. http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java File fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java: http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java@290 PS3, Line 290: case MATERIALIZED_VIEW: I figure I should ask the question here because sometimes we treat a materialized view as a table. Does it matter in the context of this patch? http://gerrit.cloudera.org:8080/#/c/18626/3/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: http://gerrit.cloudera.org:8080/#/c/18626/3/tests/common/impala_test_suite.py@119 PS3, Line 119: str(IMPALAD_BEESWAX_PORT_LIST[i] - IMPALAD_BEESWAX_PORT + IMPALAD_HS2_PORT) I might have missed something. Could you explain why we need to calculate the port here? (same question for line 127) -- To view, visit http://gerrit.cloudera.org:8080/18626 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I528bb20272ebdd66a0118c30efc2b0566f2b0e2f Gerrit-Change-Number: 18626 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Fri, 17 Jun 2022 23:47:08 + Gerrit-HasComments: Yes
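The port expression quoted at line 119 can be read as an offset calculation: each impalad's HS2 port is assumed to sit at the same distance from the base HS2 port as its beeswax port sits from the base beeswax port. A minimal sketch of that reading follows; the base port values (21000 and 21050) and the helper name `hs2_host_ports` are assumptions for illustration and are not taken from the patch.

```python
# Assumed base ports, for illustration only; the real values are defined
# in the Impala test configuration.
IMPALAD_BEESWAX_PORT = 21000
IMPALAD_HS2_PORT = 21050


def hs2_host_ports(hostnames, beeswax_ports):
    """Derive "host:port" HS2 endpoints from per-daemon beeswax ports.

    Hypothetical helper mirroring the expression quoted in the review:
    beeswax_port - IMPALAD_BEESWAX_PORT + IMPALAD_HS2_PORT.
    """
    return [
        host + ':' + str(port - IMPALAD_BEESWAX_PORT + IMPALAD_HS2_PORT)
        for host, port in zip(hostnames, beeswax_ports)
    ]
```

Under this reading, a daemon listening for beeswax one port above the base would be expected to serve HS2 one port above the HS2 base, which is why only the beeswax port list needs to be specified.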
[Impala-ASF-CR] IMPALA-11181: Improving performance of compaction checking
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18324 ) Change subject: IMPALA-11181: Improving performance of compaction checking .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/18324/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/18324/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@705 PS3, Line 705: if (partNameToCompactionId.containsKey(entry.getKey().getName())) { : stalePartitions.add(entry.getKey()); : iter.remove(); > nit: Can we optimize this to the following case? Done http://gerrit.cloudera.org:8080/#/c/18324/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/18324/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@295 PS3, Line 295: > nit: Could you add a blank line before this? Done -- To view, visit http://gerrit.cloudera.org:8080/18324 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c Gerrit-Change-Number: 18324 Gerrit-PatchSet: 4 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 30 Mar 2022 16:20:28 + Gerrit-HasComments: Yes
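The loop quoted from MetastoreShim.java moves every partition whose name appears in the compaction map into a stale list and removes it from the collection being iterated. A rough Python sketch of the same pattern follows. The function name and the plain-dict data model are hypothetical; the actual Java code works on partition objects and an iterator.

```python
def split_stale_partitions(partitions, part_name_to_compaction_id):
    """Remove and return partitions that have an entry in the compaction map.

    `partitions` is a dict keyed by partition name (a simplification of the
    Java map keyed by partition objects). It is mutated in place.
    """
    stale = []
    # Iterate over a snapshot of the keys so deletion during the loop is safe,
    # which plays the role of iter.remove() in the Java snippet.
    for name in list(partitions):
        if name in part_name_to_compaction_id:
            stale.append(name)
            del partitions[name]
    return stale
```

After the call, `partitions` holds only the entries without a matching compaction, and the returned list holds the names to reload.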
[Impala-ASF-CR] IMPALA-11181: Improving performance of compaction checking
Yu-Wen Lai has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18324 ) Change subject: IMPALA-11181: Improving performance of compaction checking .. IMPALA-11181: Improving performance of compaction checking After HIVE-25753, we don't need to explicitly set all partitions' name to get the latest compaction id. Besides, we can also send the last compaction id over to HMS so that HMS will send back compaction info only if there are newer compactions. In this way, we can avoid unnecessary data transmitted between HMS and Catalogd. Testing: existing tests Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c --- M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java 3 files changed, 31 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/18324/4 -- To view, visit http://gerrit.cloudera.org:8080/18324 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c Gerrit-Change-Number: 18324 Gerrit-PatchSet: 4 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Yu-Wen Lai
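The idea in the commit message, sending the last known compaction id so that only newer compactions come back, can be modelled in a few lines. This is a toy model of the filtering behaviour only, not the HMS API: the function name and the dict of partition name to compaction id are assumptions made for illustration.

```python
def newer_compactions(latest_compaction_ids, last_compaction_id):
    """Return only entries whose compaction id exceeds the caller's last id.

    `latest_compaction_ids` stands in for what the metastore knows:
    partition name -> latest committed compaction id (hypothetical model).
    """
    return {
        name: cid
        for name, cid in latest_compaction_ids.items()
        if cid > last_compaction_id
    }
```

When nothing has been compacted since the caller's last id, the result is empty, which corresponds to the saving in transmitted data that the commit message describes.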
[Impala-ASF-CR] IMPALA-11181: Improving performance of compaction checking
Yu-Wen Lai has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18324 ) Change subject: IMPALA-11181: Improving performance of compaction checking .. IMPALA-11181: Improving performance of compaction checking After HIVE-25753, we don't need to explicitly set all partitions' name to get the latest compaction id. Besides, we can also send the last compaction id over to HMS so that HMS will send back compaction info only if there are newer compactions. In this way, we can avoid unnecessary data transmitted between HMS and Catalogd. Testing: existing tests Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c --- M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java 3 files changed, 30 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/18324/3 -- To view, visit http://gerrit.cloudera.org:8080/18324 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c Gerrit-Change-Number: 18324 Gerrit-PatchSet: 3 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-11181: Improving performance of compaction checking
Yu-Wen Lai has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/18324 ) Change subject: IMPALA-11181: Improving performance of compaction checking .. IMPALA-11181: Improving performance of compaction checking After HIVE-25753, we don't need to explicitly set all partitions' name to get the latest compaction id. Besides, we can also send the last compaction id over to HMS so that HMS will send back compaction info only if there are newer compactions. In this way, we can avoid unnecessary data transmitted between HMS and Catalogd. Testing: existing tests Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c --- M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java 3 files changed, 29 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/18324/2 -- To view, visit http://gerrit.cloudera.org:8080/18324 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c Gerrit-Change-Number: 18324 Gerrit-PatchSet: 2 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-11181: Improving performance of compaction checking
Yu-Wen Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18324 Change subject: IMPALA-11181: Improving performance of compaction checking .. IMPALA-11181: Improving performance of compaction checking After HIVE-25753, we don't need to explicitly set all partitions' name to get the latest compaction id. Besides, we can also send the last compaction id over to HMS so that HMS will send back compaction info only if there are newer compactions. In this way, we can avoid unnecessary data transmitted between HMS and Catalogd. Testing: existing tests Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c --- M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java 3 files changed, 24 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/18324/1 -- To view, visit http://gerrit.cloudera.org:8080/18324 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c Gerrit-Change-Number: 18324 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai
[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 23144489
Yu-Wen Lai has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/18296 ) Change subject: Bump up CDP_BUILD_NUMBER to 23144489 .. Bump up CDP_BUILD_NUMBER to 23144489 This patch is to include HIVE-25753, which is needed to improve the performance of retrieving the latest committed compaction for a table. Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27 --- M bin/impala-config.sh M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test 3 files changed, 24 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/18296/7 -- To view, visit http://gerrit.cloudera.org:8080/18296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27 Gerrit-Change-Number: 18296 Gerrit-PatchSet: 7 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 23144489
Yu-Wen Lai has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/18296 ) Change subject: Bump up CDP_BUILD_NUMBER to 23144489 .. Bump up CDP_BUILD_NUMBER to 23144489 This patch is to include HIVE-25753, which is needed to improve the performance of retrieving the latest committed compaction for a table. Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27 --- M bin/impala-config.sh M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test 3 files changed, 18 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/18296/6 -- To view, visit http://gerrit.cloudera.org:8080/18296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27 Gerrit-Change-Number: 18296 Gerrit-PatchSet: 6 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate
[Impala-ASF-CR] Bump up CDP BUILD NUMBER to 23144489
Yu-Wen Lai has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18296 ) Change subject: Bump up CDP_BUILD_NUMBER to 23144489 .. Bump up CDP_BUILD_NUMBER to 23144489 This patch is to include HIVE-25753, which is needed to improve the performance of retrieving the latest committed compaction for a table. Besides, we also need to fix ClassCastException after SerializableTable is added to iceberg. Since BaseTable is always transformed to SerializableTable for serialization, we cannot restore BaseTable after deserializing it. Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27 --- M bin/impala-config.sh M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test 10 files changed, 68 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/18296/5 -- To view, visit http://gerrit.cloudera.org:8080/18296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27 Gerrit-Change-Number: 18296 Gerrit-PatchSet: 5 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate
[Impala-ASF-CR] Bump up CDP BUILD NUMBER to 23144489
Yu-Wen Lai has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18296 ) Change subject: Bump up CDP_BUILD_NUMBER to 23144489 .. Bump up CDP_BUILD_NUMBER to 23144489 This patch is to include HIVE-25753, which is needed to improve the performance of retrieving the latest committed compaction for a table. Besides, we also need to fix ClassCastException after SerializableTable is added to iceberg. Since BaseTable is always transformed to SerializableTable for serialization, we cannot restore BaseTable after deserializing it. Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27 --- M bin/impala-config.sh M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test 10 files changed, 67 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/18296/4 -- To view, visit http://gerrit.cloudera.org:8080/18296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27 Gerrit-Change-Number: 18296 Gerrit-PatchSet: 4 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] Bump up CDP BUILD NUMBER to 23144489
Yu-Wen Lai has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18296 ) Change subject: Bump up CDP_BUILD_NUMBER to 23144489 .. Bump up CDP_BUILD_NUMBER to 23144489 This patch is to include HIVE-25753, which is needed to improve the performance of retrieving the latest committed compaction for a table. Besides, we also need to fix ClassCastException after SerializableTable is added to iceberg. Since BaseTable is always transformed to SerializableTable for serialization, we cannot restore BaseTable after deserializing it. Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27 --- M bin/impala-config.sh M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test 10 files changed, 63 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/18296/3 -- To view, visit http://gerrit.cloudera.org:8080/18296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27 Gerrit-Change-Number: 18296 Gerrit-PatchSet: 3 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] Bump up CDP BUILD NUMBER to 23144489
Yu-Wen Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18296 Change subject: Bump up CDP_BUILD_NUMBER to 23144489 .. Bump up CDP_BUILD_NUMBER to 23144489 This patch is to include HIVE-25753, which is needed to improve the performance of retrieving the latest committed compaction for a table. Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27 --- M bin/impala-config.sh 1 file changed, 12 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/18296/2 -- To view, visit http://gerrit.cloudera.org:8080/18296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27 Gerrit-Change-Number: 18296 Gerrit-PatchSet: 2 Gerrit-Owner: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18175 ) Change subject: IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/18175/3/tests/metadata/test_event_processing.py File tests/metadata/test_event_processing.py: http://gerrit.cloudera.org:8080/#/c/18175/3/tests/metadata/test_event_processing.py@a39 PS3, Line 39: > @Yu-Wen: Please confirm the following: Yes, the test will fail intermittently without fine-grained table refreshing. The issue was that we previously refreshed file metadata at the alter partition event, but when that event was processed the transaction might not have been committed yet. If it was committed, we would get the new file metadata. Otherwise, we would still see stale file metadata. After my patch, we now refresh file metadata at the commit event. -- To view, visit http://gerrit.cloudera.org:8080/18175 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idabeb522525c45f000ca0992348660fa5a5d4d2d Gerrit-Change-Number: 18175 Gerrit-PatchSet: 3 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Thu, 03 Feb 2022 18:06:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata
Yu-Wen Lai has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18175 ) Change subject: IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata .. IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata If we insert data into an acid partitioned table from Hive, the generated events will be like open_txn -> alter_partition -> commit_txn. Previously we assumed the partition object with the alter_partition event has write id < current write id. However, that is not a valid assumption, the partition object is actually the write id allocated in this transaction. That means in commit_txn event, we will have a partition with write id equals to the write id of cached partition. So we need to modify the '<' condition to '<='. Tests: After IMPALA-10923, we now refresh file metadata while processing commit events. Therefore, we can add back the test disabled in IMPALA-9057. Change-Id: Idabeb522525c45f000ca0992348660fa5a5d4d2d --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M tests/metadata/test_event_processing.py 2 files changed, 1 insertion(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/18175/3 -- To view, visit http://gerrit.cloudera.org:8080/18175 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idabeb522525c45f000ca0992348660fa5a5d4d2d Gerrit-Change-Number: 18175 Gerrit-PatchSet: 3 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
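The '<' versus '<=' change described in the commit message above can be illustrated with a small sketch. This is not the code from HdfsTable.java; the class and method names are hypothetical, and the only thing taken from the message is that a cached partition whose write id equals the write id seen at commit_txn must still be treated as needing a file metadata reload.

```java
// Illustrative sketch only: names are hypothetical, not Impala's actual API.
final class WriteIdRefreshSketch {

  // Old behaviour: reload only when the cached partition is strictly older.
  static boolean shouldReloadStrict(long cachedWriteId, long committedWriteId) {
    return cachedWriteId < committedWriteId;
  }

  // New behaviour: also reload when the ids are equal, because the
  // alter_partition event already stored the write id allocated in the
  // same transaction that is being committed.
  static boolean shouldReloadInclusive(long cachedWriteId, long committedWriteId) {
    return cachedWriteId <= committedWriteId;
  }

  public static void main(String[] args) {
    long cached = 7L;    // write id recorded when alter_partition was processed
    long committed = 7L; // write id carried by the commit_txn event
    System.out.println("strict check reloads:    " + shouldReloadStrict(cached, committed));
    System.out.println("inclusive check reloads: " + shouldReloadInclusive(cached, committed));
  }
}
```

With equal ids the strict check skips the reload, which matches the stale file metadata symptom the message describes, while the inclusive check triggers it.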
[Impala-ASF-CR] IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18175 ) Change subject: IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/18175/1/fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java File fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java: http://gerrit.cloudera.org:8080/#/c/18175/1/fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java@255 PS1, Line 255: updateMinOpenWriteId(); > how is this related to this change? The minOpenWriteId is not used actually, so no harm as of now. I will remove this from the change and refactor this in another patch. -- To view, visit http://gerrit.cloudera.org:8080/18175 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idabeb522525c45f000ca0992348660fa5a5d4d2d Gerrit-Change-Number: 18175 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Fri, 28 Jan 2022 21:42:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata
Yu-Wen Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18175 Change subject: IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata .. IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata If we insert data into an acid partitioned table from Hive, the generated events will be like open_txn -> alter_partition -> commit_txn. Previously we assumed the partition object with the alter_partition event has write id < current write id. However, that is not a valid assumption, the partition object is actually the write id allocated in this transaction. That means in commit_txn event, we will have a partition with write id equals to the write id of cached partition. So we need to modify the '<' condition to '<='. Tests: Manually testing Change-Id: Idabeb522525c45f000ca0992348660fa5a5d4d2d --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java 2 files changed, 2 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/18175/1 -- To view, visit http://gerrit.cloudera.org:8080/18175 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Idabeb522525c45f000ca0992348660fa5a5d4d2d Gerrit-Change-Number: 18175 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18043 ) Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction .. Patch Set 5: One test failed at the "Rows Processed" check in Dockerised-test, but it looks similar to https://issues.apache.org/jira/browse/IMPALA-6004 and seems unrelated to the patch. The other failures in "ubuntu-16.04-from-scratch" didn't occur in a previous build, so they might be flaky. A previous run of the same patch passed at: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/15371/. -- To view, visit http://gerrit.cloudera.org:8080/18043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b Gerrit-Change-Number: 18043 Gerrit-PatchSet: 5 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Tue, 30 Nov 2021 23:47:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction
Yu-Wen Lai has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18043 ) Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction .. IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction After compaction happened in Hive(HIVE ACID table), queries made in Impala possibly fail with a FileNotFoundException if files already removed by the Hive cleaner. In IMPALA-10801, catalogd checks the latest compaction id before serving metadata. However, coordinators don't take advantage of that. Coordinators have their own local cache, so we will have to do the same check for coordinators as well. Besides, we also need to attach writeIdList to requests that need to fetch file metadata. Since this checking brings additional overhead for queries, we introduce a flag auto_check_compaction and set it as false by default for now. We will find some other efficient ways to do compaction checking in the future. Tests: Added unit tests to CatalogdMetaProviderTest Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b --- M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java 12 files changed, 356 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/18043/5 -- To view, visit http://gerrit.cloudera.org:8080/18043 To unsubscribe, visit 
http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b Gerrit-Change-Number: 18043 Gerrit-PatchSet: 5 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
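The compaction check described in the patch above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the logic in CatalogdMetaProvider or DirectMetaProvider: the class, the method, and the idea of keeping one compaction id per cached partition are hypothetical. Only the flag name auto_check_compaction and its default of false come from the commit message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: names and data layout are assumptions.
final class CompactionCheckSketch {

  // Returns the cached partitions whose compaction id is older than the
  // latest one reported by the metastore. When the flag is off (the default
  // in the patch), the check is skipped and nothing is reported as stale.
  static List<String> findStalePartitions(boolean autoCheckCompaction,
      Map<String, Long> cachedCompactionIds, Map<String, Long> latestCompactionIds) {
    List<String> stale = new ArrayList<>();
    if (!autoCheckCompaction) return stale;
    for (Map.Entry<String, Long> cached : cachedCompactionIds.entrySet()) {
      Long latest = latestCompactionIds.get(cached.getKey());
      // A newer compaction means the cached file list may point at files
      // that the Hive cleaner has already removed.
      if (latest != null && cached.getValue() < latest) stale.add(cached.getKey());
    }
    return stale;
  }

  public static void main(String[] args) {
    Map<String, Long> cached = new TreeMap<>();
    cached.put("p=1", 10L);
    cached.put("p=2", 12L);
    Map<String, Long> latest = new TreeMap<>();
    latest.put("p=1", 11L); // compacted again since the cache was filled
    latest.put("p=2", 12L); // unchanged
    System.out.println("stale partitions: " + findStalePartitions(true, cached, latest));
  }
}
```

Stale partitions found this way would then have their file metadata fetched again before the query is planned, which is where the extra per-query overhead mentioned in the message comes from.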
[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18043 ) Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/18043/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/18043/3/be/src/service/impala-server.cc@348 PS3, Line 348: conduct > nit, conducted Ack http://gerrit.cloudera.org:8080/#/c/18043/3/be/src/service/impala-server.cc@349 PS3, Line 349: m > move to previous line? Ack http://gerrit.cloudera.org:8080/#/c/18043/3/be/src/service/impala-server.cc@349 PS3, Line 349: ala makes " : "additional RPCs to hive metastore for each table > suggest you to change it to more generic since end users may not understand Ack -- To view, visit http://gerrit.cloudera.org:8080/18043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b Gerrit-Change-Number: 18043 Gerrit-PatchSet: 4 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Tue, 30 Nov 2021 00:03:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction
Yu-Wen Lai has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18043 ) Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction .. IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction After compaction happened in Hive(HIVE ACID table), queries made in Impala possibly fail with a FileNotFoundException if files already removed by the Hive cleaner. In IMPALA-10801, catalogd checks the latest compaction id before serving metadata. However, coordinators don't take advantage of that. Coordinators have their own local cache, so we will have to do the same check for coordinators as well. Besides, we also need to attach writeIdList to requests that need to fetch file metadata. Since this checking brings additional overhead for queries, we introduce a flag auto_check_compaction and set it as false by default for now. We will find some other efficient ways to do compaction checking in the future. Tests: Added unit tests to CatalogdMetaProviderTest Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b --- M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java 12 files changed, 337 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/18043/4 -- To view, visit http://gerrit.cloudera.org:8080/18043 To unsubscribe, visit 
http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b Gerrit-Change-Number: 18043 Gerrit-PatchSet: 4 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction
Yu-Wen Lai has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18043 ) Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction .. IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction After compaction happened in Hive(HIVE ACID table), queries made in Impala possibly fail with a FileNotFoundException if files already removed by the Hive cleaner. In IMPALA-10801, catalogd checks the latest compaction id before serving metadata. However, coordinators don't take advantage of that. Coordinators have their own local cache, so we will have to do the same check for coordinators as well. Besides, we also need to attach writeIdList to requests that need to fetch file metadata. Since this checking brings additional overhead for queries, we introduce a flag auto_check_compaction and set it as false by default for now. We will find some other efficient ways to do compaction checking in the future. Tests: Added unit tests to CatalogdMetaProviderTest Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b --- M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java 13 files changed, 335 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/18043/3 -- To view, visit 
http://gerrit.cloudera.org:8080/18043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b Gerrit-Change-Number: 18043 Gerrit-PatchSet: 3 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18043 ) Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/18043/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18043/2//COMMIT_MSG@10 PS2, Line 10: After compaction happened in Hive(HIVE ACID table), queries made in : Impala possibly fail with a FileNotFoundException if files already : removed by the Hive cleaner. > IIRC, Impala only open transactions for DDL/DML operations. Do you know how Thank Vihang and Quanlong for letting me know the problem. Impala does NOT open transactions for select queries so this approach doesn't work all the time... Hive has a config that can delay the cleaner some period of time but we don't know exactly how long we should extend. Given that this is time sensitive, I'm thinking we could make this feature optional for now. If this flag is set, say auto_check_compaction, let Impala open transactions for all the queries for ACID tables and do the compaction checking. Any thoughts? http://gerrit.cloudera.org:8080/#/c/18043/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/18043/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@898 PS2, Line 898: List stalePartitions = directProvider_.checkLatestCompaction( : refImpl.dbName_, refImpl.tableName_, refImpl, refToMeta); > I think this introduces several HMS RPCs per query (some queries may call t If we take the performance numbers on DWX as example, currently this API call takes 10 ~ 40 ms per table depending on the number of partitions. I will have a fix on the HMS side to solve an issue around this API that we need to pass all the partition names. That should make all the API execution time close to 10 ms. 
Even though we can make some improvements around this API, I understand that it still introduces overhead that might not be negligible. It might be better to introduce this feature with a flag and a table property to skip this check, as Quanlong suggested. -- To view, visit http://gerrit.cloudera.org:8080/18043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b Gerrit-Change-Number: 18043 Gerrit-PatchSet: 2 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Mon, 29 Nov 2021 02:56:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction
Yu-Wen Lai has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/18043 ) Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction .. IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction After compaction happened in Hive(HIVE ACID table), queries made in Impala possibly fail with a FileNotFoundException if files already removed by the Hive cleaner. In IMPALA-10801, catalogd checks the latest compaction id before serving metadata. However, coordinators don't take advantage of that. Coordinators have their own local cache, so we will have to do the same check for coordinators as well. Besides, we also need to attach writeIdList to requests that need to fetch file metadata. Tests: Added unit tests to CatalogdMetaProviderTest Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b --- M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java 8 files changed, 308 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/18043/2 -- To view, visit http://gerrit.cloudera.org:8080/18043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b Gerrit-Change-Number: 18043 Gerrit-PatchSet: 2 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction
Yu-Wen Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18043 Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction .. IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction After compaction happened in Hive(HIVE ACID table), queries made in Impala possibly fail with a FileNotFoundException if files already removed by the Hive cleaner. In IMPALA-10801, catalogd checks the latest compaction id before serving metadata. However, coordinators don't take advantage of that. Coordinators have their own local cache, so we will have to do the same check for coordinators as well. Besides, we also need to attach writeIdList to requests that need to fetch file metadata. Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b --- M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java 8 files changed, 303 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/18043/1 -- To view, visit http://gerrit.cloudera.org:8080/18043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b Gerrit-Change-Number: 18043 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. Patch Set 17: (2 comments) http://gerrit.cloudera.org:8080/#/c/17858/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17858/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4341 PS15, Line 4341: // Aborted write id is not allowed. The write id can be committed if the table > ok. Please add a comment for it Done http://gerrit.cloudera.org:8080/#/c/17858/16/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/17858/16/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2608 PS16, Line 2608: List FDafter = tbl.getPartitionsForNames( > From what I understand, this asserts that underlying file metatdata remaine Yes, you are right. The file descriptors will be reused if the files are not changed. I added a metric to check if file metadata is reloaded. -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 17 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Tue, 16 Nov 2021 22:08:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables To enable fine-grained table refreshing, there are three main changes in this commit. 1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes for partitioned tables by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents. 2. Conduct partition level refreshing for transactional tables' addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents. 3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing. Performance Tests: A simple test was performed by running insert into one partition for a partitioned ACID table(50,000 partitions). Below are the time taken to refresh this table by the event. 
Storage  Before   After
=======================
S3       50 secs  50 msecs
local    3 secs   3 msecs
Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java 17 files changed, 1,216 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/17 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 17 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
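The first change in the commit message above (tracking write id changes from AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents) can be pictured with a small sketch. This is not the patch's MutableValidWriteIdList; the class name WriteIdTracker, its methods, and the rule that only an open write id may be committed or aborted are all assumptions made for illustration.

```java
import java.util.TreeMap;

/** Hypothetical sketch of per-table write id bookkeeping; not Impala's implementation. */
final class WriteIdTracker {
  enum Status { OPEN, COMMITTED, ABORTED }

  // Write ids in ascending order, so the last key is the high watermark.
  private final TreeMap<Long, Status> statuses = new TreeMap<>();

  // AllocWriteIdEvent: a newly allocated write id starts out open.
  void onAllocWriteId(long writeId) {
    statuses.putIfAbsent(writeId, Status.OPEN);
  }

  // CommitTxnEvent: assumed rule, only an open write id can become committed.
  boolean onCommit(long writeId) {
    return statuses.replace(writeId, Status.OPEN, Status.COMMITTED);
  }

  // AbortTxnEvent: assumed rule, only an open write id can become aborted.
  boolean onAbort(long writeId) {
    return statuses.replace(writeId, Status.OPEN, Status.ABORTED);
  }

  // Data written under a write id is visible to readers only once committed.
  boolean isVisible(long writeId) {
    return statuses.get(writeId) == Status.COMMITTED;
  }

  // Highest write id seen so far, or 0 if none has been allocated.
  long highWatermark() {
    return statuses.isEmpty() ? 0L : statuses.lastKey();
  }
}
```

With this state kept per table, an event touching one partition only needs to update the write id status and reload that partition, which is the kind of saving the timing table above reports.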
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables To enable fine-grained table refreshing, there are three main changes in this commit. 1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes for partitioned tables by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents. 2. Conduct partition level refreshing for transactional tables' addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents. 3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing. Performance Tests: A simple test was performed by running insert into one partition for a partitioned ACID table(50,000 partitions). Below are the time taken to refresh this table by the event. 
Storage  Before   After
=======================
S3       50 secs  50 msecs
local    3 secs   3 msecs
Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java 17 files changed, 1,204 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/16 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 16 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. Patch Set 15: (6 comments) http://gerrit.cloudera.org:8080/#/c/17858/13/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17858/13/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3560 PS13, Line 3560: LOG.debug("Not adding write ids to table {}.{} for event {} " + > nit: add more details in the log message like table name, event id being pr Done http://gerrit.cloudera.org:8080/#/c/17858/11/fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java File fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java: http://gerrit.cloudera.org:8080/#/c/17858/11/fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java@252 PS11, Line 252: exceptions.add(currentId); > Looked at the implementation of BitSet.get() and I think the following sequ Yes, that already works because BitSet by default returns false if it is not set. http://gerrit.cloudera.org:8080/#/c/17858/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17858/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4368 PS13, Line 4368: throw new CatalogException( > nit: Would be good to add a log message with details about the rollback. 
Done http://gerrit.cloudera.org:8080/#/c/17858/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/17858/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2487 PS12, Line 2487: } finally { > nit: Original config should be restored in finally block Done http://gerrit.cloudera.org:8080/#/c/17858/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2499 PS12, Line 2499: stubCfg.setHms_event_incremental_refresh_transactional_table(true); > nit: can include test name in the table name for example: test_abort_transa Done http://gerrit.cloudera.org:8080/#/c/17858/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2818 PS12, Line 2818: } > Instead of creating a new method createTransactionalTable, we can enhance g I tried and verified that we need to set table params for creating transactional tables. Please see https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java#L174. -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 15 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 10 Nov 2021 00:14:45 + Gerrit-HasComments: Yes
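The BitSet reply in the message above relies on a documented property of java.util.BitSet: get() returns false for any non-negative index whose bit was never set, including indices past the current size. A minimal sketch follows; the class name BitSetDefaultDemo and the helper isMarked are hypothetical and are not code from the patch.

```java
import java.util.BitSet;

/** Hypothetical demo of BitSet's default-false behavior; not code from the patch. */
final class BitSetDefaultDemo {
  // Returns whether the bit at index is set. For any index >= 0 this never
  // throws, even when index lies beyond the BitSet's current size.
  static boolean isMarked(BitSet bits, int index) {
    return bits.get(index);
  }

  public static void main(String[] args) {
    BitSet bits = new BitSet(8);
    bits.set(3);
    System.out.println(isMarked(bits, 3));     // prints "true"
    System.out.println(isMarked(bits, 1_000)); // prints "false", no exception
  }
}
```

Because of this, a caller can query a bit for an id that was appended after the BitSet was last resized without a separate bounds check.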
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables To enable fine-grained table refreshing, there are three main changes in this commit. 1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes for partitioned tables by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents. 2. Conduct partition level refreshing for transactional tables' addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents. 3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing. Performance Tests: A simple test was performed by running insert into one partition for a partitioned ACID table(50,000 partitions). Below are the time taken to refresh this table by the event. 
Storage  Before   After
=======================
S3       50 secs  50 msecs
local    3 secs   3 msecs
Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java 17 files changed, 1,130 insertions(+), 82 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/15 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 15 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables To enable fine-grained table refreshing, there are three main changes in this commit. 1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes for partitioned tables by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents. 2. Conduct partition level refreshing for transactional tables' addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents. 3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing. Performance Tests: A simple test was performed by running insert into one partition for a partitioned ACID table(50,000 partitions). Below are the time taken to refresh this table by the event. 
Storage  Before   After
=======================
S3       50 secs  50 msecs
local    3 secs   3 msecs
Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java 17 files changed, 1,093 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/14 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 14 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables To enable fine-grained table refreshing, there are three main changes in this commit. 1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes for partitioned tables by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents. 2. Conduct partition level refreshing for transactional tables' addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents. 3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing. Performance Tests: A simple test was performed by running insert into one partition for a partitioned ACID table(50,000 partitions). Below are the time taken to refresh this table by the event. 
Storage  Before   After
=======================
S3       50 secs  50 msecs
local    3 secs   3 msecs
Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java 17 files changed, 1,089 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/13 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 13 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables To enable fine-grained table refreshing, there are three main changes in this commit. 1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes for partitioned tables by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents. 2. Conduct partition level refreshing for transactional tables' addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents. 3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing. Performance Tests: A simple test was performed by running insert into one partition for a partitioned ACID table(50,000 partitions). Below are the time taken to refresh this table by the event. 
Storage  Before   After
=======================
S3       50 secs  50 msecs
local    3 secs   3 msecs
Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java 17 files changed, 1,002 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/12 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 12 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables To enable fine-grained table refreshing, there are three main changes in this commit. 1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes for partitioned tables by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents. 2. Conduct partition level refreshing for transactional tables addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents. 3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing. Performance Tests: A simple test was performed by running insert into one partition for partitioned ACID tables (50,000 partitions). Below are the time taken to refresh this table by the event. 
Storage  Before   After
=======================
S3       50 secs  50 msecs
local    3 secs   3 msecs
Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java 18 files changed, 1,000 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/11 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 11 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables To enable fine-grained table refreshing, there are three main changes in this commit. 1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes for partitioned tables by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents. 2. Conduct partition level refreshing for transactional tables addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents. 3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing. Performance Tests: A simple test was performed by running insert into one partition for partitioned ACID tables (50,000 partitions). Below are the time taken to refresh this table by the event. 
Storage  Before   After
=======================
S3       50 secs  50 msecs
local    3 secs   3 msecs
Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 17 files changed, 956 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/10 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 10 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables To enable fine-grained table refreshing, there are three main changes in this commit. 1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes for partitioned tables by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents. 2. Conduct partition level refreshing for transactional tables addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents. 3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing. Performance Tests: A simple test was performed by running insert into one partition for partitioned ACID tables (50,000 partitions). Below are the time taken to refresh this table by the event. 
Storage  Before   After
=======================
S3       50 secs  50 msecs
local    3 secs   3 msecs
Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 17 files changed, 938 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/9 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 9 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables To enable fine-grained table refreshing, there are three main changes in this commit. 1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes for partitioned tables by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents. 2. Conduct partition level refreshing for transactional tables addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents. 3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing. Performance Tests: A simple test was performed by running insert into one partition for partitioned ACID tables (50,000 partitions). Below are the time taken to refresh this table by the event. 
Storage  Before   After
=======================
S3       50 secs  50 msecs
local    3 secs   3 msecs
Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 16 files changed, 928 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/8 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 8 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables ..

Patch Set 7:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97
PS1, Line 97:   protected final ConcurrentHashMap> txnToWriteIds_ =
            :       new ConcurrentHashMap<>();
> Thanks for the clarification. "the new HMS API getAllWriteEventInfo only re
getAllWriteEventInfo just returns the data stored in the table TXN_WRITE_NOTIFICATION_LOG. AFAIK, HS2 calls add_write_notification_log, which inserts records into TXN_WRITE_NOTIFICATION_LOG only for DML on transactional tables. I tried a few queries locally, like "drop constraint", and they advance the write id but don't add a write notification log.

I tried to reduce the memory footprint here by saving write ids for transactional partitioned tables only. Besides, this map's size is just proportional to the number of simultaneously open transactions. Although I don't have any real data points, we might not have a huge number of simultaneous "open" transactions?

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97
PS1, Line 97:   protected final ConcurrentHashMap> txnToWriteIds_ =
            :       new ConcurrentHashMap<>();
> @Yu-Wen: In addition to what Vihang asked, how would we handle the followin
@Sourabh Good question. Since I don't see a way to retrieve the missing write id, we might accept that this write id remains open. The next time a request arrives with a writeIdList that has this write id as committed, we will reload the whole table because the writeIdList of the request is considered more recent. In some sense, the table cache is considered stale when the write id is not marked committed.

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2891
PS6, Line 2891: case
> do we need a default: clause which throws a exception?
Done

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/Table.java@142
PS6, Line 142: volatile
> not sure why we need this?
@Vihang I call getCreateEventId() in AllocWriteIdEvent without acquiring the lock. Is there any chance createEventId will be set after the table is loaded? If not, we don't need this.

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2066
PS6, Line 2066: catalog_.removeWriteIds(txnId_);
> This line must be in finally block otherwise we are leaking memory in case
Thank you for catching this.

http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2057
PS7, Line 2057: commitTxnMessage_.addWriteEventInfo(writeEventInfoList);
> Why are we modifying commitTxnMesage? Can't we get all the required info fr
@Sourabh I actually imitated the code from hive repl: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/CommitTxnHandler.java#L166.
The upside is that I don't have to parse table and partition objects; that is done by CommitTxnMessage. As far as I can see from the Hive code, this function seems to be used this way by design.

http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2080
PS7, Line 2080: Preconditions.checkNotNull(commitTxnMessage_.getPartitions());
> Why are we checking for non null partitions? Wouldn't unpartitioned table h
As long as we have called addWriteEventInfo, this would be an empty list even for an unpartitioned table. So, this is just to check that we have added write event info. I can change the check to use other variables to avoid confusion.

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
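The review point on line 2066, that the per-transaction entry has to be removed in a finally block so it does not leak when processing fails, can be sketched as follows. This is only an illustration: the class `CommitCleanupSketch`, its methods, and the String value type are hypothetical stand-ins, not Impala's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names and types are assumptions, not the Impala API.
class CommitCleanupSketch {
  // Maps an open transaction id to some description of its write ids.
  private final Map<Long, String> txnToWriteIds = new ConcurrentHashMap<>();

  void track(long txnId, String writeIdInfo) {
    txnToWriteIds.put(txnId, writeIdInfo);
  }

  boolean isTracked(long txnId) {
    return txnToWriteIds.containsKey(txnId);
  }

  // Runs the refresh work for a commit event. The entry is removed in the
  // finally block, so it is dropped even when the refresh throws.
  void processCommit(long txnId, Runnable refresh) {
    try {
      refresh.run();
    } finally {
      txnToWriteIds.remove(txnId);
    }
  }
}
```

Without the finally block, an exception thrown by the refresh step would skip the removal and the entry for that transaction would stay in the map indefinitely.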
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables ..

IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

To enable fine-grained table refreshing, there are three main changes in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Trigger partition level refreshing for addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing.

Performance Tests:
A simple test was performed by running insert into one partition for partitioned ACID tables (50,000 partitions). Below is the time taken to refresh this table by the event.

Storage   Before    After
=========================
S3        50 secs   50 msecs
local     3 secs    3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M
fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 16 files changed, 776 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/7 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 7 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables ..

IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

To enable fine-grained table refreshing, there are three main changes in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Trigger partition level refreshing for addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing.

Performance Tests:
A simple test was performed by running insert into one partition for partitioned ACID tables (50,000 partitions). Below is the time taken to refresh this table by the event.

Storage   Before    After
=========================
S3        50 secs   50 msecs
local     3 secs    3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M
fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 16 files changed, 773 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/6 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 6 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables ..

IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

To enable fine-grained table refreshing, there are three main changes in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Trigger partition level refreshing for addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing.

Performance Tests:
A simple test was performed by running insert into one partition for partitioned ACID tables (50,000 partitions). Below is the time taken to refresh this table by the event.

Storage   Before    After
=========================
S3        50 secs   50 msecs
local     3 secs    3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M
fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 16 files changed, 745 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/5 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 5 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables ..

Patch Set 4:

(33 comments)

http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG@9
PS1, Line 9:
> +1
Done

http://gerrit.cloudera.org:8080/#/c/17858/1/common/thrift/BackendGflags.thrift
File common/thrift/BackendGflags.thrift:

http://gerrit.cloudera.org:8080/#/c/17858/1/common/thrift/BackendGflags.thrift@219
PS1, Line 219:   97: required bool hms_event_incremental_refresh_transactional_table
> What is really the reason of having a config for this? Is there a case wher
The initial thought was just to toggle the feature on/off to make experiments easy. From the users' perspective, they would want to turn this off only when the feature has problems. Therefore, the goal is to make this feature robust enough that we can then get rid of this flag.

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97
PS1, Line 97:   protected final ConcurrentHashMap> txnToWriteIds_ =
            :       new ConcurrentHashMap<>();
> Do we really need this? Based on my understanding when the ALLOC_WRITE_ID e
The difficulty here is that some DDLs advance the write id without changing data, but the new HMS API getAllWriteEventInfo only returns info for WRITE events. Let's say we have a DDL for table foo in txn 3 and this DDL allocates write id 3 for table foo. We can mark write id 3 as open for table foo when catalogd receives the AllocWriteIdEvent. However, when it receives the CommitTxnEvent for txn 3, we don't know that write id 3 for table foo is associated with this transaction unless we have a mapping table in the catalog.
We cannot reload the writeIdList alone for a commitTxnEvent either, because chances are that other txns have committed after this event, and simply reloading the writeIdList would make the table inconsistent. Any thoughts or alternative approaches?

Sorry that my previous patch was incomplete. The entry for a transaction should be deleted whenever the transaction ends (committed or aborted).

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@777
PS1, Line 777:   }
             :
             :   public void removeWriteIds(Long txnId) {
             :     Preconditions.checkNotNull(txnId);
             :     txnToWriteIds_.remove(txnId);
             :   }
             : }
> this could be simplified as
Done

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@788
PS1, Line 788:
> do we need to check for existence of txnId?
Done

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2604
PS1, Line 2604: return true;
> line too long (96 > 90)
Done

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2610
PS1, Line 2610:*/
> line too long (91 > 90)
Done

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3578
PS1, Line 3578:
> pls add java doc
Done

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3584
PS1, Line 3584: {
> this can throw a NPE since one of the conditins for this if is tbl==null.
Done

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3597
PS1, Line 3597:
> change to debug?
Done

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3599
PS1, Line 3599: ibleForTesting
> This preconditions check is unnecessary. Also, use use something like Unloc
Done

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2802
PS1, Line 2802: ddTimer(CAT
> Please add java doc for this.
Done

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2804
PS1, Line 2804: me
> nit, we can change this to a simple switch-case statement to reduce the if
Done

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/ca
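The reply about `txnToWriteIds_` describes a mapping from a transaction id to the write ids it allocated, filled on AllocWriteIdEvent and dropped when the transaction commits or aborts. A minimal sketch of that bookkeeping is below. The archive lost the generic type parameters of the real field, so the value type used here is an assumption, and `TableWriteIdSketch` and `TxnWriteIdTrackerSketch` are hypothetical names rather than the classes in the patch.

```java
import java.util.Collections;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the TableWriteId class added by the patch.
final class TableWriteIdSketch {
  final String dbName;
  final String tblName;
  final long writeId;

  TableWriteIdSketch(String dbName, String tblName, long writeId) {
    this.dbName = dbName;
    this.tblName = tblName;
    this.writeId = writeId;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof TableWriteIdSketch)) return false;
    TableWriteIdSketch other = (TableWriteIdSketch) o;
    return dbName.equals(other.dbName) && tblName.equals(other.tblName)
        && writeId == other.writeId;
  }

  @Override
  public int hashCode() {
    return Objects.hash(dbName, tblName, writeId);
  }
}

// Hypothetical tracker; the map's value type is an assumption.
class TxnWriteIdTrackerSketch {
  private final ConcurrentHashMap<Long, Set<TableWriteIdSketch>> txnToWriteIds =
      new ConcurrentHashMap<>();

  // On an alloc-write-id event: remember which write id the txn opened.
  void addWriteId(long txnId, TableWriteIdSketch id) {
    txnToWriteIds.computeIfAbsent(txnId, k -> ConcurrentHashMap.newKeySet()).add(id);
  }

  // On a commit or abort event: return the txn's write ids and drop the entry,
  // so the map only holds currently open transactions.
  Set<TableWriteIdSketch> removeWriteIds(long txnId) {
    Set<TableWriteIdSketch> removed = txnToWriteIds.remove(txnId);
    return removed == null ? Collections.<TableWriteIdSketch>emptySet() : removed;
  }

  int openTxnCount() {
    return txnToWriteIds.size();
  }
}
```

In the example from the thread, txn 3 allocates write id 3 for table foo, so `addWriteId(3, ...)` records it and the later commit event for txn 3 can look it up through `removeWriteIds(3)` even though getAllWriteEventInfo reports nothing for a DDL. Removing on both commit and abort keeps the map size proportional to the number of open transactions, as the reply argues.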
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables ..

IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

To enable fine-grained table refreshing, there are three main changes in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Trigger partition level refreshing for addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing.

Performance Tests:
A simple test was performed by running insert into one partition for partitioned ACID tables (50,000 partitions). Below is the time taken to refresh this table by the event.

Storage   Before    After
=========================
S3        50 secs   50 msecs
local     3 secs    3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M
fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 16 files changed, 738 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/4 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 4 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables ..

IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

To enable fine-grained table refreshing, there are three main changes in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Trigger partition level refreshing for addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config hms_event_incremental_refresh_transactional_table, which can switch on/off the fine-grained table refreshing.

Performance Tests:
A simple test was performed by running insert into one partition for partitioned ACID tables (50,000 partitions). Below is the time taken to refresh this table by the event.

Storage   Before    After
=========================
S3        50 secs   50 msecs
local     3 secs    3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M
fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 16 files changed, 735 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/3 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 3 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has restored this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. Restored -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: restore Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 2 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has abandoned this change. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. Abandoned Some missed modifications -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 2 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10958: Decouple getConstraintsInformation from hive.ql.metadata.Table
Yu-Wen Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17910 ) Change subject: IMPALA-10958: Decouple getConstraintsInformation from hive.ql.metadata.Table .. IMPALA-10958: Decouple getConstraintsInformation from hive.ql.metadata.Table After HIVE-22782, the ql.metadata.Table object has no methods to set PrimaryKeyInfo and ForeignKeyInfo alone. However, we call these two functions in DescribeResultFactory to set constraints and pass the table into HiveMetadataFormatUtils. Instead of calling the methods on the table, we can directly pass PrimaryKeyInfo and ForeignKeyInfo to HiveMetadataFormatUtils so that Impala won't be affected even if the table class changes its interface. Additionally, we can get rid of ql.metadata.Table for getTableInformation altogether since it just needs metastore.api.Table internally. Tests: Ran core tests. Change-Id: I2dfc54ae2f995dc4ab735d17dbbad9a48f6633da --- M fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java 3 files changed, 15 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/17910/1 -- To view, visit http://gerrit.cloudera.org:8080/17910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2dfc54ae2f995dc4ab735d17dbbad9a48f6633da Gerrit-Change-Number: 17910 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai
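The decoupling described in this commit message, passing the constraint objects straight to the formatter instead of setting them on a wrapper table first, might look roughly like the sketch below. All types here (`PrimaryKeySketch`, `ForeignKeySketch`, `ConstraintFormatSketch`) are simplified, hypothetical stand-ins; they do not mirror the fields of Hive's PrimaryKeyInfo and ForeignKeyInfo or the signature of HiveMetadataFormatUtils.

```java
import java.util.List;

// Hypothetical, simplified stand-in for Hive's PrimaryKeyInfo.
final class PrimaryKeySketch {
  final String constraintName;
  final List<String> columns;

  PrimaryKeySketch(String constraintName, List<String> columns) {
    this.constraintName = constraintName;
    this.columns = columns;
  }
}

// Hypothetical, simplified stand-in for Hive's ForeignKeyInfo.
final class ForeignKeySketch {
  final String constraintName;
  final String parentTable;

  ForeignKeySketch(String constraintName, String parentTable) {
    this.constraintName = constraintName;
    this.parentTable = parentTable;
  }
}

class ConstraintFormatSketch {
  // The formatter receives the constraint info as arguments, so it does not
  // depend on which setters a wrapper table class happens to expose.
  static String formatConstraints(PrimaryKeySketch pk, List<ForeignKeySketch> fks) {
    StringBuilder sb = new StringBuilder();
    if (pk != null) {
      sb.append("PRIMARY KEY ").append(pk.constraintName)
          .append(' ').append(pk.columns).append('\n');
    }
    for (ForeignKeySketch fk : fks) {
      sb.append("FOREIGN KEY ").append(fk.constraintName)
          .append(" -> ").append(fk.parentTable).append('\n');
    }
    return sb.toString();
  }
}
```

The design choice is the one the commit message states: the caller already holds the constraint objects, so handing them over directly removes the dependency on the wrapper class's interface.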
[Impala-ASF-CR] IMPALA-10959: Reload MV as ACID tables
Yu-Wen Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17911 ) Change subject: IMPALA-10959: Reload MV as ACID tables .. IMPALA-10959: Reload MV as ACID tables We observed that the event processor breaks after receiving a partition event for a materialized view (MV). This is because Impala treats an MV as a view, but Hive generates partition events for MVs, which breaks the current event processor. In this patch, we let partition events of MVs follow the code path of ACID tables to reload the view. In the long term, we will need IMPALA-10723 to treat a materialized view as a table. Tests: - Manual testing Change-Id: Ibeab8cc53ad47d24df8baba81e1ec6ea4c80a084 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java 1 file changed, 26 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/17911/1 -- To view, visit http://gerrit.cloudera.org:8080/17911 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ibeab8cc53ad47d24df8baba81e1ec6ea4c80a084 Gerrit-Change-Number: 17911 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai
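The routing this commit message describes, sending partition events for materialized views down the same path as ACID tables, could be sketched as a small decision function. The enum, the method, and the assumption that non-ACID tables take a partition-level refresh path are all illustrative and hypothetical; the real logic lives in MetastoreEvents.java and is not reproduced here.

```java
// Hypothetical sketch of the dispatch decision; not Impala's actual code.
class PartitionEventRouterSketch {
  enum RefreshPath {
    RELOAD_WHOLE_TABLE,  // path described for ACID tables (and now MVs)
    REFRESH_PARTITIONS   // assumed path for other tables
  }

  // Materialized views are treated like ACID tables, so a partition event on
  // an MV triggers a reload instead of breaking the event processor.
  static RefreshPath route(boolean isTransactional, boolean isMaterializedView) {
    if (isTransactional || isMaterializedView) {
      return RefreshPath.RELOAD_WHOLE_TABLE;
    }
    return RefreshPath.REFRESH_PARTITIONS;
  }
}
```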
[Impala-ASF-CR] Bump up the GBN to 17296101
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17874 ) Change subject: Bump up the GBN to 17296101 .. Patch Set 2: > Patch Set 1: > > (1 comment) > > LGTM. I left a minor comment below. I can +2 this once it is addressed. Added. Could you please check again? -- To view, visit http://gerrit.cloudera.org:8080/17874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I87a497882e80dbfc87077bdbc2f05216182003d6 Gerrit-Change-Number: 17874 Gerrit-PatchSet: 2 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 29 Sep 2021 16:21:28 + Gerrit-HasComments: No
[Impala-ASF-CR] Bump up the GBN to 17296101
Yu-Wen Lai has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17874 ) Change subject: Bump up the GBN to 17296101 .. Bump up the GBN to 17296101 This patch bumps up the GBN to 17296101. This build includes HIVE-25137, which introduces a new HMS API to get the ACID write events of a transaction. Additionally, it excludes ranger-plugins-audit from the dependencies of ranger-plugins-common so that Maven can resolve dependencies. Change-Id: I87a497882e80dbfc87077bdbc2f05216182003d6 --- M bin/impala-config.sh M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java 3 files changed, 29 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17874/2 -- To view, visit http://gerrit.cloudera.org:8080/17874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I87a497882e80dbfc87077bdbc2f05216182003d6 Gerrit-Change-Number: 17874 Gerrit-PatchSet: 2 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] Bump up the GBN to 17296101
Yu-Wen Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17874 ) Change subject: Bump up the GBN to 17296101 .. Bump up the GBN to 17296101 This patch bumps up the GBN to 17296101. This build includes HIVE-25137, which introduces a new HMS API to get the ACID write events of a transaction. Additionally, it excludes ranger-plugins-audit from the dependencies of ranger-plugins-common so that Maven can resolve dependencies. Change-Id: I87a497882e80dbfc87077bdbc2f05216182003d6 --- M bin/impala-config.sh M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java 3 files changed, 26 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17874/1 -- To view, visit http://gerrit.cloudera.org:8080/17874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I87a497882e80dbfc87077bdbc2f05216182003d6 Gerrit-Change-Number: 17874 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 ) Change subject: IMPALA-9857: Batching of consecutive partition events .. Patch Set 6: (2 comments) Thanks Vihang for introducing a way for batch event processing. I will rebase my patch for IMPALA-10923 on top of this patch. http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@270 PS6, Line 270: i=0, j=1 nit: spaces around assignment operator http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1803 PS6, Line 1803: for (T event : batchedEvents_) { It seems ignoredPartitions are still being processed here. -- To view, visit http://gerrit.cloudera.org:8080/17848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73 Gerrit-Change-Number: 17848 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 22 Sep 2021 03:55:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
Yu-Wen Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17858 ) Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables .. IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables To enable fine-grained table refreshing, there are three main changes in this commit. 1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep track of write id changes by AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents. 2. Trigger partition level refreshing for addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents. 3. Introduce a config incremental_refresh_acid, which can switch on/off the fine-grained table refreshing. Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 --- M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 12 files changed, 672 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/1 -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai
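The first change in the IMPALA-10923 description (tracking write id changes from AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents) can be pictured with a rough sketch. This is not the patch's code: WriteIdTrackerSketch and all of its methods are hypothetical names, the real MutableValidWriteIdList carries more state, and the one-write-id-per-event handling is a simplifying assumption.

```java
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical, simplified per-table write id state kept up to date by events. */
public class WriteIdTrackerSketch {
  private final SortedSet<Long> open = new TreeSet<>();
  private final SortedSet<Long> aborted = new TreeSet<>();
  private long highWatermark = 0;

  /** Alloc event: the write id becomes known and is open (not yet readable). */
  void onAllocWriteId(long writeId) {
    open.add(writeId);
    highWatermark = Math.max(highWatermark, writeId);
  }

  /** Commit event: the write id stops being open, so its data is readable. */
  void onCommitTxn(long writeId) {
    open.remove(writeId);
  }

  /** Abort event: the write id moves from open to aborted and stays unreadable. */
  void onAbortTxn(long writeId) {
    if (open.remove(writeId)) aborted.add(writeId);
  }

  /** A write id is readable if it was allocated and is neither open nor aborted. */
  boolean isReadable(long writeId) {
    return writeId >= 1 && writeId <= highWatermark
        && !open.contains(writeId) && !aborted.contains(writeId);
  }
}
```

Keeping this state in Catalogd is what would let a partition-level event be applied without reloading the whole table, since the catalog already knows which write ids are valid.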
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. Patch Set 16: (1 comment) http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2194 PS14, Line 2194: te the table when the updatedTbl has a higher ValidWriteIdList : // if we just rely on catalog version comparison which would break the logic to : // reload on stale ValidWri > I synced up with Yu-Wen offline. Based on the discussion, I understand the Thanks Vihang for putting this together. I've added the comment. -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 16 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Tue, 10 Aug 2021 20:34:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. Patch Set 16: (3 comments) http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java: http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@533 PS15, Line 533: " (c1 int) partitioned by (part int) stored as orc" + > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@547 PS15, Line 547: executeHiveSql("create table " + getTestFullAcidTblName() + > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@598 PS15, Line 598: TPartialPartitionInfo afterPartitionInfo = > nit, this comment can be removed now. Done -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 16 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Tue, 10 Aug 2021 20:32:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. IMPALA-10801: Check the latest compaction Id before serving ACID table Since compactions don't advance the write id, we cannot tell whether a table/partition has been compacted by comparing writeIdList. A possible issue is that CatalogD provides obsolete file metadata and causes a runtime error. To fix this issue, we introduced an HMS API that can get the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we compare the cached id with the latest compaction id before serving. If a newer compaction has happened, we cache the latest compaction id and refresh the file metadata. Besides, this patch also changes how the existing table is replaced after a full table reload. The current way is to replace the table if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen together. We can use writeIdList to determine whether we should replace the table for transactional tables. As long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior.
Testing: - Add several tests in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java A fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java 6 files changed, 369 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/16 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 16 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
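The check described in the IMPALA-10801 commit message (compare the cached compaction id with the latest one before serving, and refresh file metadata if a newer compaction happened) might look roughly like the sketch below. It is only an illustration: CompactionCheckSketch, CachedPartition, and the method names are hypothetical, and the latest compaction id is passed in directly instead of being fetched through the HMS API from HIVE-24828.

```java
import java.util.HashMap;
import java.util.Map;

public class CompactionCheckSketch {
  /** Hypothetical stand-in for the per-partition state cached in CatalogD. */
  static final class CachedPartition {
    long compactionId;
    int fileMetadataLoads;
    CachedPartition(long compactionId) { this.compactionId = compactionId; }
  }

  private final Map<String, CachedPartition> cache = new HashMap<>();

  void cachePartition(String name, long compactionId) {
    cache.put(name, new CachedPartition(compactionId));
  }

  /**
   * Called before serving a partition. If the latest compaction id is newer than
   * the cached one, cache it and reload file metadata. Returns true on a reload.
   */
  boolean refreshIfCompacted(String name, long latestCompactionId) {
    CachedPartition part = cache.get(name);
    if (part == null) throw new IllegalArgumentException("unknown partition: " + name);
    if (latestCompactionId <= part.compactionId) return false;
    part.compactionId = latestCompactionId;
    part.fileMetadataLoads++;  // stands in for the real file metadata refresh
    return true;
  }

  int loads(String name) { return cache.get(name).fileMetadataLoads; }
}
```

Because the cached id is updated together with the refresh, repeated requests that see the same compaction id do not trigger another reload.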
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. IMPALA-10801: Check the latest compaction Id before serving ACID table Since compactions don't advance the write id, we cannot tell whether a table/partition has been compacted by comparing writeIdList. A possible issue is that CatalogD provides obsolete file metadata and causes a runtime error. To fix this issue, we introduced an HMS API that can get the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we compare the cached id with the latest compaction id before serving. If a newer compaction has happened, we cache the latest compaction id and refresh the file metadata. Besides, this patch also changes how the existing table is replaced after a full table reload. The current way is to replace the table if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen together. We can use writeIdList to determine whether we should replace the table for transactional tables. As long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior.
Testing: - Add several tests in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java A fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java 6 files changed, 362 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/15 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 15 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. Patch Set 14: (3 comments) > Patch Set 14: > > (2 comments) http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2194 PS14, Line 2194: AcidUtils.compare((HdfsTable) existingTbl, :updatedTbl.getValidWriteIds(), tableId) : >= 0) > Is it guaranteed that the existingTbl will have compacted files if it has a No, it is not guaranteed. In this case, the table comes from a full table load, and I assume that by this time the client that sent the query has already acquired a read/write lock on the table. Therefore, the file metadata loaded won't be cleaned for the lifetime of the query. We don't need to worry about other queries because the file metadata will be updated the next time there is a compaction. If I understand the original design correctly, the conditions here make sure we only update once for many requests to a single table. After my change, refreshing file metadata also changes catalogVersion. Let's say a table load and a file metadata refresh happen together. If the file metadata refresh finishes earlier and catalogVersion is updated, we should not discard the result of the full table load here, since we need to return the table with the most recent validWriteIdList.
http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java File fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java: http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java@38 PS14, Line 38: CatalogServiceCatalog catalog, GetLatestCommittedCompactionInfoRequest request) > nit: Only pass MetastoreClient instead of whole catalog Object ? Passing catalog here so we can get a client from pool only when needed. http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java: http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@564 PS14, Line 564: Assert.assertEquals > A comment above this line saying that compacted tables/partitions should on Not sure about ORC tables. Will check. -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 14 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Thu, 05 Aug 2021 21:39:48 + Gerrit-HasComments: Yes
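The replacement rule discussed in the first comment above can be sketched as follows. This is a simplification under stated assumptions, not the patch's logic: TableReplaceSketch and TableSnapshot are hypothetical, a single write id high watermark stands in for the full ValidWriteIdList comparison done by AcidUtils.compare, and replacing only on a strictly higher value is an assumption, since the discussion does not spell out how ties are handled.

```java
public class TableReplaceSketch {
  /** Hypothetical snapshot of the fields relevant to the replace decision. */
  static final class TableSnapshot {
    final boolean transactional;
    final long catalogVersion;
    final long writeIdHighWatermark;  // simplified proxy for ValidWriteIdList
    TableSnapshot(boolean transactional, long catalogVersion, long writeIdHighWatermark) {
      this.transactional = transactional;
      this.catalogVersion = catalogVersion;
      this.writeIdHighWatermark = writeIdHighWatermark;
    }
  }

  /**
   * Decides whether a freshly loaded table should replace the cached one.
   * versionAtLoadStart is the catalog version the cached table had when the
   * full load was scheduled.
   */
  static boolean shouldReplace(
      TableSnapshot existing, TableSnapshot updated, long versionAtLoadStart) {
    if (existing.transactional) {
      // A concurrent file metadata refresh may have bumped catalogVersion, so
      // the version check alone would wrongly discard the newer load result.
      return updated.writeIdHighWatermark > existing.writeIdHighWatermark;
    }
    // Non-transactional tables keep the original behavior.
    return existing.catalogVersion == versionAtLoadStart;
  }
}
```

In the scenario from the comment, the cached table's catalogVersion has moved from 10 to 11 because of a file metadata refresh, yet the full load that started at version 10 still wins if it carries newer write ids.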
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. IMPALA-10801: Check the latest compaction Id before serving ACID table Since compactions don't advance the write id, we cannot tell whether a table/partition has been compacted by comparing writeIdList. A possible issue is that CatalogD provides obsolete file metadata and causes a runtime error. To fix this issue, we introduced an HMS API that can get the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we compare the cached id with the latest compaction id before serving. If a newer compaction has happened, we cache the latest compaction id and refresh the file metadata. Besides, this patch also changes how the existing table is replaced after a full table reload. The current way is to replace the table if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen together. We can use writeIdList to determine whether we should replace the table for transactional tables. As long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior.
Testing: - Add several tests in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java A fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java 6 files changed, 324 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/14 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 14 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. IMPALA-10801: Check the latest compaction Id before serving ACID table Since compactions don't advance the write id, we cannot tell whether a table/partition has been compacted by comparing writeIdList. A possible issue is that CatalogD provides obsolete file metadata and causes a runtime error. To fix this issue, we introduced an HMS API that can get the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we compare the cached id with the latest compaction id before serving. If a newer compaction has happened, we cache the latest compaction id and refresh the file metadata. Besides, this patch also changes how the existing table is replaced after a full table reload. The current way is to replace the table if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen together. We can use writeIdList to determine whether we should replace the table for transactional tables. As long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior.
Testing: - Add several tests in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java A fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java 6 files changed, 322 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/13 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 13 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] Bump up the GBN to 15549253
Yu-Wen Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17746 ) Change subject: Bump up the GBN to 15549253 .. Bump up the GBN to 15549253 This patch bumps up the GBN to 15549253. This patch includes the fix by Fang-Yu for using the correct policy id to update the policy of "all - database" due to the change on the Ranger side. Testing: * ran the create-load-data.sh Change-Id: Ie7776e62dad0b9bec6c03fb9ee8f1b8728ff0e69 --- M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M testdata/bin/create-load-data.sh R testdata/cluster/ranger/setup/policy_5_revised.json 4 files changed, 26 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/17746/1 -- To view, visit http://gerrit.cloudera.org:8080/17746 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ie7776e62dad0b9bec6c03fb9ee8f1b8728ff0e69 Gerrit-Change-Number: 17746 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2131 PS10, Line 2131: partsToBeRefreshed = > @Yu-Wen: If there are multiple getOrLoad requests that end up at line 2131 @Sourabh Thanks for the suggestion. I will see whether I can do something like loadAsync. For refreshFileMetadata(), Vihang pointed out a potential race condition: we cannot be sure whether the full table reload happened after a compaction or not, so we could still end up serving stale file metadata. To avoid more issues around race conditions, we'd need to refresh file metadata even if there is a concurrent full table reload. -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 11 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Fri, 30 Jul 2021 16:58:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. IMPALA-10801: Check the latest compaction Id before serving ACID table Since compactions don't advance the write id, we cannot tell whether a table/partition has been compacted by comparing writeIdList. A possible issue is that CatalogD provides obsolete file metadata and causes a runtime error. To fix this issue, we introduced an HMS API that can get the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we compare the cached id with the latest compaction id before serving. If a newer compaction has happened, we cache the latest compaction id and refresh the file metadata. Besides, this patch also changes how the existing table is replaced after a full table reload. The current way is to replace the table if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen together. We can use writeIdList to determine whether we should replace the table for transactional tables. As long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior.
Testing: - Add a test in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M testdata/bin/create-load-data.sh R testdata/cluster/ranger/setup/policy_5_revised.json 12 files changed, 326 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/11 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 11 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. Patch Set 10: (6 comments) http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2127 PS10, Line 2127: Get non-ACID table with writeIdList: > This text here comes as a message for the IllegalStateException thrown and Ack http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3479 PS10, Line 3479: If there is an ongoing loading task, we don't reload file metadata but wait for the :* loading task completed and return the table just loaded. > this comment is stale now that removed the loadReq logic. Ack http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@41 PS10, Line 41: Log > is this import needed? Ack http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@345 PS10, Line 345: hdfsTable_.writeLock().lock(); > why is this needed? The table locks are generally taken at CatalogOpExecuto This load fails after adding a precondition check for locking in HdfsTable.loadFileMetadataForPartitions. I suppose the lock is not taken at CatalogOpExecutor because hdfsTable_ is the internal object of icebergTable. 
http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/TableLoader.java File fe/src/main/java/org/apache/impala/catalog/TableLoader.java: http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@115 PS10, Line 115: able.writeLock().lock(); > Why is this needed? Same as icebergTable. After adding a precondition check for locking in HdfsTable.loadFileMetadataForPartitions, this function would fail without taking the lock. http://gerrit.cloudera.org:8080/#/c/17697/10/testdata/cluster/ranger/setup/policy_5_revised.json File testdata/cluster/ranger/setup/policy_5_revised.json: http://gerrit.cloudera.org:8080/#/c/17697/10/testdata/cluster/ranger/setup/policy_5_revised.json@8 PS10, Line 8: 5 > how is this change related? If it is not can we remove it from this patch a This patch bumps up the cdp version, and the new version of ranger would cause create-load-data.sh to fail. If I don't put this here, I cannot get a green test run. Or should I create a separate commit to bump up the cdp version? -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 10 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 28 Jul 2021 21:40:09 + Gerrit-HasComments: Yes
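The two locking replies above both come down to a precondition that the caller already holds the table's write lock before file metadata is loaded. A minimal sketch of that pattern, using only java.util.concurrent, is shown here; LockPreconditionSketch and its methods are hypothetical and only mirror the idea, not the actual check in HdfsTable.loadFileMetadataForPartitions.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockPreconditionSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
  private int loads = 0;

  ReentrantReadWriteLock.WriteLock writeLock() { return lock.writeLock(); }

  /** Fails fast unless the calling thread already holds the write lock. */
  int loadFileMetadata() {
    if (!lock.isWriteLockedByCurrentThread()) {
      throw new IllegalStateException("caller must hold the table write lock");
    }
    return ++loads;  // stands in for the real metadata load
  }

  /** What a caller has to do once the precondition exists. */
  int loadWithLock() {
    writeLock().lock();
    try {
      return loadFileMetadata();
    } finally {
      writeLock().unlock();
    }
  }
}
```

This is why a caller that never took the lock, such as one operating on an internal table object, starts failing as soon as the check is added and needs an explicit lock/unlock around the call.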
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. IMPALA-10801: Check the latest compaction Id before serving ACID table Since compactions don't advance the write id, we cannot tell whether a table/partition has been compacted by comparing writeIdList. A possible issue is that CatalogD provides obsolete file metadata and causes a runtime error. To fix this issue, we introduced an HMS API that can get the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we cache the compaction id while loading partitions and compare the cached id with the latest compaction id before serving. If a newer compaction has happened, the file metadata is refreshed. Besides, this patch also changes how the existing table is replaced after a full table reload. The current way is to replace the table if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen together. We can use writeIdList to determine whether we should replace the table for transactional tables. As long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior.
Testing: - Add a test in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M testdata/bin/create-load-data.sh R testdata/cluster/ranger/setup/policy_5_revised.json 12 files changed, 387 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/10 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 10 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. IMPALA-10801: Check the latest compaction Id before serving ACID table Since compactions don't advance the write id, we can't tell whether a table/partition has been compacted by comparing writeIdLists. A possible issue is that CatalogD provides obsolete file metadata and causes a runtime error. To fix this, we introduced an HMS API that returns the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we cache the compaction id while loading partitions and compare the cached id with the latest compaction id before serving. If a newer compaction has happened, CatalogD refreshes the file metadata. In addition, this patch changes how the existing table is replaced after a full table reload. Currently, the table is replaced only if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen at the same time. For these tables, we can instead use the writeIdList to decide whether to replace the table: as long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior.
Testing: - Add a test in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M testdata/bin/create-load-data.sh R testdata/cluster/ranger/setup/policy_5_revised.json 11 files changed, 383 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/9 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 9 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
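The table-replacement rule from the commit message above can be written down as a small decision function. The sketch below makes a strong simplifying assumption: each writeIdList is reduced to a single high-watermark number, whereas a real writeIdList also tracks open and aborted write ids. The class TableReplacePolicy and all parameter names are hypothetical, and treating equal watermarks as "not more recent" is a choice made for this sketch; the commit message does not say how ties are handled.

```java
/** Illustrative sketch only; names are hypothetical and not taken from Impala. */
public class TableReplacePolicy {
  /**
   * Decides whether a freshly reloaded table may replace the cached one.
   * Transactional tables: replace when the reloaded table has a more recent
   * write id state (simplified here to a larger high watermark).
   * Non-transactional tables: replace only if the catalog version did not
   * change while the reload was running.
   */
  public static boolean shouldReplace(boolean isTransactional,
      long currentCatalogVersion, long versionWhenLoadStarted,
      long existingHighWatermark, long updatedHighWatermark) {
    if (isTransactional) {
      return updatedHighWatermark > existingHighWatermark;
    }
    return currentCatalogVersion == versionWhenLoadStarted;
  }
}
```

For the transactional case the catalog version is deliberately ignored, which reflects the point in the message that a concurrent file metadata refresh can bump the version without making the reloaded table stale.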
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. IMPALA-10801: Check the latest compaction Id before serving ACID table Since compactions don't advance the write id, we can't tell whether a table/partition has been compacted by comparing writeIdLists. A possible issue is that CatalogD provides obsolete file metadata and causes a runtime error. To fix this, we introduced an HMS API that returns the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we cache the compaction id while loading partitions and compare the cached id with the latest compaction id before serving. If a newer compaction has happened, CatalogD refreshes the file metadata. In addition, this patch changes how the existing table is replaced after a full table reload. Currently, the table is replaced only if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen at the same time. For these tables, we can instead use the writeIdList to decide whether to replace the table: as long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior.
Testing: - Add a test in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M testdata/bin/create-load-data.sh R testdata/cluster/ranger/setup/policy_5_revised.json 10 files changed, 376 insertions(+), 45 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/8 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 8 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. IMPALA-10801: Check the latest compaction Id before serving ACID table Since compactions don't advance the write id, we can't tell whether a table/partition has been compacted by comparing writeIdLists. A possible issue is that CatalogD provides obsolete file metadata and causes a runtime error. To fix this, we introduced an HMS API that returns the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we cache the compaction id while loading partitions and compare the cached id with the latest compaction id before serving. If a newer compaction has happened, CatalogD refreshes the file metadata. In addition, this patch changes how the existing table is replaced after a full table reload. Currently, the table is replaced only if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen at the same time. For these tables, we can instead use the writeIdList to decide whether to replace the table: as long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior.
Testing: - Add a test in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M testdata/bin/create-load-data.sh R testdata/cluster/ranger/setup/policy_5_revised.json 9 files changed, 370 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/7 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 7 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. Patch Set 6: (11 comments) http://gerrit.cloudera.org:8080/#/c/17697/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17697/4//COMMIT_MSG@7 PS4, Line 7: ACID ta > nit, May be change this to say "ACID table" to be more specific. Done http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2125 PS4, Line 2125: Preconditions.checkSta > Can you add a Preconditions check before this line to make sure that the ta Done http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2127 PS4, Line 2127: l.readLock().lock(); > nit, can we rename this variable to something like "partsToBeRefreshed" to Done http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2187 PS4, Line 2187: > change to "ACID tables" since external tables are also HdfsTables Done http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3486 PS4, Line 3486: : if (!tryWriteLock(hdfsTable)) { : throw new CatalogException(String.format( : "Error during refreshing file metadata for table %s due to lock contention", : hdfsTable.getFullName())); : } : long newVersion = incrementAndGetCatalogVersion(); : v > This logic seems to have a race condition. How do we know that the loadReq Thanks for pointing this out. It was only an optimization, so I've removed it.
http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@824 PS4, Line 824: if (isPartitioned()) { : for (CompactionInfoStruct ci : resp.getCompactions()) { : HdfsPartition.Builder partBuilder = nameToPartBuilder.get(ci.getPa > If you move this to line 805 you can avoid iterating the partBuilders twice Done http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@827 PS4, Line 827: Preconditions.checkNotNull(partBuilder); : partBuilder.setLastCompactionId(ci.getId()); : } : } else { : CompactionInfoStruct ci = Iterables.getOnlyElement(resp.getCompactions()); : > I think the code readability can be improved if you handle the non-partitio Done http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java: http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@520 PS4, Line 520: TGetPartialCatalogObjectResponse response = > line too long (107 > 90) Done http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@539 PS4, Line 539: response = sendRequest(request); > line too long (114 > 90) Done http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@569 PS4, Line 569: Assert.assertTrue(prePartitionInfo.getFile_descriptors().size() > 1); > line too long (110 > 90) Done http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@583 PS4, Line 583: .wantFiles() > line too long (92 > 90) Done -- To view, visit 
http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 6 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Mon, 26 Jul 2021 17:50:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table
Yu-Wen Lai has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table .. IMPALA-10801: Check the latest compaction Id before serving ACID table Since compactions don't advance the write id, we can't tell whether a table/partition has been compacted by comparing writeIdLists. A possible issue is that CatalogD provides obsolete file metadata and causes a runtime error. To fix this, we introduced an HMS API that returns the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we cache the compaction id while loading partitions and compare the cached id with the latest compaction id before serving. If a newer compaction has happened, CatalogD refreshes the file metadata. In addition, this patch changes how the existing table is replaced after a full table reload. Currently, the table is replaced only if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen at the same time. For these tables, we can instead use the writeIdList to decide whether to replace the table: as long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior.
Testing: - Add a test in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M testdata/bin/create-load-data.sh R testdata/cluster/ranger/setup/policy_5_revised.json 9 files changed, 367 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/6 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 6 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request
Yu-Wen Lai has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving request .. IMPALA-10801: Check the latest compaction Id before serving request Since compactions don't advance the write id, we can't tell whether a table/partition has been compacted by comparing writeIdLists. A possible issue is that CatalogD provides obsolete file metadata and causes a runtime error. To fix this, we introduced an HMS API that returns the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we cache the compaction id while loading partitions and compare the cached id with the latest compaction id before serving. If a newer compaction has happened, CatalogD refreshes the file metadata. In addition, this patch changes how the existing table is replaced after a full table reload. Currently, the table is replaced only if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen at the same time. For these tables, we can instead use the writeIdList to decide whether to replace the table: as long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior.
Testing: - Add a test in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M testdata/bin/create-load-data.sh R testdata/cluster/ranger/setup/policy_5_revised.json 9 files changed, 367 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/5 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 5 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request
Yu-Wen Lai has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving request .. IMPALA-10801: Check the latest compaction Id before serving request Since compactions don't advance the write id, we can't tell whether a table/partition has been compacted by comparing writeIdLists. A possible issue is that CatalogD provides obsolete file metadata and causes a runtime error. To fix this, we introduced an HMS API that returns the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we cache the compaction id while loading partitions and compare the cached id with the latest compaction id before serving. If a newer compaction has happened, CatalogD refreshes the file metadata. In addition, this patch changes how the existing table is replaced after a full table reload. Currently, the table is replaced only if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen at the same time. For these tables, we can instead use the writeIdList to decide whether to replace the table: as long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior.
Testing: - Add a test in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M testdata/bin/create-load-data.sh R testdata/cluster/ranger/setup/policy_5_revised.json 10 files changed, 378 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/4 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 4 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] [WIP]: Initial commit to acquire table/database lock in metastore server before any HMS operation
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17703 ) Change subject: [WIP]: Initial commit to acquire table/database lock in metastore server before any HMS operation .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17703/3/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17703/3/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@192 PS3, Line 192: LOG.debug("Successfully executed HMS API: " + apiName); > @Kishen: Sure, will add a UT. For now it would be a no-op since we haven't @Sourabh: I was thinking that some methods might not need to wait until events are synced up and could instead let eventProcessor do that in the background. Given that there should be only a small number of updates, I agree that this way is better and cleaner, since it keeps the logic of each function similar. -- To view, visit http://gerrit.cloudera.org:8080/17703 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I085eab20db61282daf4549ddbcc018aaf63cc361 Gerrit-Change-Number: 17703 Gerrit-PatchSet: 4 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Thu, 22 Jul 2021 20:38:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving request .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java File fe/src/main/java/org/apache/impala/util/AcidUtils.java: http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@811 PS2, Line 811: Map partNameToCompactionId = new HashMap<>(); > Thanks for the suggestion. A batch size of 1K makes sense to me. I will tes In my local environment, the execution time of this API is ~1 ms for 1K partitions, ~10 ms for 10K partitions, and ~30 ms for 50K partitions. Although it might take a bit longer in a production environment, we can expect it to stay in the range of tens of ms, which I think is a tolerable latency. -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 3 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 21 Jul 2021 22:18:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request
Yu-Wen Lai has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving request .. IMPALA-10801: Check the latest compaction Id before serving request Since compactions don't advance the write id, we can't tell whether a table/partition has been compacted by comparing writeIdLists. A possible issue is that CatalogD provides obsolete file metadata and causes a runtime error. To fix this, we introduced an HMS API that returns the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we cache the compaction id while loading partitions and compare the cached id with the latest compaction id before serving. If a newer compaction has happened, CatalogD refreshes the file metadata. In addition, this patch changes how the existing table is replaced after a full table reload. Currently, the table is replaced only if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen at the same time. For these tables, we can instead use the writeIdList to decide whether to replace the table: as long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior.
Testing: - Add a test in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M testdata/bin/create-load-data.sh R testdata/cluster/ranger/setup/policy_5_revised.json 10 files changed, 370 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/3 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 3 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] [WIP]: Initial commit to acquire table/database lock in metastore server before any HMS operation
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17703 ) Change subject: [WIP]: Initial commit to acquire table/database lock in metastore server before any HMS operation .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17703/3/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17703/3/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@192 PS3, Line 192: LOG.debug("Successfully executed HMS API: " + apiName); > Can you add one sample test case, where CatalogOpExecutor and MetastoreServ Do we need to sync the table/database to the latest event in this class? If we don't directly update the cache here, is it possible to delay the sync-up operation until the next read? -- To view, visit http://gerrit.cloudera.org:8080/17703 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I085eab20db61282daf4549ddbcc018aaf63cc361 Gerrit-Change-Number: 17703 Gerrit-PatchSet: 4 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 21 Jul 2021 18:59:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving request .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java File fe/src/main/java/org/apache/impala/util/AcidUtils.java: http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@811 PS2, Line 811: metaStoreClient.getHiveClient().getLatestCommittedCompactionInfo(request); > Should we parallelize this HMS api if table has large number of partitions Thanks for the suggestion. A batch size of 1K makes sense to me. I will test it out. http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@837 PS2, Line 837: LOG.debug("Cached compaction id for {}: {} but the latest compaction id: {}", > Log partition name as well? Will add. -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 2 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Tue, 20 Jul 2021 16:55:57 + Gerrit-HasComments: Yes
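The batching idea discussed in the comment above (splitting a large partition list into batches of about 1K before calling the HMS API) could look roughly like this. It is a minimal sketch under assumptions: the class PartitionBatcher and the method toBatches are hypothetical, and the actual HMS call per batch is left out, so only the splitting step is shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names are hypothetical and not taken from Impala. */
public class PartitionBatcher {
  /** Splits partNames into consecutive batches of at most batchSize elements. */
  public static List<List<String>> toBatches(List<String> partNames, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<String>> batches = new ArrayList<>();
    for (int start = 0; start < partNames.size(); start += batchSize) {
      int end = Math.min(start + batchSize, partNames.size());
      // Copy the sublist so each batch is independent of the input list.
      batches.add(new ArrayList<>(partNames.subList(start, end)));
    }
    return batches;
  }
}
```

Each batch would then be sent as one request, either sequentially or in parallel, and the per-partition results merged into a single map.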
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving request .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2135 PS1, Line 2135: CatalogMonitor.INSTANCE.getCatalogdHmsCacheMetrics() > line too long (129 > 90) Done http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2140 PS1, Line 2140: // Update the cache miss metric, as the valid write id list did not match and we > line too long (158 > 90) Done http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@797 PS1, Line 797: private void getAndSetLastCompactionId(IMetaStoreClient client, > line too long (113 > 90) Done http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@828 PS1, Line 828: String partName = lci.getPartitionname() == null ? 
DEFAULT_PARTITION_NAME : > line too long (105 > 90) Done http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2193 PS1, Line 2193: return client.getHiveClient() > line too long (100 > 90) Done http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java: http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@561 PS1, Line 561: testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_MISS_METRIC); > line too long (97 > 90) Done http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@563 PS1, Line 563: testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_HIT_METRIC); > line too long (96 > 90) Done http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@578 PS1, Line 578: testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_MISS_METRIC); > line too long (97 > 90) Done http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@580 PS1, Line 580: testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_HIT_METRIC); > line too long (96 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 2 Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Mon, 19 Jul 2021 18:15:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request
Yu-Wen Lai has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving request .. IMPALA-10801: Check the latest compaction Id before serving request Since compactions don't advance the write id, we cannot tell whether a table/partition has been compacted by comparing writeIdLists. As a result, CatalogD may serve obsolete file metadata and cause a runtime error. To fix this issue, we introduced an HMS API that returns the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we cache the compaction id while loading partitions and compare the cached id with the latest compaction id before serving a request. If a newer compaction has happened, CatalogD refreshes the file metadata. This patch also changes how the existing table is replaced after a full table reload. Currently, the table is replaced only if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen at the same time. For these tables, we can use the writeIdList to decide whether to replace the table: as long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior. 
Testing: - Add a test in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java 8 files changed, 368 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/2 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 2 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Impala Public Jenkins
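For readers following the IMPALA-10801 description above, the "cache the compaction id at load time, compare before serving" step could look roughly like the sketch below. This is not the code from the patch: the class name, the method names, and the lookup function standing in for the HMS call are all hypothetical, and the counter only stands in for a real file metadata reload.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical sketch of the check described in the commit message.
// None of these names come from the Impala code base.
final class CompactionCheckSketch {
  // Compaction id seen when each partition was last loaded or refreshed.
  private final Map<String, Long> cachedCompactionIds = new HashMap<>();
  // Stand-in for the HMS API that returns the latest compaction id.
  private final ToLongFunction<String> latestCompactionIdLookup;
  private int refreshCount = 0;

  CompactionCheckSketch(ToLongFunction<String> latestCompactionIdLookup) {
    this.latestCompactionIdLookup = latestCompactionIdLookup;
  }

  // Called while loading a partition: remember the compaction id seen now.
  void onPartitionLoaded(String partitionName) {
    cachedCompactionIds.put(
        partitionName, latestCompactionIdLookup.applyAsLong(partitionName));
  }

  // Called before serving: refresh only if a newer compaction has happened.
  boolean refreshIfCompacted(String partitionName) {
    long latest = latestCompactionIdLookup.applyAsLong(partitionName);
    long cached = cachedCompactionIds.getOrDefault(partitionName, -1L);
    if (latest <= cached) {
      return false; // cached file metadata is still current
    }
    refreshCount++; // stand-in for reloading the file metadata
    cachedCompactionIds.put(partitionName, latest);
    return true;
  }

  int getRefreshCount() {
    return refreshCount;
  }

  public static void main(String[] args) {
    Map<String, Long> fakeHms = new HashMap<>();
    fakeHms.put("p=1", 5L);
    CompactionCheckSketch check =
        new CompactionCheckSketch(p -> fakeHms.getOrDefault(p, -1L));
    check.onPartitionLoaded("p=1");
    System.out.println(check.refreshIfCompacted("p=1"));
    fakeHms.put("p=1", 6L); // a compaction finished in the meantime
    System.out.println(check.refreshIfCompacted("p=1"));
  }
}
```

The sketch assumes compaction ids only grow, so a plain numeric comparison is enough to detect a newer compaction.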
[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request
Yu-Wen Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17697 ) Change subject: IMPALA-10801: Check the latest compaction Id before serving request .. IMPALA-10801: Check the latest compaction Id before serving request Since compactions don't advance the write id, we cannot tell whether a table/partition has been compacted by comparing writeIdLists. As a result, CatalogD may serve obsolete file metadata and cause a runtime error. To fix this issue, we introduced an HMS API that returns the latest compaction record for a table/partition (HIVE-24828). In CatalogD, we cache the compaction id while loading partitions and compare the cached id with the latest compaction id before serving a request. If a newer compaction has happened, CatalogD refreshes the file metadata. This patch also changes how the existing table is replaced after a full table reload. Currently, the table is replaced only if the catalog version has not changed. For transactional tables, things are more complex because file metadata refreshing and full table reloading can happen at the same time. For these tables, we can use the writeIdList to decide whether to replace the table: as long as the updated table has a more recent writeIdList than the existing one, it is safe to replace the table. For non-transactional tables, we keep the original behavior. 
Testing: - Add a test in PartialCatalogInfoWriteIdTest Change-Id: I86a112a77980fef7f6238978bc9668a65262101e --- M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java 8 files changed, 362 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/1 -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10724: Add mutable validWriteIdList
Yu-Wen Lai has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17538 ) Change subject: IMPALA-10724: Add mutable validWriteIdList .. IMPALA-10724: Add mutable validWriteIdList In this patch, we add a new class for manually updating the writeIdList. To update the writeIdList, we introduce three methods: addOpenWriteId, addAbortedWriteIds, and addCommittedWriteIds. We will use this class in MetastoreEventProcessor for fine-grained table refreshing. With control over the writeIdList, we will be able to update a transactional table partially and keep it consistent. There are some restrictions for MutableValidWriteIdList. 1. We need to mark a writeId open before marking it committed/aborted. 2. We only allow two writeId state transitions, open -> committed or open -> aborted. Any other transition is NOT allowed. Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java A fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java A fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java A fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java 4 files changed, 573 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/17538/5 -- To view, visit http://gerrit.cloudera.org:8080/17538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6 Gerrit-Change-Number: 17538 Gerrit-PatchSet: 5 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yu-Wen Lai
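The two restrictions listed in the IMPALA-10724 commit message amount to a small state machine per writeId. A minimal sketch follows, assuming one state per writeId kept in a map. Only the method names addOpenWriteId, addAbortedWriteIds, and addCommittedWriteIds come from the commit message; their signatures, the class name, the State enum, and stateOf are hypothetical, and the actual MutableValidWriteIdList implementation is not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch: tracks one state per writeId in a map. The class in the
// patch may be structured quite differently.
final class WriteIdStateSketch {
  enum State { OPEN, COMMITTED, ABORTED }

  private final Map<Long, State> states = new HashMap<>();

  // Restriction 1: a writeId has to be opened before it can be
  // committed or aborted.
  void addOpenWriteId(long writeId) {
    if (states.containsKey(writeId)) {
      throw new IllegalStateException("writeId " + writeId + " is already tracked");
    }
    states.put(writeId, State.OPEN);
  }

  void addCommittedWriteIds(long... writeIds) {
    transitionAll(State.COMMITTED, writeIds);
  }

  void addAbortedWriteIds(long... writeIds) {
    transitionAll(State.ABORTED, writeIds);
  }

  // Returns null for a writeId that was never opened.
  State stateOf(long writeId) {
    return states.get(writeId);
  }

  // Restriction 2: only open -> committed and open -> aborted are allowed.
  // All ids are validated first so a rejected call leaves the map unchanged.
  private void transitionAll(State target, long... writeIds) {
    for (long id : writeIds) {
      if (states.get(id) != State.OPEN) {
        throw new IllegalStateException(
            "writeId " + id + " is not open; cannot mark it " + target);
      }
    }
    for (long id : writeIds) {
      states.put(id, target);
    }
  }

  public static void main(String[] args) {
    WriteIdStateSketch list = new WriteIdStateSketch();
    list.addOpenWriteId(1L);
    list.addOpenWriteId(2L);
    list.addCommittedWriteIds(1L);
    list.addAbortedWriteIds(2L);
    System.out.println(list.stateOf(1L) + " " + list.stateOf(2L));
    try {
      list.addAbortedWriteIds(1L); // committed -> aborted is not allowed
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Rejecting invalid transitions with an exception is a choice made for the sketch; the patch may report them differently.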
[Impala-ASF-CR] IMPALA-10740: MetastoreServiceHandler should extend DefaultThriftHiveMetastore
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17569 ) Change subject: IMPALA-10740: MetastoreServiceHandler should extend DefaultThriftHiveMetastore .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/17569/2/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17569/2/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@344 PS2, Line 344: , nit: put a space after comma -- To view, visit http://gerrit.cloudera.org:8080/17569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e3f74dd96a7fec2ed13b0e5929f2b0a6b66e39f Gerrit-Change-Number: 17569 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 07 Jul 2021 21:38:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10751: new API in CatalogD to be used by Event processor for caching txn to table write id mapping
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17599 ) Change subject: IMPALA-10751: new API in CatalogD to be used by Event processor for caching txn to table write id mapping .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/17599/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/17599/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@776 PS1, Line 776: ( nit: add white space http://gerrit.cloudera.org:8080/#/c/17599/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@779 PS1, Line 779: nit: wrong indentation http://gerrit.cloudera.org:8080/#/c/17599/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@787 PS1, Line 787: nit: white space should be removed http://gerrit.cloudera.org:8080/#/c/17599/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@789 PS1, Line 789: } nit: redundant right curly bracket? -- To view, visit http://gerrit.cloudera.org:8080/17599 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2058fdf591b2655a10a92192d5f629b72a85f08a Gerrit-Change-Number: 17599 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Wed, 16 Jun 2021 23:57:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10724: Add mutable validWriteIdList
Yu-Wen Lai has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17538 ) Change subject: IMPALA-10724: Add mutable validWriteIdList .. IMPALA-10724: Add mutable validWriteIdList In this patch, we add a new class for manually updating the writeIdList. To update the writeIdList, we introduce three methods: addOpenWriteId, addAbortedWriteIds, and addCommittedWriteIds. We will use this class in MetastoreEventProcessor for fine-grained table refreshing. With control over the writeIdList, we will be able to update a transactional table partially and keep it consistent. There are some restrictions for MutableValidWriteIdList. 1. We need to mark a writeId open before marking it committed/aborted. 2. We only allow two writeId state transitions, open -> committed or open -> aborted. Any other transition is NOT allowed. Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java A fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java A fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java A fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java 4 files changed, 576 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/17538/4 -- To view, visit http://gerrit.cloudera.org:8080/17538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6 Gerrit-Change-Number: 17538 Gerrit-PatchSet: 4 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10724: Add mutable validWriteIdList
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17538 ) Change subject: IMPALA-10724: Add mutable validWriteIdList .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17538/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17538/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2701 PS1, Line 2701: public MutableValidWriteIdList getMutableValidWriteIds() { > Why do you have to change the method signature ? Oh I see. We can explicitly cast it to MutableValidWriteIdList when we need to update it. Will remove this. -- To view, visit http://gerrit.cloudera.org:8080/17538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6 Gerrit-Change-Number: 17538 Gerrit-PatchSet: 3 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Fri, 11 Jun 2021 21:53:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10724: Add mutable validWriteIdList
Yu-Wen Lai has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17538 ) Change subject: IMPALA-10724: Add mutable validWriteIdList .. IMPALA-10724: Add mutable validWriteIdList In this patch, we add a new class for manually updating the writeIdList. To update the writeIdList, we introduce three methods: addOpenWriteId, addAbortedWriteIds, and addCommittedWriteIds. We will use this class in MetastoreEventProcessor for fine-grained table refreshing. With control over the writeIdList, we will be able to update a transactional table partially and keep it consistent. There are some restrictions for MutableValidWriteIdList. 1. We need to mark a writeId open before marking it committed/aborted. 2. We only allow two writeId state transitions, open -> committed or open -> aborted. Any other transition is NOT allowed. Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java A fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java A fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java A fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java 4 files changed, 580 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/17538/3 -- To view, visit http://gerrit.cloudera.org:8080/17538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6 Gerrit-Change-Number: 17538 Gerrit-PatchSet: 3 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yu-Wen Lai
[Impala-ASF-CR] IMPALA-10724: Add mutable validWriteIdList
Yu-Wen Lai has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17538 ) Change subject: IMPALA-10724: Add mutable validWriteIdList .. IMPALA-10724: Add mutable validWriteIdList In this patch, we add a new class for manually updating the writeIdList. To update the writeIdList, we introduce three methods: addOpenWriteId, addAbortedWriteIds, and addCommittedWriteIds. We will use this class in MetastoreEventProcessor for fine-grained table refreshing. With control over the writeIdList, we will be able to update a transactional table partially and keep it consistent. There are some restrictions for MutableValidWriteIdList. 1. We need to mark a writeId open before marking it committed/aborted. 2. We only allow two writeId state transitions, open -> committed or open -> aborted. Any other transition is NOT allowed. Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java A fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java A fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java A fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java 4 files changed, 577 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/17538/2 -- To view, visit http://gerrit.cloudera.org:8080/17538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6 Gerrit-Change-Number: 17538 Gerrit-PatchSet: 2 Gerrit-Owner: Yu-Wen Lai Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yu-Wen Lai