Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21019 )
Change subject: IMPALA-12487: Skip reloading file metadata for ALTER_TABLE events with trivial changes in StorageDescriptor ...................................................................... Patch Set 3: (1 comment) > Patch Set 2: > > > Patch Set 2: > > > > (2 comments) > > > > Skip reloading file metadata looks good to me. But I'm uncertain whether > > skip reloading table schema is safe. Maybe we should compare the whole > > msTable object instead of just checking several fields. > > Comparing the whole msTable object is not ideal because certain fields like > change columns, change owner, may require required table metadata reload but > certain other properties in serde like change location, and change row/file > format don't need table metadata reload. This is much more evident when we > use this patch in conjunction with IMPALA-10976. I found a bug that changing the fileformat to AVRO does require table schema reload. Details in the comments. I think the main purpose for this JIRA is to skip reloading file metadata which is expensive. Skip reloading the table schema is a minor optimization and brings risks. I'm not sure if there are other bugs like IMPALA-12889. So keeps reloading the table schema seems safer to me. When testing IMPALA-10976, did you find any test failure if we don't skip reloading table schema? http://gerrit.cloudera.org:8080/#/c/21019/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/21019/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1788 PS1, Line 1788: !Objects.equals(beforeSd.getSerdeInfo(), afterSd.getSerdeInfo())) { > This behavior is consistent with the CatalogOpExecutor (i.e., command execu Thanks for pointing to the existing codes. I found it's an existing issue that avro schema is not updated correctly when changing file format to AVRO. Filed IMPALA-12889. The following scenario works before this patch: 1. Create a non-avro table with 'avro.schema.url' set to a valid schema file. impala> create table my_part_tbl(i int) partitioned by (p int) stored as parquet; impala> alter table my_part_tbl set tblproperties('avro.schema.url'='hdfs:////test-warehouse/avro_schemas/functional/alltypes.json'); 2. In HIVE, change the file format to AVRO hive> alter table my_part_tbl set fileformat avro; 3. After Impala applies the ALTER_TABLE event, the schema should be the same: hive> describe my_part_tbl; +--------------------------+------------+----------+ | col_name | data_type | comment | +--------------------------+------------+----------+ | id | int | | | bool_col | boolean | | | tinyint_col | int | | | smallint_col | int | | | int_col | int | | | bigint_col | bigint | | | float_col | float | | | double_col | double | | | date_string_col | string | | | string_col | string | | | timestamp_col | string | | | p | int | | | | NULL | NULL | | # Partition Information | NULL | NULL | | # col_name | data_type | comment | | p | int | | +--------------------------+------------+----------+ If I apply the current patch, Impala will keep using the old schema even after applying the ALTER_TABLE event. impala> describe my_part_tbl +------+------+---------+ | name | type | comment | +------+------+---------+ | i | int | | | p | int | | +------+------+---------+ -- To view, visit http://gerrit.cloudera.org:8080/21019 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6fd9a9504bf93d2529dc7accbf436ad83e51d8ac Gerrit-Change-Number: 21019 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Comment-Date: Mon, 11 Mar 2024 12:27:28 +0000 Gerrit-HasComments: Yes