Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21019 )

Change subject: IMPALA-12487: Skip reloading file metadata for ALTER_TABLE 
events with trivial changes in StorageDescriptor
......................................................................


Patch Set 3:

(1 comment)

> Patch Set 2:
>
> > Patch Set 2:
> >
> > (2 comments)
> >
> > Skip reloading file metadata looks good to me. But I'm uncertain whether 
> > skip reloading table schema is safe. Maybe we should compare the whole 
> > msTable object instead of just checking several fields.
>
> Comparing the whole msTable object is not ideal because certain fields like 
> change columns, change owner, may require required table metadata reload but 
> certain other properties in serde like change location, and change row/file 
> format don't need table metadata reload. This is much more evident when we 
> use this patch in conjunction with IMPALA-10976.

I found a bug that changing the fileformat to AVRO does require table schema 
reload. Details in the comments.
I think the main purpose for this JIRA is to skip reloading file metadata which 
is expensive. Skip reloading the table schema is a minor optimization and 
brings risks. I'm not sure if there are other bugs like IMPALA-12889. So keeps 
reloading the table schema seems safer to me.
When testing IMPALA-10976, did you find any test failure if we don't skip 
reloading table schema?

http://gerrit.cloudera.org:8080/#/c/21019/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/21019/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1788
PS1, Line 1788:             !Objects.equals(beforeSd.getSerdeInfo(), 
afterSd.getSerdeInfo())) {
> This behavior is consistent with the CatalogOpExecutor (i.e., command execu
Thanks for pointing to the existing codes. I found it's an existing issue that 
avro schema is not updated correctly when changing file format to AVRO. Filed 
IMPALA-12889.

The following scenario works before this patch:
1. Create a non-avro table with 'avro.schema.url' set to a valid schema file.
impala> create table my_part_tbl(i int) partitioned by (p int) stored as 
parquet;
impala> alter table my_part_tbl set 
tblproperties('avro.schema.url'='hdfs:////test-warehouse/avro_schemas/functional/alltypes.json');

2. In HIVE, change the file format to AVRO
hive> alter table my_part_tbl set fileformat avro;

3. After Impala applies the ALTER_TABLE event, the schema should be the same:
hive> describe my_part_tbl;

 +--------------------------+------------+----------+
 |         col_name         | data_type  | comment  |
 +--------------------------+------------+----------+
 | id                       | int        |          |
 | bool_col                 | boolean    |          |
 | tinyint_col              | int        |          |
 | smallint_col             | int        |          |
 | int_col                  | int        |          |   
 | bigint_col               | bigint     |          |
 | float_col                | float      |          |
 | double_col               | double     |          |
 | date_string_col          | string     |          |
 | string_col               | string     |          |
 | timestamp_col            | string     |          |
 | p                        | int        |          |
 |                          | NULL       | NULL     |
 | # Partition Information  | NULL       | NULL     |
 | # col_name               | data_type  | comment  |
 | p                        | int        |          |
 +--------------------------+------------+----------+

If I apply the current patch, Impala will keep using the old schema even after 
applying the ALTER_TABLE event.
impala> describe my_part_tbl

 +------+------+---------+
 | name | type | comment |
 +------+------+---------+
 | i    | int  |         |
 | p    | int  |         |
 +------+------+---------+



--
To view, visit http://gerrit.cloudera.org:8080/21019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6fd9a9504bf93d2529dc7accbf436ad83e51d8ac
Gerrit-Change-Number: 21019
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Mar 2024 12:27:28 +0000
Gerrit-HasComments: Yes

Reply via email to