[ https://issues.apache.org/jira/browse/IMPALA-12889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825470#comment-17825470 ]
Quanlong Huang commented on IMPALA-12889:
-----------------------------------------

Here is where catalogd processes the request to change the file format:
https://github.com/apache/impala/blob/085b1806da6a1941200288a2f9a243e389e10820/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1193-L1204

reloadTableSchema is left unchanged there, so it stays false, and as a result the Avro schema is not reloaded.

> Changing file format to AVRO doesn't update schema using 'avro.schema.url'
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-12889
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12889
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Priority: Major
>              Labels: ramp-up
>         Attachments: alltypes.json
>
>
> When changing the file format of a table to AVRO, the schema is not updated
> if there is a tblproperty of 'avro.schema.url'. However, after a REFRESH, the
> schema is updated:
> {code:sql}
> create table my_part_tbl(i int) partitioned by (p int) stored as parquet;
> alter table my_part_tbl set tblproperties(
>   'avro.schema.url'='hdfs:////test-warehouse/avro_schemas/functional/alltypes.json');
> alter table my_part_tbl set fileformat avro;
> describe my_part_tbl;
> +------+------+---------+
> | name | type | comment |
> +------+------+---------+
> | i    | int  |         |
> | p    | int  |         |
> +------+------+---------+
> refresh my_part_tbl;
> describe my_part_tbl;
> +-----------------+---------+-------------------+
> | name            | type    | comment           |
> +-----------------+---------+-------------------+
> | id              | int     | from deserializer |
> | bool_col        | boolean | from deserializer |
> | tinyint_col     | int     | from deserializer |
> | smallint_col    | int     | from deserializer |
> | int_col         | int     | from deserializer |
> | bigint_col      | bigint  | from deserializer |
> | float_col       | float   | from deserializer |
> | double_col      | double  | from deserializer |
> | date_string_col | string  | from deserializer |
> | string_col      | string  | from deserializer |
> | timestamp_col   | string  | from deserializer |
> | p               | int     |                   |
> +-----------------+---------+-------------------+
> {code}
> Note that explicitly setting the tblproperty after changing the file format
> to AVRO does refresh the schema. I.e. changing the file format before setting
> 'avro.schema.url' works, but setting 'avro.schema.url' before changing the
> file format doesn't work.
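
For readers who do not want to open the linked source, the flag behaviour described in the comment can be modelled roughly as below. This is only a simplified sketch, not the actual CatalogOpExecutor code: the class name AlterTableSketch, the AlterType enum, and the helper method are hypothetical, and the rule that a SET TBLPROPERTIES touching 'avro.schema.url' requests a schema reload is an assumption inferred from the behaviour reported in the issue. Only the reloadTableSchema flag name and the property key come from the report itself.

```java
import java.util.Map;

// Simplified, hypothetical model of the flag handling described in the comment.
// This is NOT Impala's CatalogOpExecutor; every name except reloadTableSchema
// and 'avro.schema.url' is invented for illustration.
public class AlterTableSketch {
    enum AlterType { SET_TBL_PROPERTIES, SET_FILE_FORMAT }

    static final String AVRO_SCHEMA_URL = "avro.schema.url";

    // Returns whether the table schema would be reloaded after the ALTER.
    static boolean reloadTableSchema(AlterType type, Map<String, String> newProps) {
        boolean reloadTableSchema = false;
        switch (type) {
            case SET_TBL_PROPERTIES:
                // Assumption: changing the Avro schema property requests a reload.
                reloadTableSchema = newProps.containsKey(AVRO_SCHEMA_URL);
                break;
            case SET_FILE_FORMAT:
                // The flag is left unchanged here, so it stays false and the
                // Avro schema is not re-read until an explicit REFRESH.
                break;
        }
        return reloadTableSchema;
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of(
            AVRO_SCHEMA_URL,
            "hdfs:////test-warehouse/avro_schemas/functional/alltypes.json");
        System.out.println("SET TBLPROPERTIES reloads schema: "
            + reloadTableSchema(AlterType.SET_TBL_PROPERTIES, props));
        System.out.println("SET FILEFORMAT reloads schema: "
            + reloadTableSchema(AlterType.SET_FILE_FORMAT, Map.of()));
    }
}
```

Under these assumptions the sketch matches the ordering effect in the report: setting the property while the table is already AVRO requests a reload, whereas changing the file format afterwards never touches the flag.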