[ https://issues.apache.org/jira/browse/IMPALA-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565820#comment-16565820 ]
Todd Lipcon commented on IMPALA-7309:
-------------------------------------

Another odd thing to note: the current behavior seems to differ depending on whether the main table file format is text or Parquet. This seems to be because of the following code:

{code}
String serdeLib = msTbl.getSd().getSerdeInfo().getSerializationLib();
if (serdeLib == null
    || serdeLib.equals("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe")) {
  // If the SerDe library is null or set to LazySimpleSerDe or is null, it
  // indicates there is an issue with the table metadata since Avro table need a
  // non-native serde. Instead of failing to load the table, fall back to
  // using the fields from the storage descriptor (same as Hive).
  return;
}
{code}

In the case of text, we hit this code path and ignore the Avro schema. In the case of Parquet, the serde is set to some Parquet-related SerDe, so we fall through to the "reconcile avro schema" code path.

> Prevent the addition of Avro schemas to non-Avro tables with incompatible schema
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-7309
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7309
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog, Frontend
>            Reporter: Todd Lipcon
>            Priority: Major
>
> Per a recent [mailing list thread|https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@<dev.impala.apache.org>]
> the behavior of Avro partitions within non-Avro tables is inconsistent with
> Hive, and somewhat surprising. For example, the addition of a partition can
> cause the results of "describe" on the table to change, but only after a
> refresh or invalidate. In the mailing list thread, we decided to change the
> behavior to:
> 1. Schema handling:
>   - If a table's properties indicate it's an Avro table, parse and adopt the
>     external Avro schema as the table schema, or infer an Avro-compatible
>     schema from the existing columns.
>   - If a table's properties indicate it's _not_ an Avro table, but there is
>     an external Avro schema defined in the table properties, then parse the
>     Avro schema and include it in the TableDescriptor (for use by Avro
>     partitions) but *do not* adopt it as the table schema.
> 2. Handling incompatible schemas:
>   - If the table-level format is non-Avro,
>   - AND the table contains column types incompatible with Avro (e.g. tinyint),
>   - AND the table has an existing Avro partition,
>   - THEN the query will yield an error about incompatible types.
> 3. Try to prevent users from shooting themselves in the foot:
>   - If the table-level format is non-Avro,
>   - AND the table contains column types incompatible with Avro (e.g. tinyint),
>   - THEN disallow changing the file format of an existing partition to Avro.
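
For reference, the three rules in the issue description could be sketched roughly as below. This is only an illustration of the proposed behavior, not Impala's actual catalog code: the class {{AvroSchemaPolicy}}, its method names, and the {{SchemaAction}} enum are all hypothetical, and the set of Avro-incompatible types contains only tinyint because that is the one example the issue names. It also assumes "adopt or infer" in rule 1 means "adopt when an external schema is present, otherwise infer".

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the rules proposed in IMPALA-7309; not Impala code. */
final class AvroSchemaPolicy {
  /** What to do with the table schema at load time (rule 1). */
  enum SchemaAction {
    ADOPT_AVRO_SCHEMA,          // Avro table with an external Avro schema
    INFER_FROM_COLUMNS,         // Avro table without an external Avro schema
    ATTACH_FOR_PARTITIONS_ONLY, // non-Avro table: keep schema for Avro partitions only
    USE_TABLE_COLUMNS           // non-Avro table with no Avro schema at all
  }

  // Assumption: illustrative only. The issue gives tinyint as its single example.
  private static final Set<String> AVRO_INCOMPATIBLE = Set.of("tinyint");

  private AvroSchemaPolicy() {}

  /** Rule 1: decide how an external Avro schema affects the table schema. */
  static SchemaAction schemaAction(boolean tableIsAvro, boolean hasExternalAvroSchema) {
    if (tableIsAvro) {
      return hasExternalAvroSchema
          ? SchemaAction.ADOPT_AVRO_SCHEMA
          : SchemaAction.INFER_FROM_COLUMNS;
    }
    return hasExternalAvroSchema
        ? SchemaAction.ATTACH_FOR_PARTITIONS_ONLY
        : SchemaAction.USE_TABLE_COLUMNS;
  }

  /** True if any column type is in the (illustrative) incompatible set. */
  static boolean hasIncompatibleColumn(List<String> columnTypes) {
    return columnTypes.stream()
        .anyMatch(t -> AVRO_INCOMPATIBLE.contains(t.toLowerCase(Locale.ROOT)));
  }

  /** Rule 2: a query errors out only when all three conditions hold. */
  static boolean queryFails(
      boolean tableIsAvro, List<String> columnTypes, boolean hasAvroPartition) {
    return !tableIsAvro && hasIncompatibleColumn(columnTypes) && hasAvroPartition;
  }

  /** Rule 3: refuse to switch a partition to Avro when rule 2 would then fire. */
  static boolean mayChangePartitionToAvro(boolean tableIsAvro, List<String> columnTypes) {
    return tableIsAvro || !hasIncompatibleColumn(columnTypes);
  }
}
```

Under these assumptions, a Parquet table with a tinyint column would keep its own schema, reject an attempt to change a partition's file format to Avro, and report an error at query time only if an Avro partition already exists.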