Hello Tianyi Wang, Vuk Ercegovac, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/10970 to look at the new patch set (#2). Change subject: IMPALA-7308. Support Avro tables in LocalCatalog ...................................................................... IMPALA-7308. Support Avro tables in LocalCatalog This adds support for loading Avro-formatted tables in LocalCatalog. In the case that the table properties indicate a table is Avro-formatted, the semantics are identical to the existing catalog implementation: - if an explicit avro schema is specified, it overrides the schema provided by the HMS - if no explicit avro schema is specified, one is inferred, and then the inferred schema takes the place of the one provided by the HMS (thus promoting columns like TINYINT to INT) - on COMPUTE STATS, if any discrepancy is discovered between the HMS schema and the inferred schema, an error is emitted. The semantics for LocalCatalog are slightly different in the case of tables which have not been configured as Avro format on the table level: The existing implementation has the behavior that, when a table is loaded, all partitions are inspected, and, if any partition is discovered with Avro format, the above rules are applied. This has some very unexpected results, described in an earlier email to d...@impala.apache.org [1]. To summarize that email thread, the existing behavior was decided to be unintuitive and inconsistent with Hive. Additionally, this behavior requires loading all partitions up-front, which gets in the goal of lazy/granular metadata loading in LocalCatalog. Thus, the LocalCatalog implementation differs as follows: - the "schema override" behavior ONLY occurs if the Avro file format has been selected at a table level. - if an Avro partition is added to a non-Avro table, and that partition has a schema that isn't compatible with the table's schema, an error will occur on read. The thread additionally discusses adding an error message on "alter" to prevent users from adding an Avro partition to a table with an incompatible schema. To keep the scope of this patch minimal, that is not yet implemented here. I filed IMPALA-7309 to change the behavior of the existing catalog implementation to match. [1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java M tests/query_test/test_avro_schema_resolution.py 7 files changed, 166 insertions(+), 32 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/10970/2 -- To view, visit http://gerrit.cloudera.org:8080/10970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 Gerrit-Change-Number: 10970 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>