[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

IMPALA-7308. Support Avro tables in LocalCatalog

This adds support for loading Avro-formatted tables in LocalCatalog.

In the case that the table properties indicate a table is Avro-formatted, the semantics are identical to the existing catalog implementation:

- if an explicit avro schema is specified, it overrides the schema provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the inferred schema takes the place of the one provided by the HMS (thus promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS schema and the inferred schema, an error is emitted.

The semantics for LocalCatalog are slightly different in the case of tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is loaded, all partitions are inspected, and, if any partition is discovered with Avro format, the above rules are applied. This has some very unexpected results, described in an earlier email to d...@impala.apache.org [1].

To summarize that email thread, the existing behavior was decided to be unintuitive and inconsistent with Hive. Additionally, this behavior requires loading all partitions up-front, which gets in the way of the goal of lazy/granular metadata loading in LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has been selected at a table level.
- if an Avro partition is added to a non-Avro table, and that partition has a schema that isn't compatible with the table's schema, an error will occur on read.

The thread additionally discusses adding an error message on "alter" to prevent users from adding an Avro partition to a table with an incompatible schema. To keep the scope of this patch minimal, that is not yet implemented here. I filed IMPALA-7309 to change the behavior of the existing catalog implementation to match.

A new test verifies the behavior, set to 'xfail' when running on the existing catalog implementation.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Reviewed-on: http://gerrit.cloudera.org:8080/10970
Tested-by: Impala Public Jenkins
Reviewed-by: Vuk Ercegovac
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
A testdata/workloads/functional-query/queries/QueryTest/incompatible_avro_partition.test
M tests/common/custom_cluster_test_suite.py
M tests/conftest.py
M tests/metadata/test_partition_metadata.py
M tests/query_test/test_avro_schema_resolution.py
12 files changed, 270 insertions(+), 35 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Vuk Ercegovac: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/10970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Todd Lipcon
Gerrit-Reviewer: Vuk Ercegovac
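As a reading aid, the table-level rule in the commit message above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the patch: the class `AvroSchemaRuleSketch`, the enum `SchemaSource`, and the method `chooseSchemaSource` are hypothetical names that do not come from Impala, and the schema is represented as a plain string.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of the LocalCatalog rule described in the commit message.
 * None of these names exist in Impala; they are illustrative only.
 */
final class AvroSchemaRuleSketch {
  /** Where the effective column schema of the table comes from. */
  enum SchemaSource { HMS, EXPLICIT_AVRO, INFERRED_AVRO }

  /**
   * @param tableLevelFormatIsAvro whether Avro was selected at the table level
   * @param explicitAvroSchema an explicitly specified Avro schema, if any
   */
  static SchemaSource chooseSchemaSource(
      boolean tableLevelFormatIsAvro, Optional<String> explicitAvroSchema) {
    // Partition formats are deliberately not consulted: an Avro partition in a
    // non-Avro table does not trigger the "schema override" behavior.
    if (!tableLevelFormatIsAvro) {
      return SchemaSource.HMS;
    }
    // For Avro tables an explicit schema wins; otherwise one is inferred and
    // takes the place of the HMS schema.
    return explicitAvroSchema.isPresent()
        ? SchemaSource.EXPLICIT_AVRO
        : SchemaSource.INFERRED_AVRO;
  }

  public static void main(String[] args) {
    System.out.println(chooseSchemaSource(false, Optional.empty()));
    System.out.println(chooseSchemaSource(true, Optional.of("{\"type\":\"record\"}")));
    System.out.println(chooseSchemaSource(true, Optional.empty()));
  }
}
```

The point of the sketch is that the decision depends only on table-level properties, which is what allows partitions to be loaded lazily.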
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 ) Change subject: IMPALA-7308. Support Avro tables in LocalCatalog .. Patch Set 8: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/10970/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java: http://gerrit.cloudera.org:8080/#/c/10970/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@329 PS7, Line 329: private static boolean isAvroFormat(Table msTbl) { > right. This code is just used to determine whether to bother sending any av makes sense, thanks. -- To view, visit http://gerrit.cloudera.org:8080/10970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 Gerrit-Change-Number: 10970 Gerrit-PatchSet: 8 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 07 Aug 2018 16:48:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 ) Change subject: IMPALA-7308. Support Avro tables in LocalCatalog .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 Gerrit-Change-Number: 10970 Gerrit-PatchSet: 8 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 07 Aug 2018 11:33:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 ) Change subject: IMPALA-7308. Support Avro tables in LocalCatalog .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/217/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/10970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 Gerrit-Change-Number: 10970 Gerrit-PatchSet: 8 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 07 Aug 2018 08:43:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10970/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java:

http://gerrit.cloudera.org:8080/#/c/10970/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@329
PS7, Line 329: for (FeFsPartition p : partitions) {
> clarifying question: since the list of partitions that are loaded is partia
right. This code is just used to determine whether to bother sending any avro schema in the descriptor. (If there is no Avro partition included in the descriptor, there is no reason to bother sending it). So, in that case, even if it's an incompatible schema, if you aren't querying any avro partitions, it won't matter. It's only if it's incompatible _and_ you query such a partition that you'd get an error.

--
To view, visit http://gerrit.cloudera.org:8080/10970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Todd Lipcon
Gerrit-Reviewer: Vuk Ercegovac
Gerrit-Comment-Date: Tue, 07 Aug 2018 07:07:40 +
Gerrit-HasComments: Yes
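The check described in the reply above could look something like the following. It is a minimal sketch, not the code at LocalFsTable.java line 329: the class `AvroDescriptorSketch`, the enum `FileFormat`, and the method `shouldSendAvroSchema` are hypothetical, and each partition is reduced to just its file format.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: decide whether an Avro schema is worth including in a
 * table descriptor, based only on the partitions that descriptor refers to.
 * None of these names are taken from Impala.
 */
final class AvroDescriptorSketch {
  enum FileFormat { TEXT, PARQUET, AVRO }

  /** Returns true if at least one of the referenced partitions is Avro. */
  static boolean shouldSendAvroSchema(List<FileFormat> referencedPartitionFormats) {
    for (FileFormat format : referencedPartitionFormats) {
      if (format == FileFormat.AVRO) {
        return true;
      }
    }
    // No Avro partition is referenced, so the schema would go unused.
    return false;
  }

  public static void main(String[] args) {
    System.out.println(shouldSendAvroSchema(Arrays.asList(FileFormat.TEXT, FileFormat.PARQUET)));
    System.out.println(shouldSendAvroSchema(Arrays.asList(FileFormat.TEXT, FileFormat.AVRO)));
  }
}
```

Because the result depends only on the partitions referenced, a query that never touches an Avro partition is unaffected by an incompatible Avro schema elsewhere in the table.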
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

Patch Set 7: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10970/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java:

http://gerrit.cloudera.org:8080/#/c/10970/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@329
PS7, Line 329: for (FeFsPartition p : partitions) {
clarifying question: since the list of partitions that are loaded is partial, the result from this method will depend on the subset that's loaded, correct? so if the user is in the case where the table is not an avro schema, but some partition has an avro schema, they may or may not see an error due to incompatible schemas (on read). If that's the case, the impact of the difference in behavior may even be less.

--
To view, visit http://gerrit.cloudera.org:8080/10970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Todd Lipcon
Gerrit-Reviewer: Vuk Ercegovac
Gerrit-Comment-Date: Mon, 06 Aug 2018 19:00:08 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 ) Change subject: IMPALA-7308. Support Avro tables in LocalCatalog .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 Gerrit-Change-Number: 10970 Gerrit-PatchSet: 7 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 03 Aug 2018 00:10:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 ) Change subject: IMPALA-7308. Support Avro tables in LocalCatalog .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/10970/6/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: http://gerrit.cloudera.org:8080/#/c/10970/6/tests/metadata/test_partition_metadata.py@143 PS6, Line 143: s > flake8: E125 continuation line with same indent as next logical line This check should be disabled from now onwards. http://gerrit.cloudera.org:8080/#/c/10970/6/tests/metadata/test_partition_metadata.py@145 PS6, Line 145: > flake8: E125 continuation line with same indent as next logical line This check should be disabled from now onwards. -- To view, visit http://gerrit.cloudera.org:8080/10970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 Gerrit-Change-Number: 10970 Gerrit-PatchSet: 6 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 01 Aug 2018 23:46:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 ) Change subject: IMPALA-7308. Support Avro tables in LocalCatalog .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/140/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/10970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 Gerrit-Change-Number: 10970 Gerrit-PatchSet: 6 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 01 Aug 2018 23:36:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 ) Change subject: IMPALA-7308. Support Avro tables in LocalCatalog .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/10970/6/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: http://gerrit.cloudera.org:8080/#/c/10970/6/tests/metadata/test_partition_metadata.py@143 PS6, Line 143: s flake8: E125 continuation line with same indent as next logical line http://gerrit.cloudera.org:8080/#/c/10970/6/tests/metadata/test_partition_metadata.py@145 PS6, Line 145: flake8: E125 continuation line with same indent as next logical line -- To view, visit http://gerrit.cloudera.org:8080/10970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 Gerrit-Change-Number: 10970 Gerrit-PatchSet: 6 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 01 Aug 2018 22:33:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10970 to look at the new patch set (#6).

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

IMPALA-7308. Support Avro tables in LocalCatalog

This adds support for loading Avro-formatted tables in LocalCatalog.

In the case that the table properties indicate a table is Avro-formatted, the semantics are identical to the existing catalog implementation:

- if an explicit avro schema is specified, it overrides the schema provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the inferred schema takes the place of the one provided by the HMS (thus promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS schema and the inferred schema, an error is emitted.

The semantics for LocalCatalog are slightly different in the case of tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is loaded, all partitions are inspected, and, if any partition is discovered with Avro format, the above rules are applied. This has some very unexpected results, described in an earlier email to d...@impala.apache.org [1].

To summarize that email thread, the existing behavior was decided to be unintuitive and inconsistent with Hive. Additionally, this behavior requires loading all partitions up-front, which gets in the way of the goal of lazy/granular metadata loading in LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has been selected at a table level.
- if an Avro partition is added to a non-Avro table, and that partition has a schema that isn't compatible with the table's schema, an error will occur on read.

The thread additionally discusses adding an error message on "alter" to prevent users from adding an Avro partition to a table with an incompatible schema. To keep the scope of this patch minimal, that is not yet implemented here. I filed IMPALA-7309 to change the behavior of the existing catalog implementation to match.

A new test verifies the behavior, set to 'xfail' when running on the existing catalog implementation.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
A testdata/workloads/functional-query/queries/QueryTest/incompatible_avro_partition.test
M tests/common/custom_cluster_test_suite.py
M tests/conftest.py
M tests/metadata/test_partition_metadata.py
M tests/query_test/test_avro_schema_resolution.py
12 files changed, 270 insertions(+), 35 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/10970/6

--
To view, visit http://gerrit.cloudera.org:8080/10970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Todd Lipcon
Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10970/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10970/5//COMMIT_MSG@39
PS5, Line 39: if an Avro partition is added to a non-Avro table, and that partition
: has a schema that isn't compatible with the table's schema, an error
: will occur on read.
Can we test this yet? Or is the plan to test existing and local catalog together after IMPALA-7309?

http://gerrit.cloudera.org:8080/#/c/10970/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10970/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1558
PS5, Line 1558: ;
Why don't we return here? Is there any need to reconcile an inferred schema?

http://gerrit.cloudera.org:8080/#/c/10970/5/fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java
File fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java:

http://gerrit.cloudera.org:8080/#/c/10970/5/fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java@47
PS5, Line 47: HdfsTable
AvroSchemaUtils

http://gerrit.cloudera.org:8080/#/c/10970/5/fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
File fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java:

http://gerrit.cloudera.org:8080/#/c/10970/5/fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java@239
PS5, Line 239: // TODO(todd): do we have any tables which are mixed format?
alltypesmixedformat: https://github.com/apache/impala/blob/b5608264b4552e44eb73ded1e232a8775c3dba6b/testdata/bin/load-dependent-tables.sql#L62

--
To view, visit http://gerrit.cloudera.org:8080/10970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Vuk Ercegovac
Gerrit-Comment-Date: Fri, 27 Jul 2018 23:30:32 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 ) Change subject: IMPALA-7308. Support Avro tables in LocalCatalog .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/62/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/10970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 Gerrit-Change-Number: 10970 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 25 Jul 2018 21:07:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 ) Change subject: IMPALA-7308. Support Avro tables in LocalCatalog .. Patch Set 5: Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/62/ Running initial code review checks. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317 -- To view, visit http://gerrit.cloudera.org:8080/10970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 Gerrit-Change-Number: 10970 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 25 Jul 2018 20:07:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10970 to look at the new patch set (#5).

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

IMPALA-7308. Support Avro tables in LocalCatalog

This adds support for loading Avro-formatted tables in LocalCatalog.

In the case that the table properties indicate a table is Avro-formatted, the semantics are identical to the existing catalog implementation:

- if an explicit avro schema is specified, it overrides the schema provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the inferred schema takes the place of the one provided by the HMS (thus promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS schema and the inferred schema, an error is emitted.

The semantics for LocalCatalog are slightly different in the case of tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is loaded, all partitions are inspected, and, if any partition is discovered with Avro format, the above rules are applied. This has some very unexpected results, described in an earlier email to d...@impala.apache.org [1].

To summarize that email thread, the existing behavior was decided to be unintuitive and inconsistent with Hive. Additionally, this behavior requires loading all partitions up-front, which gets in the way of the goal of lazy/granular metadata loading in LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has been selected at a table level.
- if an Avro partition is added to a non-Avro table, and that partition has a schema that isn't compatible with the table's schema, an error will occur on read.

The thread additionally discusses adding an error message on "alter" to prevent users from adding an Avro partition to a table with an incompatible schema. To keep the scope of this patch minimal, that is not yet implemented here. I filed IMPALA-7309 to change the behavior of the existing catalog implementation to match.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M tests/query_test/test_avro_schema_resolution.py
8 files changed, 173 insertions(+), 35 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/10970/5

--
To view, visit http://gerrit.cloudera.org:8080/10970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 ) Change subject: IMPALA-7308. Support Avro tables in LocalCatalog .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/32/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/10970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 Gerrit-Change-Number: 10970 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 24 Jul 2018 19:54:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 ) Change subject: IMPALA-7308. Support Avro tables in LocalCatalog .. Patch Set 4: Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/32/ Running initial code review checks. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317 -- To view, visit http://gerrit.cloudera.org:8080/10970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 Gerrit-Change-Number: 10970 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 24 Jul 2018 18:46:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10970 to look at the new patch set (#4).

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

IMPALA-7308. Support Avro tables in LocalCatalog

This adds support for loading Avro-formatted tables in LocalCatalog.

In the case that the table properties indicate a table is Avro-formatted, the semantics are identical to the existing catalog implementation:

- if an explicit avro schema is specified, it overrides the schema provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the inferred schema takes the place of the one provided by the HMS (thus promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS schema and the inferred schema, an error is emitted.

The semantics for LocalCatalog are slightly different in the case of tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is loaded, all partitions are inspected, and, if any partition is discovered with Avro format, the above rules are applied. This has some very unexpected results, described in an earlier email to d...@impala.apache.org [1].

To summarize that email thread, the existing behavior was decided to be unintuitive and inconsistent with Hive. Additionally, this behavior requires loading all partitions up-front, which gets in the way of the goal of lazy/granular metadata loading in LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has been selected at a table level.
- if an Avro partition is added to a non-Avro table, and that partition has a schema that isn't compatible with the table's schema, an error will occur on read.

The thread additionally discusses adding an error message on "alter" to prevent users from adding an Avro partition to a table with an incompatible schema. To keep the scope of this patch minimal, that is not yet implemented here. I filed IMPALA-7309 to change the behavior of the existing catalog implementation to match.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M tests/query_test/test_avro_schema_resolution.py
7 files changed, 167 insertions(+), 32 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/10970/4

--
To view, visit http://gerrit.cloudera.org:8080/10970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/19/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
To view, visit http://gerrit.cloudera.org:8080/10970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Vuk Ercegovac
Gerrit-Comment-Date: Tue, 24 Jul 2018 00:52:05 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

Patch Set 2:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/19/

Running initial code review checks. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317

--
To view, visit http://gerrit.cloudera.org:8080/10970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Vuk Ercegovac
Gerrit-Comment-Date: Mon, 23 Jul 2018 23:48:46 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog
Hello Tianyi Wang, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10970

to look at the new patch set (#2).

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

IMPALA-7308. Support Avro tables in LocalCatalog

This adds support for loading Avro-formatted tables in LocalCatalog. In the case that the table properties indicate a table is Avro-formatted, the semantics are identical to the existing catalog implementation:

- if an explicit Avro schema is specified, it overrides the schema provided by the HMS
- if no explicit Avro schema is specified, one is inferred, and then the inferred schema takes the place of the one provided by the HMS (thus promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS schema and the inferred schema, an error is emitted.

The semantics for LocalCatalog are slightly different in the case of tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is loaded, all partitions are inspected, and, if any partition is discovered with Avro format, the above rules are applied. This has some very unexpected results, described in an earlier email to d...@impala.apache.org [1].

To summarize that email thread, the existing behavior was judged to be unintuitive and inconsistent with Hive. Additionally, this behavior requires loading all partitions up-front, which gets in the way of the goal of lazy/granular metadata loading in LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has been selected at a table level.
- if an Avro partition is added to a non-Avro table, and that partition has a schema that isn't compatible with the table's schema, an error will occur on read.

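The three table-level rules (explicit schema wins, otherwise infer, complain on COMPUTE STATS if the inferred schema differs from the HMS one) can be sketched roughly as follows. This is only an illustration in Python, not the frontend code touched by this patch. Avro has no 8-bit or 16-bit integer type, which is why a TINYINT column round-trips to INT; the two mapping tables cover only a handful of primitive types, and every function name, as well as the position-by-position comparison, is an assumption made for the example.

```python
from itertools import zip_longest

# SQL type -> Avro primitive. Avro has no 8- or 16-bit integers,
# so the narrow integer types all widen to Avro "int".
SQL_TO_AVRO = {
    "TINYINT": "int", "SMALLINT": "int", "INT": "int", "BIGINT": "long",
    "FLOAT": "float", "DOUBLE": "double", "BOOLEAN": "boolean",
    "STRING": "string",
}
# Avro primitive -> SQL type, used to read the inferred schema back.
AVRO_TO_SQL = {
    "int": "INT", "long": "BIGINT", "float": "FLOAT", "double": "DOUBLE",
    "boolean": "BOOLEAN", "string": "STRING",
}


def infer_columns(hms_columns):
    """Round-trip the HMS columns through Avro types (hypothetical helper)."""
    return [(name, AVRO_TO_SQL[SQL_TO_AVRO[t]]) for name, t in hms_columns]


def resolve_columns(hms_columns, explicit_avro_columns=None):
    """An explicit Avro schema overrides the HMS schema; otherwise the
    inferred schema takes its place."""
    if explicit_avro_columns is not None:
        return list(explicit_avro_columns)
    return infer_columns(hms_columns)


def find_discrepancies(hms_columns, other_columns):
    """Compare two column lists position by position (an assumed
    comparison strategy); a missing column shows up as None."""
    return [(a, b) for a, b in zip_longest(hms_columns, other_columns) if a != b]


def check_compute_stats(hms_columns):
    """Raise if the HMS schema and the inferred schema disagree."""
    diffs = find_discrepancies(hms_columns, infer_columns(hms_columns))
    if diffs:
        raise ValueError(f"HMS schema differs from inferred Avro schema: {diffs}")
```

With HMS columns `[("id", "TINYINT"), ("name", "STRING")]`, `resolve_columns` yields `id` as INT, and `check_compute_stats` raises because of that one promoted column.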
The thread additionally discusses adding an error message on "alter" to prevent users from adding an Avro partition to a table with an incompatible schema. To keep the scope of this patch minimal, that is not yet implemented here.

I filed IMPALA-7309 to change the behavior of the existing catalog implementation to match.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M tests/query_test/test_avro_schema_resolution.py
7 files changed, 166 insertions(+), 32 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/10970/2

--
To view, visit http://gerrit.cloudera.org:8080/10970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Vuk Ercegovac