[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-08-07 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

IMPALA-7308. Support Avro tables in LocalCatalog

This adds support for loading Avro-formatted tables in LocalCatalog. In
the case that the table properties indicate a table is Avro-formatted,
the semantics are identical to the existing catalog implementation:

- if an explicit avro schema is specified, it overrides the schema
  provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the
  inferred schema takes the place of the one provided by the HMS (thus
  promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS
  schema and the inferred schema, an error is emitted.
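
The three rules above can be sketched as follows. This is a simplified
model for illustration only, not Impala's Java implementation: the
function names, the (name, type) column representation, and the reduced
type tables are all assumptions. It relies on the fact that Avro has no
8-bit or 16-bit integer type, which is why a TINYINT column comes back
as INT after a round trip through an inferred schema.

```python
# Simplified, hypothetical model of the rules above; not Impala's Java code.
# Avro has no 8- or 16-bit integer type, so TINYINT and SMALLINT map to "int".
SQL_TO_AVRO = {
    "TINYINT": "int", "SMALLINT": "int", "INT": "int", "BIGINT": "long",
    "FLOAT": "float", "DOUBLE": "double", "BOOLEAN": "boolean",
    "STRING": "string",
}
AVRO_TO_SQL = {
    "int": "INT", "long": "BIGINT", "float": "FLOAT", "double": "DOUBLE",
    "boolean": "BOOLEAN", "string": "STRING",
}


def infer_avro_schema(hms_columns):
    """Derive Avro fields from HMS (name, sql_type) pairs."""
    return [(name, SQL_TO_AVRO[sql_type]) for name, sql_type in hms_columns]


def resolve_columns(hms_columns, explicit_avro_schema=None):
    """Return the columns the table exposes once the Avro schema is applied."""
    if explicit_avro_schema is not None:
        avro_fields = explicit_avro_schema            # explicit schema wins
    else:
        avro_fields = infer_avro_schema(hms_columns)  # otherwise infer one
    return [(name, AVRO_TO_SQL[avro_type]) for name, avro_type in avro_fields]


def check_for_compute_stats(hms_columns):
    """Raise if the HMS columns differ from those of the inferred schema."""
    resolved = resolve_columns(hms_columns)
    if resolved != list(hms_columns):
        raise ValueError(
            "HMS schema %s does not match Avro schema %s"
            % (list(hms_columns), resolved))
```

In this model, resolve_columns([("id", "TINYINT")]) yields
[("id", "INT")], and check_for_compute_stats on the same input raises,
because the HMS column type no longer matches the resolved one.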

The semantics for LocalCatalog are slightly different in the case of
tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is
loaded, all partitions are inspected, and, if any partition is
discovered with Avro format, the above rules are applied. This has some
very unexpected results, described in an earlier email to
d...@impala.apache.org [1]. To summarize that email thread, the existing
behavior was decided to be unintuitive and inconsistent with Hive.
Additionally, this behavior requires loading all partitions up-front,
which gets in the way of the goal of lazy/granular metadata loading in
LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has
  been selected at a table level.

- if an Avro partition is added to a non-Avro table, and that partition
  has a schema that isn't compatible with the table's schema, an error
  will occur on read.
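
A rough sketch of the two LocalCatalog rules just listed, under stated
assumptions: the names below are hypothetical, and "compatible" is
reduced to exact equality of column lists, which is stricter than real
Avro schema resolution. It is meant only to show where the override
applies and where the error surfaces.

```python
# Hypothetical model of the LocalCatalog behavior described above.
# "Compatible" is simplified to exact column equality for illustration.
class IncompatibleSchemaError(Exception):
    pass


def table_columns(table_format, hms_columns, avro_columns):
    """Apply the Avro schema override only for table-level Avro tables."""
    if table_format == "AVRO":
        return list(avro_columns)
    return list(hms_columns)  # a non-Avro table keeps the HMS schema


def read_partition(columns, partition_format, partition_avro_columns):
    """Fail at read time if an Avro partition does not match the table."""
    if partition_format == "AVRO" and list(partition_avro_columns) != list(columns):
        raise IncompatibleSchemaError(
            "Avro partition schema is incompatible with the table schema")
    return "ok"
```

Note that nothing fails when the table is loaded or when the partition
is added; the error appears only when the mismatched Avro partition is
actually read.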

The thread additionally discusses adding an error message on "alter" to
prevent users from adding an Avro partition to a table with an
incompatible schema. To keep the scope of this patch minimal, that is
not yet implemented here. I filed IMPALA-7309 to change the behavior of
the existing catalog implementation to match.

A new test verifies the behavior, set to 'xfail' when running on the
existing catalog implementation.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Reviewed-on: http://gerrit.cloudera.org:8080/10970
Tested-by: Impala Public Jenkins 
Reviewed-by: Vuk Ercegovac 
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
A testdata/workloads/functional-query/queries/QueryTest/incompatible_avro_partition.test
M tests/common/custom_cluster_test_suite.py
M tests/conftest.py
M tests/metadata/test_partition_metadata.py
M tests/query_test/test_avro_schema_resolution.py
12 files changed, 270 insertions(+), 35 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Vuk Ercegovac: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/10970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-08-07 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 8: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10970/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java:

http://gerrit.cloudera.org:8080/#/c/10970/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@329
PS7, Line 329:   private static boolean isAvroFormat(Table msTbl) {
> right. This code is just used to determine whether to bother sending any av
makes sense, thanks.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 07 Aug 2018 16:48:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-08-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 8: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 07 Aug 2018 11:33:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-08-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/217/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 07 Aug 2018 08:43:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-08-07 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10970/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java:

http://gerrit.cloudera.org:8080/#/c/10970/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@329
PS7, Line 329: for (FeFsPartition p : partitions) {
> clarifying question: since the list of partitions that are loaded is partia
Right. This code only determines whether to bother sending an Avro
schema in the descriptor. (If no Avro partition is included in the
descriptor, there is no reason to send it.) So even if a partition's
schema is incompatible, it won't matter as long as you aren't querying
any Avro partitions. You'd only get an error if the schema is
incompatible _and_ you query such a partition.
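
A minimal sketch of the check described in this reply, assuming a
dictionary-based descriptor and partition records with a "format" key.
Both are hypothetical stand-ins, not the actual descriptor type or the
FeFsPartition API.

```python
# Hypothetical model: attach the Avro schema to the descriptor only when
# at least one partition included in the descriptor is Avro-formatted.
def build_descriptor(avro_schema, included_partitions):
    descriptor = {"partitions": [p["name"] for p in included_partitions]}
    if any(p["format"] == "AVRO" for p in included_partitions):
        descriptor["avro_schema"] = avro_schema
    return descriptor
```

Because only the partitions included in the descriptor are inspected,
the result depends on which partitions a query touches, not on every
partition of the table.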




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 07 Aug 2018 07:07:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-08-06 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 7: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10970/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java:

http://gerrit.cloudera.org:8080/#/c/10970/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@329
PS7, Line 329: for (FeFsPartition p : partitions) {
Clarifying question: since the list of loaded partitions is partial,
the result of this method depends on the subset that's loaded, correct?
So if the table is not Avro-formatted but some partition is, the user
may or may not see an error due to incompatible schemas (on read). If
that's the case, the impact of the difference in behavior may be even
smaller.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 06 Aug 2018 19:00:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-08-02 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 7: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 03 Aug 2018 00:10:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-08-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10970/6/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/10970/6/tests/metadata/test_partition_metadata.py@143
PS6, Line 143: s
> flake8: E125 continuation line with same indent as next logical line
This check should be disabled from now onwards.


http://gerrit.cloudera.org:8080/#/c/10970/6/tests/metadata/test_partition_metadata.py@145
PS6, Line 145:  
> flake8: E125 continuation line with same indent as next logical line
This check should be disabled from now onwards.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 01 Aug 2018 23:46:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-08-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/140/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 01 Aug 2018 23:36:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-08-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10970/6/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/10970/6/tests/metadata/test_partition_metadata.py@143
PS6, Line 143: s
flake8: E125 continuation line with same indent as next logical line


http://gerrit.cloudera.org:8080/#/c/10970/6/tests/metadata/test_partition_metadata.py@145
PS6, Line 145:
flake8: E125 continuation line with same indent as next logical line




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 01 Aug 2018 22:33:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-08-01 Thread Todd Lipcon (Code Review)
Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10970

to look at the new patch set (#6).

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

IMPALA-7308. Support Avro tables in LocalCatalog

This adds support for loading Avro-formatted tables in LocalCatalog. In
the case that the table properties indicate a table is Avro-formatted,
the semantics are identical to the existing catalog implementation:

- if an explicit avro schema is specified, it overrides the schema
  provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the
  inferred schema takes the place of the one provided by the HMS (thus
  promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS
  schema and the inferred schema, an error is emitted.

The semantics for LocalCatalog are slightly different in the case of
tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is
loaded, all partitions are inspected, and, if any partition is
discovered with Avro format, the above rules are applied. This has some
very unexpected results, described in an earlier email to
d...@impala.apache.org [1]. To summarize that email thread, the existing
behavior was decided to be unintuitive and inconsistent with Hive.
Additionally, this behavior requires loading all partitions up-front,
which gets in the way of the goal of lazy/granular metadata loading in
LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has
  been selected at a table level.

- if an Avro partition is added to a non-Avro table, and that partition
  has a schema that isn't compatible with the table's schema, an error
  will occur on read.

The thread additionally discusses adding an error message on "alter" to
prevent users from adding an Avro partition to a table with an
incompatible schema. To keep the scope of this patch minimal, that is
not yet implemented here. I filed IMPALA-7309 to change the behavior of
the existing catalog implementation to match.

A new test verifies the behavior, set to 'xfail' when running on the
existing catalog implementation.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
A testdata/workloads/functional-query/queries/QueryTest/incompatible_avro_partition.test
M tests/common/custom_cluster_test_suite.py
M tests/conftest.py
M tests/metadata/test_partition_metadata.py
M tests/query_test/test_avro_schema_resolution.py
12 files changed, 270 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/10970/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-07-27 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10970/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10970/5//COMMIT_MSG@39
PS5, Line 39: if an Avro partition is added to a non-Avro table, and that partition
:   has a schema that isn't compatible with the table's schema, an error
:   will occur on read.
Can we test this yet? Or is the plan to test existing and local catalog 
together after IMPALA-7309?


http://gerrit.cloudera.org:8080/#/c/10970/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10970/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1558
PS5, Line 1558: ;
Why don't we return here? Is there any need to reconcile an inferred schema?


http://gerrit.cloudera.org:8080/#/c/10970/5/fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java
File fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java:

http://gerrit.cloudera.org:8080/#/c/10970/5/fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java@47
PS5, Line 47: HdfsTable
AvroSchemaUtils


http://gerrit.cloudera.org:8080/#/c/10970/5/fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
File fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java:

http://gerrit.cloudera.org:8080/#/c/10970/5/fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java@239
PS5, Line 239: // TODO(todd): do we have any tables which are mixed format?
alltypesmixedformat: 
https://github.com/apache/impala/blob/b5608264b4552e44eb73ded1e232a8775c3dba6b/testdata/bin/load-dependent-tables.sql#L62




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 27 Jul 2018 23:30:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-07-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/62/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 25 Jul 2018 21:07:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-07-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 5:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/62/

Running initial code review checks. This is experimental - please report any 
issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 25 Jul 2018 20:07:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-07-25 Thread Todd Lipcon (Code Review)
Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10970

to look at the new patch set (#5).

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

IMPALA-7308. Support Avro tables in LocalCatalog

This adds support for loading Avro-formatted tables in LocalCatalog. In
the case that the table properties indicate a table is Avro-formatted,
the semantics are identical to the existing catalog implementation:

- if an explicit avro schema is specified, it overrides the schema
  provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the
  inferred schema takes the place of the one provided by the HMS (thus
  promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS
  schema and the inferred schema, an error is emitted.

The semantics for LocalCatalog are slightly different in the case of
tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is
loaded, all partitions are inspected, and, if any partition is
discovered with Avro format, the above rules are applied. This has some
very unexpected results, described in an earlier email to
d...@impala.apache.org [1]. To summarize that email thread, the existing
behavior was decided to be unintuitive and inconsistent with Hive.
Additionally, this behavior requires loading all partitions up-front,
which gets in the way of the goal of lazy/granular metadata loading in
LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has
  been selected at a table level.

- if an Avro partition is added to a non-Avro table, and that partition
  has a schema that isn't compatible with the table's schema, an error
  will occur on read.

The thread additionally discusses adding an error message on "alter" to
prevent users from adding an Avro partition to a table with an
incompatible schema. To keep the scope of this patch minimal, that is
not yet implemented here. I filed IMPALA-7309 to change the behavior of
the existing catalog implementation to match.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M tests/query_test/test_avro_schema_resolution.py
8 files changed, 173 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/10970/5

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-07-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/32/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 24 Jul 2018 19:54:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-07-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 4:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/32/

Running initial code review checks. This is experimental - please report any 
issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 24 Jul 2018 18:46:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-07-24 Thread Todd Lipcon (Code Review)
Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10970

to look at the new patch set (#4).

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

IMPALA-7308. Support Avro tables in LocalCatalog

This adds support for loading Avro-formatted tables in LocalCatalog. In
the case that the table properties indicate a table is Avro-formatted,
the semantics are identical to the existing catalog implementation:

- if an explicit avro schema is specified, it overrides the schema
  provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the
  inferred schema takes the place of the one provided by the HMS (thus
  promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS
  schema and the inferred schema, an error is emitted.

The semantics for LocalCatalog are slightly different in the case of
tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is
loaded, all partitions are inspected, and, if any partition is
discovered with Avro format, the above rules are applied. This has some
very unexpected results, described in an earlier email to
d...@impala.apache.org [1]. To summarize that email thread, the existing
behavior was decided to be unintuitive and inconsistent with Hive.
Additionally, this behavior requires loading all partitions up-front,
which gets in the way of the goal of lazy/granular metadata loading in
LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has
  been selected at a table level.

- if an Avro partition is added to a non-Avro table, and that partition
  has a schema that isn't compatible with the table's schema, an error
  will occur on read.
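
As a rough illustration of the rules above (not the code in this patch),
the column-resolution logic can be sketched in Python. The function
names, the (name, type) tuple representation, and the promotion table
are all hypothetical. Only the TINYINT -> INT promotion comes from the
description above; SMALLINT is included as an assumption.

```python
from typing import Dict, List, Optional, Tuple

Column = Tuple[str, str]  # (column name, upper-case SQL type); hypothetical representation

# Assumed promotion table. TINYINT -> INT is from the commit message;
# SMALLINT -> INT is an assumption added for illustration.
AVRO_PROMOTIONS: Dict[str, str] = {"TINYINT": "INT", "SMALLINT": "INT"}


def infer_avro_columns(hms_columns: List[Column]) -> List[Column]:
    """Stand-in for schema inference: widen types that the inferred schema promotes."""
    return [(name, AVRO_PROMOTIONS.get(typ, typ)) for name, typ in hms_columns]


def effective_columns(
    table_is_avro: bool,
    hms_columns: List[Column],
    explicit_avro_columns: Optional[List[Column]] = None,
) -> List[Column]:
    """Pick the columns a table exposes, following the LocalCatalog rules."""
    if not table_is_avro:
        # The override applies ONLY when Avro is selected at the table level;
        # Avro partitions of a non-Avro table do not change the schema.
        return list(hms_columns)
    if explicit_avro_columns is not None:
        # An explicit Avro schema overrides the schema provided by the HMS.
        return list(explicit_avro_columns)
    # Otherwise the inferred schema takes the place of the HMS schema.
    return infer_avro_columns(hms_columns)


def check_compute_stats(hms_columns: List[Column], inferred: List[Column]) -> None:
    """On COMPUTE STATS, any HMS/inferred discrepancy is reported as an error."""
    if list(hms_columns) != list(inferred):
        raise ValueError("HMS schema does not match the inferred Avro schema")
```

For example, effective_columns(True, [("id", "TINYINT")]) yields an INT
column, while the same call with table_is_avro=False returns the HMS
columns unchanged.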

The thread additionally discusses adding an error message on "alter" to
prevent users from adding an Avro partition to a table with an
incompatible schema. To keep the scope of this patch minimal, that is
not yet implemented here. I filed IMPALA-7309 to change the behavior of
the existing catalog implementation to match.

[1] 
https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M tests/query_test/test_avro_schema_resolution.py
7 files changed, 167 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/10970/4

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/19/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 24 Jul 2018 00:52:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10970 )

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..


Patch Set 2:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/19/

Running initial code review checks. This is experimental - please report any 
issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 23 Jul 2018 23:48:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7308. Support Avro tables in LocalCatalog

2018-07-23 Thread Todd Lipcon (Code Review)
Hello Tianyi Wang, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10970

to look at the new patch set (#2).

Change subject: IMPALA-7308. Support Avro tables in LocalCatalog
..

IMPALA-7308. Support Avro tables in LocalCatalog

This adds support for loading Avro-formatted tables in LocalCatalog. In
the case that the table properties indicate a table is Avro-formatted,
the semantics are identical to the existing catalog implementation:

- if an explicit Avro schema is specified, it overrides the schema
  provided by the HMS
- if no explicit Avro schema is specified, one is inferred, and then the
  inferred schema takes the place of the one provided by the HMS (thus
  promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS
  schema and the inferred schema, an error is emitted.

The semantics for LocalCatalog are slightly different in the case of
tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is
loaded, all partitions are inspected, and, if any partition is
discovered with Avro format, the above rules are applied. This has some
very unexpected results, described in an earlier email to
d...@impala.apache.org [1]. To summarize that email thread, the existing
behavior was decided to be unintuitive and inconsistent with Hive.
Additionally, this behavior requires loading all partitions up-front,
which gets in the way of the goal of lazy/granular metadata loading in
LocalCatalog.

Thus, the LocalCatalog implementation differs as follows:

- the "schema override" behavior ONLY occurs if the Avro file format has
  been selected at a table level.

- if an Avro partition is added to a non-Avro table, and that partition
  has a schema that isn't compatible with the table's schema, an error
  will occur on read.

The thread additionally discusses adding an error message on "alter" to
prevent users from adding an Avro partition to a table with an
incompatible schema. To keep the scope of this patch minimal, that is
not yet implemented here. I filed IMPALA-7309 to change the behavior of
the existing catalog implementation to match.

[1] 
https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M tests/query_test/test_avro_schema_resolution.py
7 files changed, 166 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/10970/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Gerrit-Change-Number: 10970
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac