[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-17 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15395

to look at the new patch set (#3).

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..

IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

Full ACID row format looks like this:

{
  "operation": 0,
  "originalTransaction": 1,
  "bucket": 536870912, "rowId": 0,
  "currentTransaction": 1,
  "row": {"i": 1}
}

User columns are nested under "row". In the frontend we need to create
slot descriptors that correspond to the file schema. In the catalog we
could mimic the file schema but that would introduce several
complexities and corner cases in column resolution. Also in query
results the heading of the above user column would be "row.i". Star
expansion should also be modified, etc.

Because of that in the Catalog I create the exact opposite of the above
schema:

{
  "row__id":
  {
"operation": 0,
"originalTransaction": 1,
"bucket": 536870912,
"rowId": 0,
"currentTransaction": 1
  }
  "i": 1
}

This way very little modification is needed in the frontend. And the
hidden columns can be easily retrieved via 'SELECT row__id.*' when we
need those for debugging/testing.

We only need to change Path.getAbsolutePath() to return a schema path
that corresponds to the file schema. Also in the backend we need some
extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the
table schema path from the file schema path.

Testing:
I changed data loading to load ORC files in full ACID format by default.
With this change we should be able to scan full ACID tables that are
not minor-compacted, don't have deleted rows, and don't have original
files.

Newly added Tests:
 * specific queries about hidden columns (full-acid-rowid.test)
 * SHOW CREATE TABLE (show-create-table-full-acid.test)
 * INSERT should be forbidden (acid-negative.test)
 * added tests for column masking (
   ranger_column_masking_complex_types.test)

TODO:
 * Currently ALTER TABLE is enabled => assess consequences
 * make all tests green in exhaustive

Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/bin/generate-schema-statements.py
M testdata/datasets/README
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
M 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
A 
testdata/workloads/functional-query/queries/QueryTest/show-create-table-full-acid.test
M tests/authorization/test_ranger.py
M tests/metadata/test_show_create_table.py
M tests/query_test/test_acid.py
M tests/query_test/test_mt_dop.py
M tests/query_test/test_scanners.py
M tests/query_test/test_scanners_fuzz.py
43 files changed, 1,184 insertions(+), 517 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/15395/3
--
To view, vis

[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/bin/generate-schema-statements.py@319
PS3, Line 319: '
flake8: E129 visually indented line with same indent as next logical line


http://gerrit.cloudera.org:8080/#/c/15395/3/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/3/tests/query_test/test_scanners_fuzz.py@197
PS3, Line 197: .
flake8: E501 line too long (91 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/15395/3/tests/query_test/test_scanners_fuzz.py@282
PS3, Line 282:
flake8: E261 at least two spaces before inline comment


http://gerrit.cloudera.org:8080/#/c/15395/3/tests/query_test/test_scanners_fuzz.py@300
PS3, Line 300: n
flake8: E129 visually indented line with same indent as next logical line



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 17 Mar 2020 17:15:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-17 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 3:

(7 comments)

Thanks for reviewing it. Most of the changes are related to tests.

For new reviewers: I think it's worth to start with PS2 because it contains the 
essence of this CR.

http://gerrit.cloudera.org:8080/#/c/15395/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15395/2//COMMIT_MSG@63
PS2, Line 63: TODO:
> Also need tests on column masking since we also have some hacks in path res
Updated test_ranger.py


http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/hdfs-orc-scanner.cc@185
PS2, Line 185:   if (scan_node_->hdfs_table()->IsFullAcid() && 
!schema_resolver_->IsFullAcid()) {
> I think this is too strict. The test on file schema can be false positive.
Done


http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc@101
PS2, Line 101: DCHECK(table_idx >= num_part_cols);
> I think this can be a DCHECK since partition columns should be skipped by h
Done


http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc@106
PS2, Line 106:
> Could you add a DCHECK that the resulted index won't overflow?
Done


http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc@234
PS2, Line 234: root_->getSubtype(0)->getKind() != orc::TypeKind::INT ||
 :   root_->getSubtype(1)->getKind() !
> I think we should also check other fields and their types.
Done


http://gerrit.cloudera.org:8080/#/c/15395/2/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/15395/2/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@847
PS2, Line 847: HIVEFULLACIDWRITE
> We don't have write support yet. Is this required somewhere?
We need it to create tables.


http://gerrit.cloudera.org:8080/#/c/15395/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/15395/2/tests/query_test/test_scanners.py@1297
PS2, Line 1297: self.client.execute("create table %s.%s like tpch.lineitem 
stored as orc" % (db, tbl))
> Do we need this? Doesn't the table being translated to EXTERNAL table?
Removed it.



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 17 Mar 2020 17:18:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5510/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 17 Mar 2020 18:01:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-18 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 3:

(6 comments)

Thanks for working on this. Left some comments, mostly questions to get a 
better understanding of how ACID works in Impala.
Later I will look at the tests as well.

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@110
PS3, Line 110: file_idx -= num_part_cols;
nit: This and the same @115 could be moved outside the if.


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@118
PS3, Line 118:   table_idx += 1 + num_part_cols;
Could you explain the values here?
In this case the file_idx should be the original table_idx?


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a297
PS3, Line 297:
Removing this mean that we support write operations on full ACID tables? Do we 
really need to remove this one?


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@224
PS3, Line 224: ensureTableNotFullAcid
There are now two ensureTableNotFullAcid methods, that only differ in the 
parameters and the error msg thrown.
Is this method and the INSERT_ONLY_ACID_TABLE_SUPPORTED_ERROR_MSG message used 
anymore? Can they be removed?


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@237
PS3, Line 237: String
nit: maybe change this string to enum?


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@238
PS3, Line 238:   throws AnalysisException {
nit: missing space



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 18 Mar 2020 17:34:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 3:

(13 comments)

I think the approach makes sense. Thanks for adding so many tests.

http://gerrit.cloudera.org:8080/#/c/15395/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15395/3//COMMIT_MSG@58
PS3, Line 58:  * SHOW CREATE TABLE (show-create-table-full-acid.test)
Can you also add describe and describe formatted tests? I assume these show 
hide row__id.


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc@181
PS3, Line 181:   VLOG_QUERY << filename();
I assume you're going to remove the log lines?


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc@185
PS3, Line 185:   if (scan_node_->hdfs_table()->IsFullAcid() && 
!schema_resolver_->IsFullAcid()) {
This is going to be addressed at some point, right? So we can scan original 
files too.


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.h
File be/src/exec/orc-metadata-utils.h:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.h@46
PS3, Line 46:   bool IsFullAcid() const;
Maybe IsFileFullAcid()? Since it's going to get confusing in future about 
whether this is referring to the file or the table.


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@94
PS3, Line 94: if (table_idx == num_part_cols + 5) {
I'd find it clearer if it was a named constant instead of the magic number


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@101
PS3, Line 101: DCHECK
DCHECK_GE


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@231
PS3, Line 231: bool OrcSchemaResolver::IsFullAcid() const {
Can you add a brief comment on the approach? It seems to be basically "if it 
walks like a duck and quacks like a duck, it's a duck".


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/runtime/descriptors.h@333
PS3, Line 333: IsFullAcid
IsFullAcidTable?


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@94
PS3, Line 94: // analyzer.ensureTableNotTransactional(table_, "ALTER 
TABLE");
?


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Path.java
File fe/src/main/java/org/apache/impala/analysis/Path.java:

http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Path.java@473
PS3, Line 473: convertToFullAcid
convertToFullAcidFilePath?


http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
File testdata/workloads/functional-query/queries/QueryTest/acid-negative.test:

http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/acid-negative.test@1
PS3, Line 1: #
Why commented out?


http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
File testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test:

http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test@16
PS3, Line 16:  QUERY
Can you also add a LABELS section to make sure the labels for the ROW__ID 
components are sane.

Any maybe also a test that selects individual elements of row__id.


http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test@17
PS3, Line 17: select row__id.*, * from functional_orc_def.alltypestiny;
Do we expect the values to be deterministic across data loads?



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 18 Mar 2020 18:08:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-20 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15395

to look at the new patch set (#4).

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..

IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

Full ACID row format looks like this:

{
  "operation": 0,
  "originalTransaction": 1,
  "bucket": 536870912,
  "rowId": 0,
  "currentTransaction": 1,
  "row": {"i": 1}
}

User columns are nested under "row". In the frontend we need to create
slot descriptors that correspond to the file schema. In the catalog we
could mimic the file schema but that would introduce several
complexities and corner cases in column resolution. Also in query
results the heading of the above user column would be "row.i". Star
expansion should also be modified, etc.

Because of that in the Catalog I create the exact opposite of the above
schema:

{
  "row__id":
  {
"operation": 0,
"originalTransaction": 1,
"bucket": 536870912,
"rowId": 0,
"currentTransaction": 1
  }
  "i": 1
}

This way very little modification is needed in the frontend. And the
hidden columns can be easily retrieved via 'SELECT row__id.*' when we
need those for debugging/testing.

We only need to change Path.getAbsolutePath() to return a schema path
that corresponds to the file schema. Also in the backend we need some
extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the
table schema path from the file schema path.

Testing:
I changed data loading to load ORC files in full ACID format by default.
With this change we should be able to scan full ACID tables that are
not minor-compacted, don't have deleted rows, and don't have original
files.

Newly added Tests:
 * specific queries about hidden columns (full-acid-rowid.test)
 * SHOW CREATE TABLE (show-create-table-full-acid.test)
 * DESCRIBE [FORMATTED] TABLE (describe-path.test)
 * INSERT should be forbidden (acid-negative.test)
 * added tests for column masking (
   ranger_column_masking_complex_types.test)

TODO:
 * make all tests green in exhaustive

Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/bin/generate-schema-statements.py
M testdata/datasets/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
M testdata/workloads/functional-query/queries/QueryTest/acid.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M testdata/workloads/functional-query/queries/QueryTest/describe-path.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
M 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
A 
testdata/workloads/function

[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15395/4/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/15395/4/testdata/bin/generate-schema-statements.py@319
PS4, Line 319: '
flake8: E129 visually indented line with same indent as next logical line


http://gerrit.cloudera.org:8080/#/c/15395/4/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/4/tests/query_test/test_scanners_fuzz.py@197
PS4, Line 197: .
flake8: E501 line too long (91 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/15395/4/tests/query_test/test_scanners_fuzz.py@283
PS4, Line 283:
flake8: E261 at least two spaces before inline comment


http://gerrit.cloudera.org:8080/#/c/15395/4/tests/query_test/test_scanners_fuzz.py@301
PS4, Line 301: n
flake8: E129 visually indented line with same indent as next logical line



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 20 Mar 2020 16:32:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 4:

(21 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/15395/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15395/3//COMMIT_MSG@58
PS3, Line 58:  * specific queries about hidden columns (full-acid-rowid.test)
> Can you also add describe and describe formatted tests? I assume these show
Added tests to 'describe-path.test'.


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc@181
PS3, Line 181:   bool is_file_full_acid = 
reader_->hasMetadataValue("hive.acid.version") &&
> I assume you're going to remove the log lines?
Done


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc@185
PS3, Line 185: return Status(Substitute("Error: Table is in full ACID 
format, but file "
> This is going to be addressed at some point, right? So we can scan original
Yes, "original files" are tracked with IMPALA-9515. Added TODO comment here.


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.h
File be/src/exec/orc-metadata-utils.h:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.h@46
PS3, Line 46:  private:
> Maybe IsFileFullAcid()? Since it's going to get confusing in future about w
Removed this function.


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@94
PS3, Line 94: if (table_idx == num_part_cols + FILE_INDEX_OF_FIELD_ROW) 
{
> I'd find it clearer if it was a named constant instead of the magic number
Done


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@101
PS3, Line 101: DCHECK
> DCHECK_GE
Done


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@110
PS3, Line 110: else {
> nit: This and the same @115 could be moved outside the if.
Done


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@118
PS3, Line 118:   // Hence, in the table metadata this is a top-level 
column, i.e. it is offsetted
> Could you explain the values here?
Added some comments about it.

Yes, file_idx should remain the same because only the first number is offsetted 
with 'num_part_cols'.


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@231
PS3, Line 231:   type.DebugString(), orc_type.toString(), filename_));
> Can you add a brief comment on the approach? It seems to be basically "if i
Looking at the ORC file metadata it turned out that the ACID version is also 
written there, so using that from now on instead of this duck approach.

Do you think it's worth to keep this for verifying the ACID schema?


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/runtime/descriptors.h@333
PS3, Line 333: IsTableFul
> IsFullAcidTable?
Renamed it to IsTableFullAcid() to make it similar to IsFileFullAcid() (though 
the latter was entirely removed since then)


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@94
PS3, Line 94: // TODO: IMPALA-8831 will enable all ALTER TABLE statements 
on transactional tables.
> ?
Added a comment about it. We need ALTER TABLE ADD PARTITION and ALTER TABLE 
SORT BY during data loading so in the new PS I only enable those.

ALTER TABLE stmts are disabled because they fail when the table has column 
statistics (IMPALA-8831). The enabled ALTER TABLE statements should be safe in 
those regard. Hopefully there'll be a separate patch for IMPALA-8831 that does 
this for us so we can remove this from the final version of this change request.


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a297
PS3, Line 297:
> Removing this mean that we support write operations on full ACID tables? Do
ALTER TABLE and DROP TABLE calls this, but they are supported operations.


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@224
PS3, Line 224: r, table.getFullName()
> There are now two ensureTableNotFullAcid methods, that only differ in the p

[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5541/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 20 Mar 2020 17:16:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-23 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15395

to look at the new patch set (#5).

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..

IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

Full ACID row format looks like this:

{
  "operation": 0,
  "originalTransaction": 1,
  "bucket": 536870912,
  "rowId": 0,
  "currentTransaction": 1,
  "row": {"i": 1}
}

User columns are nested under "row". In the frontend we need to create
slot descriptors that correspond to the file schema. In the catalog we
could mimic the file schema but that would introduce several
complexities and corner cases in column resolution. Also in query
results the heading of the above user column would be "row.i". Star
expansion should also be modified, etc.

Because of that in the Catalog I create the exact opposite of the above
schema:

{
  "row__id":
  {
"operation": 0,
"originalTransaction": 1,
"bucket": 536870912,
"rowId": 0,
"currentTransaction": 1
  }
  "i": 1
}

This way very little modification is needed in the frontend. And the
hidden columns can be easily retrieved via 'SELECT row__id.*' when we
need those for debugging/testing.

We only need to change Path.getAbsolutePath() to return a schema path
that corresponds to the file schema. Also in the backend we need some
extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the
table schema path from the file schema path.

Testing:
I changed data loading to load ORC files in full ACID format by default.
With this change we should be able to scan full ACID tables that are
not minor-compacted, don't have deleted rows, and don't have original
files.

Newly added Tests:
 * specific queries about hidden columns (full-acid-rowid.test)
 * SHOW CREATE TABLE (show-create-table-full-acid.test)
 * DESCRIBE [FORMATTED] TABLE (describe-path.test)
 * INSERT should be forbidden (acid-negative.test)
 * added tests for column masking (
   ranger_column_masking_complex_types.test)

Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/bin/generate-schema-statements.py
M testdata/datasets/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
M testdata/workloads/functional-query/queries/QueryTest/acid.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M testdata/workloads/functional-query/queries/QueryTest/describe-path.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
M 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
A 
testdata/workloads/functional-query/queries/QueryTest/show-create-table-

[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/bin/generate-schema-statements.py@319
PS5, Line 319: '
flake8: E129 visually indented line with same indent as next logical line


http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@197
PS5, Line 197: .
flake8: E501 line too long (91 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@283
PS5, Line 283:
flake8: E261 at least two spaces before inline comment


http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@301
PS5, Line 301: n
flake8: E129 visually indented line with same indent as next logical line



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 23 Mar 2020 14:58:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5566/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 23 Mar 2020 15:43:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15395/6/be/src/exec/orc-metadata-utils.h
File be/src/exec/orc-metadata-utils.h:

http://gerrit.cloudera.org:8080/#/c/15395/6/be/src/exec/orc-metadata-utils.h@51
PS6, Line 51:   bool is_table_full_acid_;
These could be const I think, since they're only set in the constructor


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@231
PS3, Line 231:   "Type mismatch: table column $0 is map to column $1 in ORC 
file '$2'",
> Looking at the ORC file metadata it turned out that the ACID version is als
I think it would be good to validate that the schema is as expected, just so 
the rest of the code can reliably assume that the schema is sane.


http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@323
PS6, Line 323:   table.getMetaStoreTable().getParameters());
nit: indentation is a bit off


http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@329
PS6, Line 329: if (isFullAcid && i == table.getNumClusteringCols() &&
nit: multi-line if should use parentheses. Maybe this would be better to 
restructure as an else if anyway...


http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test
File testdata/workloads/functional-query/queries/QueryTest/describe-path.test:

http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test@143
PS5, Line 143: 'row__id','struct<\n  operation:int comment ,\n  
originaltransaction:bigint comment ,\n  bucket:int comment ,\n  
rowid:bigint comment ,\n  currenttransaction:bigint comment \n>',''
I'm not sure if making this visible is a good idea. What does hive do?



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 26 Mar 2020 23:49:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15395

to look at the new patch set (#7).

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..

IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

Full ACID row format looks like this:

{
  "operation": 0,
  "originalTransaction": 1,
  "bucket": 536870912,
  "rowId": 0,
  "currentTransaction": 1,
  "row": {"i": 1}
}

User columns are nested under "row". In the frontend we need to create
slot descriptors that correspond to the file schema. In the catalog we
could mimic the file schema but that would introduce several
complexities and corner cases in column resolution. Also in query
results the heading of the above user column would be "row.i". Star
expansion should also be modified, etc.

Because of that in the Catalog I create the exact opposite of the above
schema:

{
  "row__id":
  {
"operation": 0,
"originalTransaction": 1,
"bucket": 536870912,
"rowId": 0,
"currentTransaction": 1
  }
  "i": 1
}

This way very little modification is needed in the frontend. And the
hidden columns can be easily retrieved via 'SELECT row__id.*' when we
need those for debugging/testing.

We only need to change Path.getAbsolutePath() to return a schema path
that corresponds to the file schema. Also in the backend we need some
extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the
table schema path from the file schema path.

Testing:
I changed data loading to load ORC files in full ACID format by default.
With this change we should be able to scan full ACID tables that are
not minor-compacted, don't have deleted rows, and don't have original
files.

Newly added Tests:
 * specific queries about hidden columns (full-acid-rowid.test)
 * SHOW CREATE TABLE (show-create-table-full-acid.test)
 * DESCRIBE [FORMATTED] TABLE (describe-path.test)
 * INSERT should be forbidden (acid-negative.test)
 * added tests for column masking (
   ranger_column_masking_complex_types.test)

Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/bin/generate-schema-statements.py
M testdata/datasets/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
M testdata/workloads/functional-query/queries/QueryTest/acid.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M testdata/workloads/functional-query/queries/QueryTest/describe-path.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
M 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
A 
testdata/workloads/functional-query/queries/QueryTest/show-create-table-full-acid.test
M test

[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15395/7/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/15395/7/testdata/bin/generate-schema-statements.py@321
PS7, Line 321: '
flake8: E129 visually indented line with same indent as next logical line


http://gerrit.cloudera.org:8080/#/c/15395/7/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/7/tests/query_test/test_scanners_fuzz.py@306
PS7, Line 306: n
flake8: E129 visually indented line with same indent as next logical line



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 16:15:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15395/8/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/15395/8/testdata/bin/generate-schema-statements.py@321
PS8, Line 321: '
flake8: E129 visually indented line with same indent as next logical line


http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py@306
PS8, Line 306: n
flake8: E129 visually indented line with same indent as next logical line



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 16:27:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15395

to look at the new patch set (#8).

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..

IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

Full ACID row format looks like this:

{
  "operation": 0,
  "originalTransaction": 1,
  "bucket": 536870912,
  "rowId": 0,
  "currentTransaction": 1,
  "row": {"i": 1}
}

User columns are nested under "row". In the frontend we need to create
slot descriptors that correspond to the file schema. In the catalog we
could mimic the file schema but that would introduce several
complexities and corner cases in column resolution. Also in query
results the heading of the above user column would be "row.i". Star
expansion should also be modified, etc.

Because of that in the Catalog I create the exact opposite of the above
schema:

{
  "row__id":
  {
"operation": 0,
"originalTransaction": 1,
"bucket": 536870912,
"rowId": 0,
"currentTransaction": 1
  }
  "i": 1
}

This way very little modification is needed in the frontend. And the
hidden columns can be easily retrieved via 'SELECT row__id.*' when we
need those for debugging/testing.

We only need to change Path.getAbsolutePath() to return a schema path
that corresponds to the file schema. Also in the backend we need some
extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the
table schema path from the file schema path.

Testing:
I changed data loading to load ORC files in full ACID format by default.
With this change we should be able to scan full ACID tables that are
not minor-compacted, don't have deleted rows, and don't have original
files.

Newly added Tests:
 * specific queries about hidden columns (full-acid-rowid.test)
 * SHOW CREATE TABLE (show-create-table-full-acid.test)
 * DESCRIBE [FORMATTED] TABLE (describe-path.test)
 * INSERT should be forbidden (acid-negative.test)
 * added tests for column masking (
   ranger_column_masking_complex_types.test)

Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/bin/generate-schema-statements.py
M testdata/datasets/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
M testdata/workloads/functional-query/queries/QueryTest/acid.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M testdata/workloads/functional-query/queries/QueryTest/describe-path.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
M 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
A 
testdata/workloads/functional-query/queries/QueryTest/show-create-table-full-acid.test
M test

[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 8:

(7 comments)

PS8 is a rebase.

http://gerrit.cloudera.org:8080/#/c/15395/6/be/src/exec/orc-metadata-utils.h
File be/src/exec/orc-metadata-utils.h:

http://gerrit.cloudera.org:8080/#/c/15395/6/be/src/exec/orc-metadata-utils.h@51
PS6, Line 51:  private:
> These could be const I think, since they're only set in the constructor
Done


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@231
PS3, Line 231:   return Status(Substitute(
> I think it would be good to validate that the schema is as expected, just s
Turned it into a validator function that returns a Status, calling it from 
HdfsOrcScanner::Open().


http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@323
PS6, Line 323: table.getMetaStoreTable().getParameters());
> nit: indentation is a bit off
Done


http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@329
PS6, Line 329: 
Preconditions.checkState(col.getName().equals("row__id"));
> nit: multi-line if should use parentheses. Maybe this would be better to re
Done, also converted the 'equals("row__id")' check to a Precondition.


http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test
File testdata/workloads/functional-query/queries/QueryTest/describe-path.test:

http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test@143
PS5, Line 143: describe functional_orc_def.alltypes
> I'm not sure if making this visible is a good idea. What does hive do?
Yeah, I wasn't sure about this, but it was handy during development.

Hive doesn't reveal the hidden fields in DESCRIBE, so I switched to hiding them 
too.

However, when explicitly asked for it, currently I still allow describe on the 
row__id field, i.e.:

  DESCRIBE .row__id;
  +-++-+
  | name| type   | comment |
  +-++-+
  | operation   | int| |
  | originaltransaction | bigint | |
  | bucket  | int| |
  | rowid   | bigint | |
  | currenttransaction  | bigint | |
  +-++-+


I think it's not wrong since we also allow the user to explicitly query the 
fields of the row__id column.

Hive only allows DESCRIBE on tables.


http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@197
PS5, Line 197:
> flake8: E501 line too long (91 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@283
PS5, Line 283:
> flake8: E261 at least two spaces before inline comment
Done



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 16:29:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5620/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 16:57:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 8: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 17:05:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc@87
PS8, Line 87:   DCHECK(ValidateFullAcidFileSchema().ok()); // Should have 
already been validated.
It would be nice if this printed the validation error. Maybe we need a 
DCHECK_OK macro, like EXPECT_OK?


http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test
File testdata/workloads/functional-query/queries/QueryTest/describe-path.test:

http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test@143
PS5, Line 143: describe functional_orc_def.alltypes
> Yeah, I wasn't sure about this, but it was handy during development.
That's fine, I think there's some similar behaviour with nested types if you 
describe nested fields that are not explicitly listed, e.g. you can refer to a 
nested item with the .item member or implicitly by omitting it.



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 17:05:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5621/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 17:08:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-30 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc@87
PS8, Line 87:   DCHECK(ValidateFullAcidFileSchema().ok()); // Should have 
already been validated.
> It would be nice if this printed the validation error. Maybe we need a DCHE
I like the idea, but I run into name clashes because we define 
KUDU_HEADERS_USE_SHORT_STATUS_MACROS in be/CMakeLists.txt. Therefore we already 
have a DCHECK_OK but it expects a kudu::Status. (plus it always evaluates the 
expression passed to the macro).

To resolve this I opened: https://gerrit.cloudera.org/#/c/15596/



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 30 Mar 2020 16:00:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-31 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 8:

(6 comments)

Thanks for adding so many tests! Left some comments for the tests. The changes 
for path resolution looks good to me. Haven't gone through the changes related 
to data load. I'll look into them tomorrow if still needs.

http://gerrit.cloudera.org:8080/#/c/15395/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15395/8//COMMIT_MSG@7
PS8, Line 7: IMPALA-9042
Change this to IMPALA-9484?


http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc@194
PS8, Line 194: "hive.acid.version"
nit: make this a constant?


http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc@197
PS8, Line 197: but file "
 : "is not
nit: explain that it doesn't have metadata "hive.acid.version" = "2"?


http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_nested_types.py
File tests/query_test/test_nested_types.py:

http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_nested_types.py@213
PS8, Line 213:   base_table = 
"functional_orc_def.complextypestbl_non_transactional"
Do we have test coverage on reading nested types from full-ACID partitioned 
table?


http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners.py@207
PS8, Line 207: functional_orc_def
I think we should get the db name by

 "functional" + vector.get_value('table_format').db_suffix()

in case that the compression is not deflate, but snappy or others in exhaustive 
test.


http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py@181
PS8, Line 181:   self.run_stmt_in_hive("insert into %s.%s select * from 
%s.%s" % (fuzz_db,
 :   fuzz_table, src_db, src_table))
I'm not sure if this copies the data of complextypestbl correctly since I 
previously encountered a name resolution issue for struct fields: 
https://issues.apache.org/jira/browse/IMPALA-9410?focusedCommentId=17060613&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17060613

Could you verify it? If the issue still exists, we can copy data from the 
parquet table.



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 31 Mar 2020 10:10:37 +
Gerrit-HasComments: Yes