[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-04-02 Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..

IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID 
schema

The full ACID row format looks like this:

{
  "operation": 0,
  "originalTransaction": 1,
  "bucket": 536870912,
  "rowId": 0,
  "currentTransaction": 1,
  "row": {"i": 1}
}

User columns are nested under "row". In the frontend we need to create
slot descriptors that correspond to the file schema. In the catalog we
could mimic the file schema, but that would introduce several
complexities and corner cases in column resolution. Also, in query
results the heading of the above user column would be "row.i", star
expansion would also need to be modified, etc.

Because of that, in the Catalog I create a schema with the opposite
nesting:

{
  "row__id":
  {
    "operation": 0,
    "originalTransaction": 1,
    "bucket": 536870912,
    "rowId": 0,
    "currentTransaction": 1
  },
  "i": 1
}

This way very little modification is needed in the frontend, and the
hidden columns can easily be retrieved via 'SELECT row__id.*' when we
need them for debugging/testing.
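
To make the relationship between the two layouts concrete, here is a minimal Python sketch that reshapes one row from the file layout into the catalog layout. It is only an illustration: the dict-based row and the function name file_row_to_table_row are hypothetical, and Impala does this at the schema level (slot descriptors and schema paths) rather than by copying row values.

```python
# The five hidden ACID fields, in file-schema order (from the example row).
ACID_FIELDS = ["operation", "originalTransaction", "bucket",
               "rowId", "currentTransaction"]


def file_row_to_table_row(file_row):
    """Reshape a full ACID file row into the catalog layout.

    The hidden ACID fields move under "row__id", and the user columns
    nested under "row" are lifted to the top level.
    """
    table_row = {"row__id": {f: file_row[f] for f in ACID_FIELDS}}
    table_row.update(file_row["row"])
    return table_row


if __name__ == "__main__":
    file_row = {"operation": 0, "originalTransaction": 1,
                "bucket": 536870912, "rowId": 0,
                "currentTransaction": 1, "row": {"i": 1}}
    print(file_row_to_table_row(file_row))
```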

We only need to change Path.getAbsolutePath() to return a schema path
that corresponds to the file schema. Also in the backend we need some
extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the
table schema path from the file schema path.
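
The path translation amounts to simple index arithmetic. The Python sketch below is not the Java or C++ code touched by this change; it assumes an unpartitioned table whose first column is the synthetic "row__id" struct, represents a schema path as a list of 0-based field indexes, and uses hypothetical function names.

```python
# Layout assumptions: the file schema has 5 hidden ACID fields followed by
# the "row" struct; the table schema has the synthetic "row__id" struct
# first, followed by the user columns (no partition columns modeled).
NUM_ACID_FIELDS = 5
FILE_ROW_IDX = NUM_ACID_FIELDS  # position of "row" in the file schema
TABLE_ROW_ID_IDX = 0            # position of "row__id" in the table schema


def table_path_to_file_path(table_path):
    """Translate a table schema path into the matching file schema path."""
    head, rest = table_path[0], list(table_path[1:])
    if head == TABLE_ROW_ID_IDX:
        if not rest:
            # row__id has no single counterpart in the file: its fields
            # are top-level columns there.
            raise ValueError("row__id must be qualified, e.g. row__id.rowId")
        return rest  # row__id.<k> -> top-level ACID field k
    return [FILE_ROW_IDX, head - 1] + rest  # user column -> row.<head - 1>


def file_path_to_table_path(file_path):
    """Inverse translation, from a file schema path to the table schema."""
    head, rest = file_path[0], list(file_path[1:])
    if head < NUM_ACID_FIELDS:
        return [TABLE_ROW_ID_IDX, head] + rest
    if head == FILE_ROW_IDX and rest:
        return [rest[0] + 1] + rest[1:]
    raise ValueError("path does not address a column: %r" % (file_path,))
```

Both directions are plain index shifts, which is consistent with the point above that only Path.getAbsolutePath() and OrcSchemaResolver::ResolveColumn() need to know about the nesting.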

Testing:
I changed data loading to load ORC files in full ACID format by default.
With this change we should be able to scan full ACID tables that are
not minor-compacted, don't have deleted rows, and don't have original
files.
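
As a rough illustration of that scope, the hypothetical helper below inspects file paths relative to one table or partition directory and reports the two conditions that can be read off Hive's ACID directory naming: delete_delta_* directories (deleted rows) and data files outside base_*/delta_* directories (original files). It is a sketch under those assumptions, not Impala's FileMetadataLoader logic, and it deliberately does not try to detect minor compaction.

```python
# Directory prefixes used by Hive ACID inside a table/partition directory.
ACID_DIR_PREFIXES = ("base_", "delta_", "delete_delta_")


def is_hidden(name):
    # Hadoop convention: names starting with "_" or "." are not data files.
    return name.startswith(("_", "."))


def milestone1_blockers(rel_paths):
    """Return the sorted list of out-of-scope conditions found.

    rel_paths: file paths relative to one table or partition directory,
    e.g. "base_0000005/bucket_00000" or "000000_0".
    Minor compaction is NOT detected by this sketch.
    """
    reasons = set()
    for path in rel_paths:
        parts = path.split("/")
        if any(is_hidden(part) for part in parts):
            continue  # skip hidden bookkeeping files
        top = parts[0]
        if top.startswith("delete_delta_"):
            reasons.add("deleted rows")
        elif len(parts) == 1 or not top.startswith(ACID_DIR_PREFIXES):
            reasons.add("original files")
    return sorted(reasons)
```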

Newly added tests:
 * specific queries about hidden columns (full-acid-rowid.test)
 * SHOW CREATE TABLE (show-create-table-full-acid.test)
 * DESCRIBE [FORMATTED] TABLE (describe-path.test)
 * INSERT should be forbidden (acid-negative.test)
 * column masking (ranger_column_masking_complex_types.test)

Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Reviewed-on: http://gerrit.cloudera.org:8080/15395
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/common/logging.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/bin/generate-schema-statements.py
M testdata/datasets/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
M testdata/workloads/functional-query/queries/QueryTest/acid.test
M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M testdata/workloads/functional-query/queries/QueryTest/describe-path.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
A 

[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-04-02 Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..


Patch Set 12: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 02 Apr 2020 12:01:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-04-02 Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..


Patch Set 12: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15395

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 02 Apr 2020 07:32:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-04-02 Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5602/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15395

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 02 Apr 2020 07:32:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-04-01 Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..


Patch Set 11: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15395

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 01 Apr 2020 23:18:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-04-01 Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5679/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15395

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 01 Apr 2020 09:38:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-04-01 Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..


Patch Set 11: Code-Review+1

(2 comments)

Thanks for the comments!
I'd rather get a +2 from someone else than give one myself.
Carrying +1

http://gerrit.cloudera.org:8080/#/c/15395/10/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/15395/10/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@112
PS10, Line 112: Analyzer.ensureTableNotTransactional(table_, "ALTER TABLE");
> Is it the time to remove this TODO?
Yes, thanks, done.


http://gerrit.cloudera.org:8080/#/c/15395/10/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
File testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test:

http://gerrit.cloudera.org:8080/#/c/15395/10/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test@7
PS10, Line 7: as select * from functional_orc_def.decimal_tiny;
> Will this introduce the same failure in S3 tests like IMPALA-9345 found? Be
We talked about it offline. The conclusion was to add SkipIfS3.hive to this test.
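
For context, a marker like SkipIfS3.hive is essentially a conditional-skip decorator. The stand-in below uses the standard unittest module rather than Impala's own pytest-based helpers, and it assumes the target filesystem is exposed through a TARGET_FILESYSTEM environment variable; the class layout and the variable name are illustrative assumptions.

```python
import os
import unittest

# Assumption for illustration: the test environment names its target
# filesystem in this environment variable.
IS_S3 = os.getenv("TARGET_FILESYSTEM") == "s3"


class SkipIfS3(object):
    """Stand-in for a collection of S3-specific skip markers."""
    # Tests that load data through Hive cannot run where HiveServer2 and
    # YARN are not started, which per the review is the case for S3 runs.
    hive = unittest.skipIf(IS_S3, "Hive is not started in S3 test runs")


class TestCreateTableLikeFileOrc(unittest.TestCase):
    @SkipIfS3.hive
    def test_create_table_like_file_orc(self):
        pass  # the real test would run create-table-like-file-orc.test
```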



--
To view, visit http://gerrit.cloudera.org:8080/15395

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 01 Apr 2020 09:01:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-04-01 Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15395/11/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/15395/11/testdata/bin/generate-schema-statements.py@321
PS11, Line 321: '
flake8: E129 visually indented line with same indent as next logical line


http://gerrit.cloudera.org:8080/#/c/15395/11/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/11/tests/query_test/test_scanners_fuzz.py@306
PS11, Line 306: n
flake8: E129 visually indented line with same indent as next logical line
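
E129 typically shows up with wrapped if conditions. The hypothetical function below (not the actual code at those lines, which is not shown here) keeps the flagged layout in a comment and applies the usual fix recommended by PEP 8: one extra level of indentation on the continuation lines. The table-property check itself is only illustrative.

```python
def is_full_acid_orc(file_format, table_props):
    """Illustrative predicate: an ORC table that is transactional but not
    insert-only (Hive table properties given as string values)."""
    # flake8 reports E129 for this layout, because the continuation line
    # is indented exactly like the body of the if statement:
    #
    #     if (file_format == "orc" and
    #         table_props.get("transactional") == "true"):
    #         return True
    #
    # One extra level of indentation on the continuation lines fixes it.
    if (file_format == "orc" and
            table_props.get("transactional") == "true" and
            table_props.get("transactional_properties") != "insert_only"):
        return True
    return False
```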



--
To view, visit http://gerrit.cloudera.org:8080/15395

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 01 Apr 2020 08:57:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-04-01 Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15395

to look at the new patch set (#11).

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..


[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-03-31 Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..


Patch Set 10: Code-Review+1

(3 comments)

The rest of the patch looks good to me as well. We just need to verify that
test_create_table_like_file_orc in test_ddl.py passes in S3 tests, since this
patch adds some data load statements to it.

Feel free to combine my +1 with Tim's and carry them forward as a +2.

http://gerrit.cloudera.org:8080/#/c/15395/10/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/15395/10/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@112
PS10, Line 112: // TODO: remove it or keep it by the end of this change request.
Is it the time to remove this TODO?


http://gerrit.cloudera.org:8080/#/c/15395/10/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
File testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test:

http://gerrit.cloudera.org:8080/#/c/15395/10/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test@7
PS10, Line 7: as select * from functional_orc_def.decimal_tiny;
Will this introduce the same failure in S3 tests like IMPALA-9345 found? 
Because Yarn and HiveServer2 are not launched in S3 tests (IMPALA-9365). Maybe 
we should skip the test on non-HDFS environments, as IMPALA-9345 does. Otherwise,
would you mind running the S3 tests once before merging this patch?


http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py@181
PS8, Line 181:   self.run_stmt_in_hive("insert into %s.%s select * from %s.%s" % (fuzz_db,
 :   fuzz_table, src_db, src_table))
> I checked and it seems fine to me. Note that in this CR I changed the data
Thanks for the verification!



--
To view, visit http://gerrit.cloudera.org:8080/15395

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 01 Apr 2020 03:55:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-03-31 Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5658/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15395

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 31 Mar 2020 17:13:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-03-31 Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..


Patch Set 8:

(7 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/15395/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15395/8//COMMIT_MSG@7
PS8, Line 7: IMPALA-9042
> Change this to IMPALA-9484?
Done


http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc@194
PS8, Line 194: "hive.acid.version"
> nit: make this a constant?
Done


http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc@197
PS8, Line 197: but file "
 : "is not
> nit: explain that it doesn't have metadata "hive.acid.version" = "2"?
Done
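
The check discussed in these two comments can be sketched as follows. This is a Python illustration, not the C++ in hdfs-orc-scanner.cc; the function name, the dict standing in for the ORC file's user metadata (which ORC stores as raw bytes), and the exact error wording are assumptions.

```python
# Metadata key/value that marks a file as written in the full ACID format.
HIVE_ACID_VERSION_KEY = "hive.acid.version"
FULL_ACID_VERSION = "2"


def check_full_acid_file(filename, file_metadata, table_is_full_acid):
    """Return None if the file may be scanned, else an error message."""
    if not table_is_full_acid:
        return None
    if file_metadata.get(HIVE_ACID_VERSION_KEY) != FULL_ACID_VERSION:
        return ("Table is full ACID, but file %s is not: it does not have "
                "metadata '%s' = '%s'." % (
                    filename, HIVE_ACID_VERSION_KEY, FULL_ACID_VERSION))
    return None
```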


http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc@87
PS8, Line 87:   DCHECK(ValidateFullAcidFileSchema().ok()); // Should have already been validated.
> I like the idea, but I run into name clashes because we define KUDU_HEADERS
Thanks for reviewing it so quickly. I added the DCHECK_OK macro to 
common/logging.h.
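
For readers wondering what ValidateFullAcidFileSchema() has to check, here is a hedged Python sketch of that kind of validation. The (name, type) representation, the function name and the error strings are hypothetical. The field names come from the row format in the commit message, while the primitive types listed are the ones Hive's ORC ACID writer is generally understood to use and should be treated as an assumption.

```python
# Hidden ACID columns expected before "row", in file-schema order.
# Field names follow the commit message; the types are an assumption.
EXPECTED_ACID_COLUMNS = [
    ("operation", "int"),
    ("originalTransaction", "bigint"),
    ("bucket", "int"),
    ("rowId", "bigint"),
    ("currentTransaction", "bigint"),
]


def validate_full_acid_file_schema(fields):
    """Return None if 'fields' looks like a full ACID file schema,
    otherwise a short error message.

    'fields' is a list of (name, type) pairs for the file's top-level
    struct; a struct type is represented as a list of (name, type) pairs.
    """
    expected_len = len(EXPECTED_ACID_COLUMNS) + 1  # ACID columns + "row"
    if len(fields) != expected_len:
        return "expected %d top-level fields, found %d" % (
            expected_len, len(fields))
    for (name, typ), (exp_name, exp_typ) in zip(fields, EXPECTED_ACID_COLUMNS):
        if (name, typ) != (exp_name, exp_typ):
            return "found field %s %s, expected %s %s" % (
                name, typ, exp_name, exp_typ)
    row_name, row_type = fields[-1]
    if row_name != "row" or not isinstance(row_type, list):
        return "last top-level field must be the 'row' struct"
    return None
```

In the C++ code the result is asserted with the new DCHECK_OK macro, presumably only in debug builds like other DCHECKs; a Python analogue would be `assert validate_full_acid_file_schema(fields) is None`.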


http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_nested_types.py
File tests/query_test/test_nested_types.py:

http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_nested_types.py@213
PS8, Line 213:   base_table = "functional_orc_def.complextypestbl_non_transactional"
> Do we have test coverage on reading nested types from full-ACID partitioned
Thanks for catching this. I created the ACID version of this test. I think I'll 
keep the current modifications as well to have coverage for non-ACID ORC tables.


http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners.py@207
PS8, Line 207: functional_orc_def
> I think we should get the db name by
Done


http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py@181
PS8, Line 181:   self.run_stmt_in_hive("insert into %s.%s select * from %s.%s" % (fuzz_db,
 :   fuzz_table, src_db, src_table))
> I'm not sure if this copies the data of complextypestbl correctly since I p
I checked and it seems fine to me. Note that in this CR I changed the data 
loading of 'complextypestbl'. We don't use the nullable/nonnullable.orc files 
for it (we only use them for complextypestbl_non_transactional).



--
To view, visit http://gerrit.cloudera.org:8080/15395

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 31 Mar 2020 16:05:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-03-31 Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15395/10/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/15395/10/testdata/bin/generate-schema-statements.py@321
PS10, Line 321: '
flake8: E129 visually indented line with same indent as next logical line


http://gerrit.cloudera.org:8080/#/c/15395/10/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/10/tests/query_test/test_scanners_fuzz.py@306
PS10, Line 306: n
flake8: E129 visually indented line with same indent as next logical line



--
To view, visit http://gerrit.cloudera.org:8080/15395

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 31 Mar 2020 16:04:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema

2020-03-31 Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15395

to look at the new patch set (#10).

Change subject: IMPALA-9484: Full ACID Milestone 1: properly scan files that 
has full ACID schema
..
