[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Csaba Ringhofer (Code Review) Wed, 19 Jun 2019 05:44:40 -0700

Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13559 )


Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 5:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/13559/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13559/5//COMMIT_MSG@36
PS5, Line 36: TODO in following commits:
It is not clear to me whether dynamic partitioning is supported or not.


http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc@214
PS5, Line 214:   // Create final_hdfs_file_name_prefix and 
tmp_hdfs_file_name_prefix.
Can you mention the transactional naming logic in the comment?


http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc@696
PS5, Line 696:   return 
(IsS3APath(partition->final_hdfs_file_name_prefix.c_str()) && !overwrite_ &&
             :       state->query_options().s3_skip_insert_staging) ||
             :       IsTransactional();
nit: can you improve readability? e.g if (IsTransactional()) return true; could 
go to a separate line.


http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc@817
PS5, Line 817:     if (exec_request().__isset.transaction_id) {
             :       
RETURN_IF_ERROR(frontend_->CommitTransaction(exec_request().transaction_id));
             :       in_transaction_ = false;
             :     }
             :     RETURN_IF_ERROR(UpdateCatalog());
I am not sure here, but the order of UpdateCatalog() and CommitTransaction() 
seems non-trivial to me, especially with dynamic partitioning - I assume that 
UpdateCatalog() will create the new partitions in HMS, and in that case we can 
commit to partitions that do not exist yet in HMS.


http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1259
PS5, Line 1259:     try {
I think it would be more readable if the logic inside the try block would be in 
a separate function.


http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1370
PS5, Line 1370: ImpalaException
Should this only catch ImpalaException? e.g. if there is a null pointer 
exception for some reason, we should still abort the transaction in my opinion.


http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1372
PS5, Line 1372: abortTransaction
This can also throw an exception, for example if HMS was shut down, and this 
will hide the original exception.

I don't know how do we generally handle this in Impala - in this case, 
TransactionException could get a member like Exception causeOfAbort_ or 
originalException_.


http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/datasets/functional/functional_schema_template.sql@2611
PS5, Line 2611: TBLPROPERTIES('transactional'='true', 
'transactional_properties'='insert_only');;
nit: extra ;


http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
File testdata/workloads/functional-query/queries/QueryTest/acid-insert.test:

http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test@7
PS5, Line 7: ---- QUERY
It would be great to also read the tables back with HMS.


http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test@113
PS5, Line 113: ====
Can you also add insert into/overwrite with dynamic partitioning?


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

http://gerrit.cloudera.org:8080/#/c/13559/5/tests/metadata/test_hms_integration.py@662
PS5, Line 662: execute_serially
As this test runs in a unique_database, I don't think that it needs to run 
serially. E.g. the next test needs this because it has INVALIDATE METADATA, 
which is a global operation.


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@142
PS5, Line 142:   def test_acid_insert(self, vector):
What is the reason for not using a unique database?


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@146
PS5, Line 146:     if HIVE_MAJOR_VERSION >= 3:
@SkipIfHive2.acid already ensures that HIVE_MAJOR_VERSION >= 3.


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@147
PS5, Line 147:       # This client doesn't have the capability for ACID tables, 
but we only need
             :       # to drop and create such tables, and table properties are 
preserved during
             :       # those operations.
             :       capability_check = 
self.hive_client.getMetaConf("metastore.client.capability.check")
             :       
self.hive_client.setMetaConf("metastore.client.capability.check", "false")
I do not understand why this is needed.



--
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Comment-Date: Wed, 19 Jun 2019 12:44:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Reply via email to