Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )
Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables ...................................................................... Patch Set 5: (14 comments) http://gerrit.cloudera.org:8080/#/c/13559/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13559/5//COMMIT_MSG@36 PS5, Line 36: TODO in following commits: It is not clear to me whether dynamic partitioning is supported or not. http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc@214 PS5, Line 214: // Create final_hdfs_file_name_prefix and tmp_hdfs_file_name_prefix. Can you mention the transactional naming logic in the comment? http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc@696 PS5, Line 696: return (IsS3APath(partition->final_hdfs_file_name_prefix.c_str()) && !overwrite_ && : state->query_options().s3_skip_insert_staging) || : IsTransactional(); nit: can you improve readability? e.g if (IsTransactional()) return true; could go to a separate line. http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc@817 PS5, Line 817: if (exec_request().__isset.transaction_id) { : RETURN_IF_ERROR(frontend_->CommitTransaction(exec_request().transaction_id)); : in_transaction_ = false; : } : RETURN_IF_ERROR(UpdateCatalog()); I am not sure here, but the order of UpdateCatalog() and CommitTransaction() seems non-trivial to me, especially with dynamic partitioning - I assume that UpdateCatalog() will create the new partitions in HMS, and in that case we can commit to partitions that do not exist yet in HMS. http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1259 PS5, Line 1259: try { I think it would be more readable if the logic inside the try block would be in a separate function. http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1370 PS5, Line 1370: ImpalaException Should this only catch ImpalaException? e.g. if there is a null pointer exception for some reason, we should still abort the transaction in my opinion. http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1372 PS5, Line 1372: abortTransaction This can also throw an exception, for example if HMS was shut down, and this will hide the original exception. I don't know how do we generally handle this in Impala - in this case, TransactionException could get a member like Exception causeOfAbort_ or originalException_. http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/datasets/functional/functional_schema_template.sql@2611 PS5, Line 2611: TBLPROPERTIES('transactional'='true', 'transactional_properties'='insert_only');; nit: extra ; http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test File testdata/workloads/functional-query/queries/QueryTest/acid-insert.test: http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test@7 PS5, Line 7: ---- QUERY It would be great to also read the tables back with HMS. http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test@113 PS5, Line 113: ==== Can you also add insert into/overwrite with dynamic partitioning? http://gerrit.cloudera.org:8080/#/c/13559/5/tests/metadata/test_hms_integration.py File tests/metadata/test_hms_integration.py: http://gerrit.cloudera.org:8080/#/c/13559/5/tests/metadata/test_hms_integration.py@662 PS5, Line 662: execute_serially As this test runs in a unique_database, I don't think that it needs to run serially. E.g. the next test needs this because it has INVALIDATE METADATA, which is a global operation. http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py File tests/query_test/test_insert.py: http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@142 PS5, Line 142: def test_acid_insert(self, vector): What is the reason for not using a unique database? http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@146 PS5, Line 146: if HIVE_MAJOR_VERSION >= 3: @SkipIfHive2.acid already ensures that HIVE_MAJOR_VERSION >= 3. http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@147 PS5, Line 147: # This client doesn't have the capability for ACID tables, but we only need : # to drop and create such tables, and table properties are preserved during : # those operations. : capability_check = self.hive_client.getMetaConf("metastore.client.capability.check") : self.hive_client.setMetaConf("metastore.client.capability.check", "false") I do not understand why this is needed. -- To view, visit http://gerrit.cloudera.org:8080/13559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad Gerrit-Change-Number: 13559 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Comment-Date: Wed, 19 Jun 2019 12:44:18 +0000 Gerrit-HasComments: Yes