Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )
Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables ...................................................................... Patch Set 7: (7 comments) http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc@817 PS5, Line 817: RETURN_IF_ERROR(GetCoordinator()->Wait()); : if (exec_request().__isset.transaction_id) { : RETURN_IF_ERROR(frontend_->CommitTransaction(exec_request().transaction_id)); : in_transaction_ = false; : } > UpdateCatalog() won't load the new write ids and won't see the new files un I am leaning towards doing UpdateCatalog() first: - UpdateCatalog() seems a more complex operation, so there is probably a bigger chance for failure. - If UpdateCatalog() succeeds but the commit fails, then the only side effect can be the creation of some empty partitions (insert overwrite doesn't delete partitions, as we checked together), which does not change the contents of the table. On the other hand, commit can add files to both old and new partitions, so if adding the new partitions fails, then the insert becomes "partially successful", which is not ok in the ACID world. - It seems logical to do "commit" as late as possible, when there are not a lot of things that can go wrong. Getting an error for your INSERT query but still committing rows is weird. http://gerrit.cloudera.org:8080/#/c/13559/6/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/13559/6/be/src/service/client-request-state.cc@722 PS6, Line 722: in_transaction_ = false; Can you add query_events_->MarkEvent("Transaction aborted");? http://gerrit.cloudera.org:8080/#/c/13559/6/be/src/service/client-request-state.cc@820 PS6, Line 820: in_transaction_ = false; Can you add query_events_->MarkEvent("Transaction committed");? http://gerrit.cloudera.org:8080/#/c/13559/6/common/thrift/DataSinks.thrift File common/thrift/DataSinks.thrift: http://gerrit.cloudera.org:8080/#/c/13559/6/common/thrift/DataSinks.thrift@81 PS6, Line 81: write_id Here and other similar variables: I would consider adding "acid_" or "transactional_" to make these variables' role clearer to people who don't know Hive acid terminology. http://gerrit.cloudera.org:8080/#/c/13559/6/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/13559/6/fe/src/main/java/org/apache/impala/service/Frontend.java@677 PS6, Line 677: * Parses and plans a query in order to generate its explain string. This method does I wonder how to handle EXPLAIN - the backend won't be there to commit/abort the query. It would be probably the best to avoid opening a transaction/getting new write IDs when executing EXPLAIN. http://gerrit.cloudera.org:8080/#/c/13559/6/fe/src/main/java/org/apache/impala/service/Frontend.java@1257 PS6, Line 1257: Can you add timeline.markEvent("Transaction opened");? Adding transaction related events to the timeline will make it clear from the profile whether the query was treated as transactional or not. On the other hand, if we are doing this before analyzeAndAuthorize(), then we give a user (without the right to access a transactional table) the possibility to "probe" for the table's existence, as the profile will be different after select * from tblname if tblname exists and is transactional. http://gerrit.cloudera.org:8080/#/c/13559/7/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/13559/7/fe/src/main/java/org/apache/impala/service/Frontend.java@1378 PS7, Line 1378: abortTransaction(queryCtx.getTransaction_id()); This may be out of the scope of this patch, but I would consider making abortTransaction() async - so another thread would do the abort calls, and we would just add a transaction ID to its queue here. The reason is that doing abort a bit later doesn't seem problematic to me, while adding further delay to a failed query can be frustrating + it would allow us to abort a query later even if HMS is unresponsive at the moment. -- To view, visit http://gerrit.cloudera.org:8080/13559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad Gerrit-Change-Number: 13559 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Mon, 24 Jun 2019 10:21:20 +0000 Gerrit-HasComments: Yes