Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc@817
PS5, Line 817:     RETURN_IF_ERROR(GetCoordinator()->Wait());
             :     if (exec_request().__isset.transaction_id) {
             :       
RETURN_IF_ERROR(frontend_->CommitTransaction(exec_request().transaction_id));
             :       in_transaction_ = false;
             :     }
> UpdateCatalog() won't load the new write ids and won't see the new files un
I am leaning towards doing UpdateCatalog() first:
- UpdateCatalog() seems a more complex operation, so there is probably a bigger 
chance for failure.
- If UpdateCatalog() succeeds but the commit fails, then the only side effect 
can be the creation of some empty partitions (insert overwrite doesn't delete 
partitions, as we checked together), which does not change the contents of the 
table. On the other hand, commit can add files to both old and new partitions, 
so if adding the new partitions fails, then the insert becomes "partially 
successful", which is not ok in the ACID world.
- It seems logical to do "commit" as late as possible, when there are not a lot 
of things that can go wrong. Getting an error for your INSERT query but still 
committing rows is weird.


http://gerrit.cloudera.org:8080/#/c/13559/6/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/6/be/src/service/client-request-state.cc@722
PS6, Line 722:     in_transaction_ = false;
Can you add query_events_->MarkEvent("Transaction aborted");?


http://gerrit.cloudera.org:8080/#/c/13559/6/be/src/service/client-request-state.cc@820
PS6, Line 820:       in_transaction_ = false;
Can you add query_events_->MarkEvent("Transaction committed");?


http://gerrit.cloudera.org:8080/#/c/13559/6/common/thrift/DataSinks.thrift
File common/thrift/DataSinks.thrift:

http://gerrit.cloudera.org:8080/#/c/13559/6/common/thrift/DataSinks.thrift@81
PS6, Line 81: write_id
Here and other similar variables: I would consider adding "acid_" or 
"transactional_" to make these variables' role clearer to people who don't know 
Hive acid terminology.


http://gerrit.cloudera.org:8080/#/c/13559/6/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13559/6/fe/src/main/java/org/apache/impala/service/Frontend.java@677
PS6, Line 677:    * Parses and plans a query in order to generate its explain 
string. This method does
I wonder how to handle EXPLAIN - the backend won't be there to commit/abort the 
query. It would be probably the best to avoid opening a transaction/getting new 
write IDs when executing EXPLAIN.


http://gerrit.cloudera.org:8080/#/c/13559/6/fe/src/main/java/org/apache/impala/service/Frontend.java@1257
PS6, Line 1257:
Can you add timeline.markEvent("Transaction opened");?

Adding transaction related events to the timeline will make it clear from the 
profile whether the query was treated as  transactional or not.

On the other hand, if we are doing this before analyzeAndAuthorize(), then we 
give a user (without the right to access a transactional table) the possibility 
to "probe" for the table's existence, as the profile will be different after 
select * from tblname if tblname exists and is transactional.


http://gerrit.cloudera.org:8080/#/c/13559/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13559/7/fe/src/main/java/org/apache/impala/service/Frontend.java@1378
PS7, Line 1378:           abortTransaction(queryCtx.getTransaction_id());
This may be out of the scope of this patch, but I would consider making 
abortTransaction() async - so another thread would do the abort calls, and we 
would just add a transaction ID to its queue here.

The reason is that doing abort a bit later doesn't seem problematic to me, 
while adding further delay to a failed query can be frustrating + it would 
allow us to abort a query later even if HMS is unresponsive at the moment.



--
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 24 Jun 2019 10:21:20 +0000
Gerrit-HasComments: Yes

Reply via email to