[ 
https://issues.apache.org/jira/browse/HIVE-22336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954935#comment-16954935
 ] 

Dinesh Chitlangia commented on HIVE-22336:
------------------------------------------

[~kuczoram] Thanks for filing this patch. The latest patch looks clean.

> The updates should be pushed to the Metastore backend DB before creating the 
> notification event
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-22336
>                 URL: https://issues.apache.org/jira/browse/HIVE-22336
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 4.0.0
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>         Attachments: HIVE-22336.1.patch, HIVE-22336.2.patch, 
> HIVE-22336.3.patch
>
>
> On HDP-3.1, a table couldn't be deleted because some related objects (such as 
> the storage descriptor, SD) were missing from the metastore. An earlier delete 
> attempt on that table had failed without being rolled back, which is why the 
> SD was missing. In that earlier delete, the notification creation swallowed 
> the error coming from the backend DB, so no rollback was triggered. These are 
> the steps of the first delete attempt:
>  
> # Open a transaction (transaction_1) - this step was successful
> # Delete all the objects which are related to the table - this step was 
> successful too, so the SD and other objects were deleted
> # Delete the table - this step failed in the backend DB, but according to the 
> log the delete is issued as part of a batch statement, so it is not 
> necessarily executed at this point and no error is seen here
> # Create a notification about the table delete:
> ## Open another transaction for the notification creation (transaction_2) - 
> this calls the ObjectStore.openTransaction method, which increments a counter 
> of open transactions and then checks whether there is already an active 
> transaction. If there is, it just returns true and doesn't actually create a 
> new transaction.
> ## Lock the notification id in the metastore backend db for update - here is 
> where the exception from the backend DB (let's call it "MySQL Exception") 
> manifests
> ## If an exception occurs while acquiring the lock, retry - the "MySQL 
> Exception" was caught, and since the exception is not inspected, the retry 
> mechanism assumes that it failed to acquire the lock for the notification 
> id, so it retries and "forgets" about the "MySQL Exception".
> ## If the lock was acquired successfully, create the notification - the 
> second time, the lock was acquired successfully, so the notification creation 
> succeeded.
> ## Commit transaction_2 - this just decrements the transaction counter and 
> doesn't actually commit anything.
> # Commit transaction_1 - this commits the transaction, but since the error 
> had already surfaced and been "handled", no error is seen here; the commit 
> is reported as successful, so no rollback happens and the table object is 
> left in an invalid state.
> # If the commit was not successful then rollback
> In the customer setup, this issue could be fixed by adding a flush call 
> before creating the notification event, so that all the updates are pushed to 
> the backend db and the error manifests at that point. The error then 
> propagates back to the HiveMetastore class, which does the rollback, and the 
> delete table operation fails as it should, since the table couldn't be 
> deleted. The HiveMetastore retry mechanism can then try the table deletion 
> again.
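
The failure mode described in the issue, and the effect of the extra flush, can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyStore, its methods, the retry count of 2 and the "SDS"/"TBLS" labels are hypothetical stand-ins for illustration, not Hive's ObjectStore or its real API. It models three things from the description: batched deletes whose failure is deferred, a nested openTransaction that only bumps a counter, and a retry loop that does not inspect the exception it catches.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToyStore is a hypothetical stand-in, not Hive's ObjectStore.
class DropTableSketch {

    static class ToyStore {
        private int openTxns = 0;                               // nested-transaction counter
        private final List<String> pending = new ArrayList<>(); // batched, not yet sent
        final List<String> applied = new ArrayList<>();         // accepted by the backend

        // A nested call only bumps the counter; no new transaction is created.
        boolean openTransaction() {
            openTxns++;
            return true;
        }

        // Deletes are queued, so a backend failure is not visible yet.
        void delete(String target) {
            pending.add(target);
        }

        // Sends the queued statements; the deferred failure surfaces here.
        void flush() {
            List<String> batch = new ArrayList<>(pending);
            pending.clear();
            for (String target : batch) {
                if (target.equals("TBLS")) { // assumption: the backend rejects this delete
                    throw new IllegalStateException("backend rejected delete on " + target);
                }
                applied.add(target);
            }
        }

        // Touching the backend for the lock also pushes out the pending batch.
        void lockNotificationId() {
            flush();
        }

        void addNotification() {
            openTransaction();
            for (int attempt = 0; attempt < 2; attempt++) {
                try {
                    lockNotificationId();
                    break;
                } catch (RuntimeException e) {
                    // Treated as "could not get the lock"; the real cause is dropped.
                }
            }
            commitTransaction(); // nested commit: only decrements the counter
        }

        void commitTransaction() {
            openTxns--;
            if (openTxns == 0) {
                flush(); // nothing left to fail if the batch was already consumed
            }
        }

        void rollback() {
            openTxns = 0;
            pending.clear();
            applied.clear();
        }
    }

    static String dropTable(boolean flushBeforeNotification) {
        ToyStore store = new ToyStore();
        store.openTransaction();
        try {
            store.delete("SDS");
            store.delete("TBLS");
            if (flushBeforeNotification) {
                store.flush(); // the proposed fix: surface backend errors here
            }
            store.addNotification();
            store.commitTransaction();
            return "committed, applied=" + store.applied;
        } catch (RuntimeException e) {
            store.rollback();
            return "rolled back, applied=" + store.applied;
        }
    }

    public static void main(String[] args) {
        System.out.println("without flush: " + dropTable(false));
        System.out.println("with flush:    " + dropTable(true));
    }
}
```

In this toy model, dropTable(false) reports a commit even though only the "SDS" delete was applied, which mirrors the invalid table state from the issue. dropTable(true) hits the backend error before the notification step, so the caller sees the exception and rolls back.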



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
