[ https://issues.apache.org/jira/browse/HIVE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537534#comment-13537534 ]
Kevin Wilfong commented on HIVE-3826:
-------------------------------------
The tests pass.
> Rollbacks and retries of drops cause
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database
> row)
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-3826
> URL: https://issues.apache.org/jira/browse/HIVE-3826
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 0.11
> Reporter: Kevin Wilfong
> Assignee: Kevin Wilfong
> Attachments: HIVE-3826.1.patch.txt
>
>
> I'm not sure if this is the only cause of the exception
> "org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database
> row)" from the metastore, but one cause seems to be related to a drop command
> failing, and being retried by the client.
> Following a single metastore thread with DEBUG-level logging, I saw that the
> objects intended to be dropped remained in the PersistenceManager cache even
> after a rollback. The steps seemed to be as follows:
> 1) On the first attempt to drop the table, the table is pulled into the
> PersistenceManager cache so that it can be dropped.
> 2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, which
> causes the transaction to roll back.
> 3) The drop is retried on a different thread of the metastore Thrift server,
> or on a different server, and succeeds.
> 4) Back on the original thread of the original Thrift server, someone
> performs a write operation that produces a commit. This causes the detached
> objects related to the dropped table to attempt to reattach, so JDO queries
> the SQL backend for those objects, which it can't find. This causes the
> exception.
> I was able to reproduce this regularly using the following sequence of
> commands:
> Hive client 1 (Hive1): connected to a metastore Thrift server running a
> single thread. I hard-coded a RuntimeException into the ObjectStore code that
> drops a table, specifically right before the commit in
> preDropStorageDescriptor, to induce a rollback. I also turned off all
> retries at all layers of the metastore.
> Hive client 2 (Hive2): connected to a separate metastore Thrift server
> running with standard configs and code
> 1: On Hive1, CREATE TABLE t1 (c STRING);
> 2: On Hive1, DROP TABLE t1; // Fails due to the hard-coded exception
> 3: On Hive2, DROP TABLE t1; // Succeeds
> 4: On Hive1, CREATE DATABASE d1; // This database already existed. I'm not
> sure why this step was necessary, but the reproduction didn't work without
> it; it seemed to have an effect on the order in which objects were committed
> in the next step
> 5: On Hive1, CREATE DATABASE d2; // This database didn't exist; the command
> would fail with the NucleusObjectNotFoundException
> The object that caused the exception varied: I saw the MTable, the
> MSerDeInfo, and the MTablePrivilege of the table whose drop had been
> attempted.
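
The four steps in the quoted description can be illustrated with a small toy model. This is only a sketch under stated assumptions: Backend, ToyPersistenceManager, the evictOnRollback flag, and replay are hypothetical names invented for the illustration, not Hive, JDO, or DataNucleus APIs, and a plain IllegalStateException stands in for the NucleusObjectNotFoundException. The flag exists only to contrast a cache that survives a rollback with one that does not; it is not a description of the attached patch.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model only; none of these classes are Hive, JDO, or DataNucleus APIs. */
public class StaleCacheSketch {

    /** Stands in for the shared SQL backend: just a set of row keys. */
    static class Backend {
        final Set<String> rows = new HashSet<>();
    }

    /** Stands in for one metastore thread's PersistenceManager and its cache. */
    static class ToyPersistenceManager {
        private final Backend backend;
        private final Set<String> cache = new LinkedHashSet<>();
        private final boolean evictOnRollback; // hypothetical switch, for contrast only

        ToyPersistenceManager(Backend backend, boolean evictOnRollback) {
            this.backend = backend;
            this.evictOnRollback = evictOnRollback;
        }

        /** Pulls the row into the cache, then either deletes it or rolls back. */
        boolean drop(String key, boolean failBeforeCommit) {
            if (!backend.rows.contains(key)) {
                throw new IllegalStateException("No such database row: " + key);
            }
            cache.add(key);               // step 1: object is cached for the drop
            if (failBeforeCommit) {
                rollback();               // step 2: the drop fails and rolls back
                return false;
            }
            backend.rows.remove(key);     // successful drop removes the row
            cache.remove(key);
            return true;
        }

        void rollback() {
            if (evictOnRollback) {
                cache.clear();
            }
            // Otherwise the cached objects survive the rollback.
        }

        /** Commits an unrelated write; every cached object is looked up first. */
        void commitWrite(String newKey) {
            for (String cached : cache) {
                if (!backend.rows.contains(cached)) {
                    throw new IllegalStateException("No such database row: " + cached);
                }
            }
            backend.rows.add(newKey);
        }
    }

    /** Replays steps 1 to 4; returns the failure message, or null on success. */
    static String replay(boolean evictOnRollback) {
        Backend db = new Backend();
        db.rows.add("t1");
        ToyPersistenceManager server1 = new ToyPersistenceManager(db, evictOnRollback);
        ToyPersistenceManager server2 = new ToyPersistenceManager(db, evictOnRollback);

        server1.drop("t1", true);         // steps 1 and 2: failed drop on server 1
        server2.drop("t1", false);        // step 3: retry succeeds on server 2
        try {
            server1.commitWrite("d2");    // step 4: unrelated write on server 1
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("cache kept on rollback:    " + replay(false));
        System.out.println("cache cleared on rollback: " + replay(true));
    }
}
```

In this model, replay(false) returns "No such database row: t1" because the first server still holds the cached entry when it commits the unrelated write, while replay(true) returns null. The real reattachment logic in JDO is more involved than a set lookup, so the sketch shows only the ordering of events, not the actual code path.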