[ 
https://issues.apache.org/jira/browse/HIVE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3826:
--------------------------------

    Description: 
I'm not sure if this is the only cause of the exception 
"org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
row)" from the metastore, but one cause seems to be related to a drop command 
failing, and being retried by the client.

Following a single thread in the metastore with DEBUG level logging, I saw 
that the objects intended to be dropped remained in the PersistenceManager 
cache even after a rollback.  The steps seemed to be as follows:

1) The first attempt to drop the table pulls the table into the 
PersistenceManager cache for the purposes of dropping.
2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, which 
causes a rollback of the transaction.
3) The drop is retried on a different thread of the metastore Thrift server, 
or on a different server, and succeeds.
4) Back on the original thread of the original Thrift server, someone 
performs a write operation that produces a commit.  This causes the detached 
objects related to the dropped table to attempt to reattach, so JDO queries 
the SQL backend for those objects, which it can't find.  This causes the 
exception.
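
To make the sequence above concrete, here is a minimal toy model in plain 
Java.  It does not use DataNucleus, JDO, or any Hive code: Backend, Session, 
and ObjectNotFound are hypothetical stand-ins for the SQL backend, a 
per-thread PersistenceManager cache, and NucleusObjectNotFoundException.  It 
only sketches the suspected mechanism, assuming that a commit re-validates 
every object still held in the cache.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class StaleCacheDemo {
    /** Hypothetical stand-in for the SQL backend shared by all metastore servers. */
    static class Backend {
        final Set<String> rows = new HashSet<>();
    }

    /** Hypothetical stand-in for NucleusObjectNotFoundException. */
    static class ObjectNotFound extends RuntimeException {
        ObjectNotFound(String name) {
            super(name);
        }
    }

    /** Toy per-thread session whose cache survives a rollback (the suspected bug). */
    static class Session {
        private final Backend backend;
        private final Set<String> cache = new LinkedHashSet<>();

        Session(Backend backend) {
            this.backend = backend;
        }

        void dropTable(String name, boolean failBeforeCommit) {
            cache.add(name); // step 1: the object is pulled into the cache
            if (failBeforeCommit) {
                // step 2: the transaction rolls back, but the cache is left untouched
                throw new IllegalStateException("simulated lock wait timeout");
            }
            backend.rows.remove(name); // successful commit removes the row
            cache.remove(name);
        }

        void createDatabase(String name) {
            // step 4: the commit tries to reattach every cached object
            for (String cached : cache) {
                if (!backend.rows.contains(cached)) {
                    throw new ObjectNotFound(cached);
                }
            }
            backend.rows.add(name);
        }
    }

    /** Runs steps 1 to 4 and reports what the final write operation saw. */
    static String run() {
        Backend backend = new Backend();
        backend.rows.add("t1");
        Session hive1 = new Session(backend);
        Session hive2 = new Session(backend);
        try {
            hive1.dropTable("t1", true); // steps 1 and 2: drop fails, rollback
        } catch (IllegalStateException expected) {
            // the failed drop is what leaves the stale entry behind
        }
        hive2.dropTable("t1", false); // step 3: the retry succeeds elsewhere
        try {
            hive1.createDatabase("d2"); // step 4: unrelated write on the first session
            return "ok";
        } catch (ObjectNotFound e) {
            return "not found: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "not found: t1"
    }
}
```

In this model the unrelated write fails only because the first session still 
holds "t1" after its rollback, which matches the behaviour seen in the logs.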

I was able to reproduce this regularly using the following setup and sequence 
of commands:
Hive client 1 (Hive1): connected to a metastore Thrift server running a single 
thread.  I hard coded a RuntimeException into the code that drops a table in 
the ObjectStore, specifically right before the commit in 
preDropStorageDescriptor, to induce a rollback.  I also turned off all retries 
at all layers of the metastore.
Hive client 2 (Hive2): connected to a separate metastore Thrift server running 
with standard configs and code.

1: On Hive1, CREATE TABLE t1 (c STRING);
2: On Hive1, DROP TABLE t1; // This failed due to the hard coded exception
3: On Hive2, DROP TABLE t1; // Succeeds
4: On Hive1, CREATE DATABASE d1; // This database already existed.  I'm not 
sure why this step was necessary, but the repro didn't work without it; it 
seemed to have an effect on the order in which objects were committed in the 
next step
5: On Hive1, CREATE DATABASE d2; // This database didn't exist; the command 
would fail with the NucleusObjectNotFoundException

The object that caused the exception varied; I saw the MTable, the 
MSerDeInfo, and the MTablePrivilege of the table whose drop had been attempted.

> Rollbacks and retries of drops cause 
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3826
>                 URL: https://issues.apache.org/jira/browse/HIVE-3826
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 0.11
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-3826.1.patch.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
