-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17678/
-----------------------------------------------------------
(Updated Feb. 11, 2014, 2:22 a.m.)
Review request for hive.
Changes
-------
Getting rid of the first retry to create the default database, on metastore
startup. Now, the RawStoreProxy is just a rename of RetryingRawStore + getting
rid of retry at that layer.
Bugs: HIVE-4996
https://issues.apache.org/jira/browse/HIVE-4996
Repository: hive-git
Description
-------
Background:
First issue:
There are two levels of retrying in case of transient JDO/CP/DB errors: the
RetryingHMSHandler and RetryingRawStore. But the RetryingRawStore is flawed in
the case of a nested transaction of a larger RetryingHMSHandler transaction
(which is majority of cases).
Consider the following sample RetryingHMSHandler call, where variable "ms" is a
RetryingRawStore.
HMSHandler.createTable()
ms.open() //openTx = 1
ms.getTable() // openTx = 2, then openTx = 1 upon intermediate commit
ms.createTable() //openTx = 2, then openTx = 1 upon intermediate commit
ms.commit(); //openTx = 0
If there is any transient error in any intermediate operation and
RetryingRawStore tries again, there will always be an unbalanced transaction,
like:
HMSHandler.createTable()
ms.open() //openTx = 1
ms.getTable() // openTx = 2, transient error, then openTx=0 upon
rollback. After a retry, openTx=1, then openTx=0 upon successful intermediate
commit
ms.createTable() //openTx = 1, then openTx = 0 upon intermediate commit
ms.commit(); //unbalanced transaction!
Retrying RawStore operations doesn't make sense in nested transaction cases, as
the first part of the transaction is rolled-back upon transient error, and
retry logic only saves a second half which may not make sense without the
first. It makes much more sense to retry the entire transaction from the top,
which is what RetryingHMSHandler would already be doing if the RetryingRawStore
did not interfere.
Second issue:
The recent upgrade to BoneCP 0.8.0 seemed to cause more transient errors that
triggered this problem. In these cases, in-use connections are finalized, as
follows:
WARN bonecp.ConnectionPartition
(ConnectionPartition.java:finalizeReferent(162)) - BoneCP detected an unclosed
connection and will now attempt to close it for you. You should be closing this
connection in your application - enable connectionWatch for additional
debugging assistance or set disableConnectionTracking to true to disable this
feature entirely.
The retry of this operation seems to get a good connection and allow the
operation to proceed. Reading forums, it seems some others have hit this issue
after the upgrade, and switching back to 0.7.1 in our environment eliminated
this issue for us. But that reversion is outside the scope of this JIRA, and
would be better-done in either the original or follow-up JIRA that upgraded the
version.
This fix targets the first issue only, as anyway it is needed for any sort of
transient error, not just the BoneCP one that I observed.
Changes:
1. Removes RetryingRawStore in favor of RetryingHMSHandler, and removes the
configuration property of the former.
2. Addresses the resultant holes in retry, in particular in the
RetryingHMSHandler's construction of RawStore (before, RetryingRawStore would
have retried failures like in creating the defaultDB). It didn't seem
necessary to increase the default RetryingHMSHandler retries to 2 to
compensate, but I am open to that as well.
3. Contribute the instrumentation code that helped me to find the issue. This
includes printing missing stacks of exceptions that triggered retry, including
'unbalanced calls' errors to hive log, and adding debug-level tracing of
ObjectStore calls to give better correlation with other errors/warnings in hive
log.
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ec1f09e
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
d7854fe
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestRawStoreTxn.java
0b87077
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
4905be0
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 0715e22
metastore/src/java/org/apache/hadoop/hive/metastore/RawStoreProxy.java
PRE-CREATION
metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java
fb70589
metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java
dcf97ec
Diff: https://reviews.apache.org/r/17678/diff/
Testing
-------
Thanks,
Szehon Ho