[
https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289961#comment-14289961
]
Sushanth Sowmyan commented on HIVE-9436:
----------------------------------------
[~thejas]/[~hsubramaniyan] : I have a couple of thoughts about moving
JDOException retries solely to the metastore:
a) First, we have had cases where a JDOException invalidates the connection on
the metastore side, and retrying from the metastore has not helped. Retrying
from the client side, though, causes a fresh openTransaction() that clears the
connection and all history, sometimes by hitting a different HMSHandler, and
this makes the retry from the client more successful than a retry from the
server. Admittedly, this is most likely because our metastore code needs
cleanup so that a metastore-side retry handles this properly, which is
something we should attempt to improve.
b) Second, from the perspective of a loaded metastore, having a metastore thread
do retries uses up valuable metastore resources and time, and is more wasteful
than having the client do retries. We thus tend to keep the number of
metastore-side retries low, and the fact that we also have client-side retries
lets us fail fast on the metastore while retrying a large number of times in
particular clients if we find the need to do so. In HA configurations in
particular, I've seen a large number of retries and longer retry intervals on
the client side allow a connection to go through despite metastore HUPs.
c) Third, speaking of HA, retrying on the client side also allows us to hit
alternate metastores, if configured, in scenarios where one metastore is
getting bogged down. As you mention, the client should ideally only be retrying
connection exceptions, but JDOExceptions are frequently the result of
connection exceptions raised by the connection pool from the metastore to the
db.
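To make points (b) and (c) concrete, here is a rough sketch of the kind of
client-side loop I have in mind. This is not the actual RetryingMetaStoreClient
code: ClientRetrySketch, Call and callWithRetries are hypothetical names, and
the caller supplies the list of metastore URIs and the predicate that decides
which exceptions are worth retrying.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not Hive code: retry a call on the client side,
// rotating through the configured metastore URIs between attempts.
final class ClientRetrySketch {

    // One metastore call against a given URI; may fail with any exception.
    interface Call<T> {
        T run(String uri) throws Exception;
    }

    static <T> T callWithRetries(List<String> uris, int maxAttempts, long delayMs,
                                 Predicate<Exception> retriable, Call<T> call)
            throws Exception {
        if (uris.isEmpty() || maxAttempts < 1) {
            throw new IllegalArgumentException("need at least one URI and one attempt");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Each attempt moves on to the next URI, wrapping around at the end.
            String uri = uris.get(attempt % uris.size());
            try {
                return call.run(uri);
            } catch (Exception e) {
                if (!retriable.test(e)) {
                    throw e; // not worth retrying: surface it immediately
                }
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayMs); // fixed interval between attempts
                }
            }
        }
        throw last; // every attempt failed with a retriable exception
    }
}
```

The retry count and interval live entirely on the client here, so the
metastore itself can stay fail-fast while a particular client chooses to be
more patient.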
There is definitely scope for refactoring and improvement in all this, and I
will look into it further, but for now, this is a simpler bugfix to enable the
already-existing regex to work correctly.
> RetryingMetaStoreClient does not retry JDOExceptions
> ----------------------------------------------------
>
> Key: HIVE-9436
> URL: https://issues.apache.org/jira/browse/HIVE-9436
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0, 0.13.1
> Reporter: Sushanth Sowmyan
> Assignee: Sushanth Sowmyan
> Attachments: HIVE-9436.2.patch, HIVE-9436.patch
>
>
> RetryingMetaStoreClient has a bug in the following bit of code:
> {code}
> } else if ((e.getCause() instanceof MetaException) &&
>     e.getCause().getMessage().matches("JDO[a-zA-Z]*Exception")) {
>   caughtException = (MetaException) e.getCause();
> } else {
>   throw e.getCause();
> }
> {code}
> The bug here is that Java's String.matches matches the entire string against
> the regex, and thus the match will fail if the message contains anything
> before or after JDO[a-zA-Z]*Exception. The solution, however, is very simple:
> we should match .*JDO[a-zA-Z]*Exception.*
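The difference described above can be seen in a small standalone sketch. The
class name JdoMessageMatch and the sample message are made up for
illustration; the two patterns are the ones quoted in the description, and the
find() variant is an alternative I am adding for comparison, not what the
patch does.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Hive.
final class JdoMessageMatch {
    // Pattern from the original code: matches only if the whole message is the name.
    static final String STRICT = "JDO[a-zA-Z]*Exception";
    // Widened pattern from the description: allows text before and after the name.
    static final String WIDENED = ".*JDO[a-zA-Z]*Exception.*";
    // Alternative: search for the name anywhere in the message.
    static final Pattern FINDER = Pattern.compile(STRICT);

    static boolean strictMatch(String msg) {
        return msg.matches(STRICT);
    }

    static boolean widenedMatch(String msg) {
        return msg.matches(WIDENED);
    }

    static boolean findMatch(String msg) {
        return FINDER.matcher(msg).find();
    }

    public static void main(String[] args) {
        // Made-up message for illustration only.
        String msg = "javax.jdo.JDODataStoreException: connection reset";
        System.out.println(strictMatch("JDOException")); // true
        System.out.println(strictMatch(msg));            // false
        System.out.println(widenedMatch(msg));           // true
        System.out.println(findMatch(msg));              // true
    }
}
```

One caveat worth keeping in mind: by default '.' does not match line
terminators in Java, so the widened pattern used with String.matches would
still fail on a message that contains a newline, while the find() variant
would not be affected.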