Paul Rogers created DRILL-5510:
----------------------------------
Summary: Revisit connection failure recovery in Hive storage plugin
Key: DRILL-5510
URL: https://issues.apache.org/jira/browse/DRILL-5510
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.11.0
Reporter: Paul Rogers
DRILL-5496 describes a problem that occurs when the Hive metastore server is
restarted while Drill is running. The solution in that ticket is a work-around:
we discard all cached Hive metastore data and rebuild the metadata cache.
The original code tried to be more subtle: detect that the connection has
failed, reconnect, but preserve the cache. DRILL-5496 describes the flaws in
that approach for the secure connection case.
This ticket asks that we spend the time to understand the Hive metadata code
and restructure it to preserve the cache across connection failures.
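As a rough illustration of the goal, here is a minimal Java sketch of a cache
that survives a connection failure. Every name in it (MetastoreClient,
ConnectionLostException, CachingMetastore) is hypothetical and stands in for
the real Drill and Hive classes, which this ticket asks us to study first. The
sketch also assumes that a failed connection shows up as a distinct exception
type, which may not hold for the real client.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical unchecked exception marking a broken metastore connection. */
class ConnectionLostException extends RuntimeException {
  ConnectionLostException(String message) { super(message); }
}

/** Hypothetical stand-in for a metastore connection; not the Hive API. */
interface MetastoreClient {
  String fetchTable(String name);   // may throw ConnectionLostException
  void reconnect();                 // re-establishes the connection
}

/** Keeps cached entries when the connection fails and retries once. */
class CachingMetastore {
  private final MetastoreClient client;
  private final Map<String, String> cache = new HashMap<>();

  CachingMetastore(MetastoreClient client) { this.client = client; }

  synchronized String getTable(String name) {
    String cached = cache.get(name);
    if (cached != null) {
      return cached;                      // served from cache, no round trip
    }
    String value;
    try {
      value = client.fetchTable(name);
    } catch (ConnectionLostException e) {
      client.reconnect();                 // reconnect, but leave the cache alone
      value = client.fetchTable(name);    // single retry; a second failure propagates
    }
    cache.put(name, value);
    return value;
  }

  synchronized int cachedEntries() { return cache.size(); }
}
```

The point of the sketch is only that the reconnect path never touches the
cache map: entries loaded before the failure remain available afterwards.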
Note a subtle issue: when the Hive metastore comes back up after going down, it
may contain different data. Anything could happen while the server is down:
schemas could be upgraded, one schema could be replaced with another, and so
on. So, the caching mechanism, if it is to preserve data across reconnects,
must handle such changes.
Of course, such changes could occur even within a single connection, so the
code should handle such cases already.
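One possible way to detect such changes, whether they happen during an outage
or within a single connection, is to store a change marker with each cached
entry and compare it before reuse. The Java sketch below assumes the server
can report a cheap per-table version; that assumption, and the names
VersionedSource and ValidatingCache, are hypothetical and not taken from the
Hive or Drill code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical metadata source with a cheap change marker; not the Hive API. */
interface VersionedSource {
  long currentVersion(String table);  // assumed cheap, e.g. a last-modified stamp
  String load(String table);          // assumed expensive full metadata read
}

/** Reuses a cached entry only while its stored version matches the server's. */
class ValidatingCache {
  private static final class Entry {
    final long version;
    final String metadata;
    Entry(long version, String metadata) {
      this.version = version;
      this.metadata = metadata;
    }
  }

  private final VersionedSource source;
  private final Map<String, Entry> entries = new HashMap<>();

  ValidatingCache(VersionedSource source) { this.source = source; }

  synchronized String get(String table) {
    long version = source.currentVersion(table);
    Entry entry = entries.get(table);
    if (entry == null || entry.version != version) {
      // Missing or stale: reload and remember the version seen just before.
      // A change between the two calls is caught by the next get().
      entry = new Entry(version, source.load(table));
      entries.put(table, entry);
    }
    return entry.metadata;
  }
}
```

This trades one cheap version check per lookup for the ability to keep the
cache across reconnects; whether the metastore offers a suitable marker is
part of what this ticket would need to establish.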