Paul Rogers created DRILL-5510:
----------------------------------

             Summary: Revisit connection failure recovery in Hive storage plugin
                 Key: DRILL-5510
                 URL: https://issues.apache.org/jira/browse/DRILL-5510
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.11.0
            Reporter: Paul Rogers

DRILL-5496 describes a problem that occurs when the Hive metastore server is restarted while Drill is running. The fix in that ticket is a workaround: we discard all cached Hive metastore data and rebuild the metadata cache.

The original code tried to be more subtle: detect that the connection has failed and reconnect, but preserve the cache. DRILL-5496 describes the flaws in that approach for the secure connection case.

This ticket asks that we take the time to understand the Hive metadata code and restructure it to preserve the cache across connection failures.

Note a subtle issue: if the Hive metastore goes down, it may contain different data when it comes back up. Anything could happen while the server is down: schemas upgraded, one schema replaced with another, and so on. So the caching mechanism, if it is to preserve data across reconnects, must handle such changes. Of course, such changes can also occur within a single connection, so the code should already handle such cases.
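
To make the goal concrete, here is a minimal sketch of a cache that survives a reconnect yet still notices when the metastore contents have changed. It is not Drill's or Hive's actual code: `MetastoreClient`, `tableVersion`, `loadTable`, and `ConnectionLostException` are hypothetical names invented for illustration, and the sketch assumes the metastore can report some cheap per-table version or modification marker, which would need to be confirmed against the real Hive API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Sketch only: keeps cached metadata across reconnects, revalidating by version. */
class ReconnectingMetadataCache {

    /** Hypothetical stand-in for a metastore client; NOT the Hive API. */
    interface MetastoreClient {
        long tableVersion(String table);   // assumed cheap "has it changed?" probe
        String loadTable(String table);    // assumed expensive full metadata fetch
    }

    /** Hypothetical signal that the underlying connection has failed. */
    static class ConnectionLostException extends RuntimeException {
        ConnectionLostException(String message) { super(message); }
    }

    private static final class Entry {
        final long version;
        final String metadata;
        Entry(long version, String metadata) {
            this.version = version;
            this.metadata = metadata;
        }
    }

    private final Supplier<MetastoreClient> connector;
    private final Map<String, Entry> cache = new HashMap<>();
    private MetastoreClient client;

    ReconnectingMetadataCache(Supplier<MetastoreClient> connector) {
        this.connector = connector;
        this.client = connector.get();
    }

    synchronized String getTable(String table) {
        try {
            return lookup(table);
        } catch (ConnectionLostException e) {
            // Reconnect once and retry. The cache is deliberately NOT cleared;
            // lookup() revalidates each entry against the metastore's version.
            client = connector.get();
            return lookup(table);
        }
    }

    private String lookup(String table) {
        long current = client.tableVersion(table);
        Entry cached = cache.get(table);
        if (cached != null && cached.version == current) {
            return cached.metadata;        // still valid, even after a reconnect
        }
        // Missing or stale (the metastore changed, possibly while it was down).
        String fresh = client.loadTable(table);
        cache.put(table, new Entry(current, fresh));
        return fresh;
    }
}
```

Because every lookup is validated against the version, the same path covers both cases described above: changes made while the server was down and changes made within a single connection. The cost is one extra round trip per lookup. A real design would also have to address the secure connection problems described in DRILL-5496, which this sketch ignores.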