Paul Rogers created DRILL-5510:
----------------------------------

             Summary: Revisit connection failure recovery in Hive storage plugin
                 Key: DRILL-5510
                 URL: https://issues.apache.org/jira/browse/DRILL-5510
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.11.0
            Reporter: Paul Rogers


DRILL-5496 describes a problem which occurs when the Hive metastore server is 
restarted while Drill runs. The solution in that ticket is a workaround: we 
discard all cached Hive metastore data and rebuild the metadata cache.

The original code tried to be more subtle: detect that the connection has 
failed, reconnect, and preserve the cache. DRILL-5496 describes the flaws in 
that approach for the secure connection case.

This ticket proposes taking the time to understand the Hive metadata code and 
restructuring it so that the cache is preserved across connection failures.

Note a subtle issue: if the Hive metastore goes down, it may contain different 
data when it comes back up. Anything could happen while the server is down: 
schemas could be upgraded, one schema could be replaced with another, and so 
on. So a caching mechanism that preserves data across reconnects must detect 
and handle such changes.

Of course, such changes could occur even within a single connection, so the 
code should handle such cases already.
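One possible way to reconcile these goals is to keep the cache across 
reconnects but revalidate each entry against a cheap per-table version token 
fetched from the server. The Java sketch below is a minimal illustration under 
stated assumptions: MetastoreClient, tableVersion, tableDefinition, 
ConnectionLostException and RevalidatingTableCache are hypothetical names, not 
the Hive metastore API or Drill's classes, and it assumes the metastore can 
supply such a token (for example, something like a last-modified timestamp).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical, simplified metastore client (NOT the real Hive API). */
interface MetastoreClient {
  /** Assumed cheap call: a token that changes whenever the table definition changes. */
  long tableVersion(String qualifiedName) throws ConnectionLostException;
  /** Assumed expensive call: the full table definition. */
  String tableDefinition(String qualifiedName) throws ConnectionLostException;
}

/** Hypothetical signal that the metastore connection has failed. */
class ConnectionLostException extends Exception {
  ConnectionLostException(String message) { super(message); }
}

/** Keeps cached definitions across reconnects, revalidating each entry by version. */
class RevalidatingTableCache {
  private static final class Entry {
    final long version;
    final String definition;
    Entry(long version, String definition) {
      this.version = version;
      this.definition = definition;
    }
  }

  private final Supplier<MetastoreClient> connector;
  private final Map<String, Entry> cache = new HashMap<>();
  private MetastoreClient client;
  int fullFetches = 0; // illustration only: counts expensive definition lookups

  RevalidatingTableCache(Supplier<MetastoreClient> connector) {
    this.connector = connector;
    this.client = connector.get();
  }

  String get(String table) throws ConnectionLostException {
    try {
      return lookup(table);
    } catch (ConnectionLostException e) {
      // Connection failed: open a new connection, keep the cache, retry once.
      client = connector.get();
      return lookup(table);
    }
  }

  private String lookup(String table) throws ConnectionLostException {
    // Always check the version with the server, so changes made while the
    // server was down, or within a single connection, invalidate the entry.
    long version = client.tableVersion(table);
    Entry entry = cache.get(table);
    if (entry != null && entry.version == version) {
      return entry.definition;
    }
    String definition = client.tableDefinition(table);
    fullFetches++;
    cache.put(table, new Entry(version, definition));
    return definition;
  }
}
```

The design trades one lightweight round trip per lookup for correctness: the 
same revalidation path covers changes made across a reconnect and changes made 
within one connection. Whether Hive exposes a cheap enough per-table version 
check is an assumption that would need verifying.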



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
