Vihang Karajgaonkar created HIVE-16452:
------------------------------------------

             Summary: Database UUID for metastore DB
                 Key: HIVE-16452
                 URL: https://issues.apache.org/jira/browse/HIVE-16452
             Project: Hive
          Issue Type: New Feature
          Components: Metastore
            Reporter: Vihang Karajgaonkar
            Assignee: Vihang Karajgaonkar


In cloud environments it is possible that the same database instance is used as 
the long-running metadata persistence layer and that multiple HMS instances 
access this database. These HMS instances could be running at the same time or, 
in the case of transient workloads, come up on an on-demand basis. HMS is used 
by multiple projects in the Hadoop ecosystem as the de facto metadata keeper 
for various SQL engines on the cluster. Currently, there is no way to uniquely 
identify the database instance which is backing the HMS. For example, if there 
are two instances of HMS running on top of the same metastore DB, there is no 
way to identify that data received from both metastore clients is coming from 
the same database. Similarly, if in the case of transient workloads multiple 
HMS services come up and go away, an external application which is fetching 
data from an HMS has no way to identify that these multiple instances of HMS 
are in fact returning the same data. 

We can potentially use the combination of the javax.jdo.option.ConnectionURL 
and javax.jdo.option.ConnectionDriverName configuration of each HMS instance, 
but this approach may not be very robust. If the database is migrated to 
another server for some reason, the ConnectionURL can change. Having a UUID in 
the metastore DB which can be queried using a Thrift API can help solve this 
problem. This way any application talking to multiple HMS instances can 
recognize whether the data is coming from the same backing database.
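
As a rough illustration of how a client application might use such an 
identifier, the sketch below groups HMS endpoints by the UUID of their backing 
database. It is only a sketch under assumptions: the class BackingDbGrouper, 
its methods, and the uuidLookup function are hypothetical names, and 
uuidLookup stands in for the proposed Thrift call, which does not exist yet 
and whose name and signature are not decided here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Hive: groups HMS endpoints by the UUID
// reported for their backing metastore DB.
final class BackingDbGrouper {
    private BackingDbGrouper() {}

    /**
     * Groups endpoints that report the same database UUID.
     *
     * @param endpoints  HMS endpoints the application talks to
     * @param uuidLookup assumed stand-in for the proposed Thrift API; maps an
     *                   endpoint to the UUID stored in its metastore DB
     * @return map from database UUID to the endpoints backed by that database,
     *         in the order the UUIDs were first seen
     */
    static Map<String, List<String>> groupByDbUuid(
            List<String> endpoints, Function<String, String> uuidLookup) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String endpoint : endpoints) {
            String uuid = uuidLookup.apply(endpoint);
            groups.computeIfAbsent(uuid, k -> new ArrayList<>()).add(endpoint);
        }
        return groups;
    }

    /** Returns true if both endpoints report the same database UUID. */
    static boolean sameBackingDb(
            String first, String second, Function<String, String> uuidLookup) {
        return uuidLookup.apply(first).equals(uuidLookup.apply(second));
    }
}
```

Because the comparison relies on the stored UUID rather than on the 
ConnectionURL, two endpoints would still be recognized as sharing a database 
after the database is migrated to another server, provided the UUID moves with 
the data.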



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
