[ 
https://issues.apache.org/jira/browse/HIVE-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan reassigned HIVE-7973:
--------------------------------------

    Assignee: Sushanth Sowmyan  (was: Shannon Ladymon)

> Hive Replication Support
> ------------------------
>
>                 Key: HIVE-7973
>                 URL: https://issues.apache.org/jira/browse/HIVE-7973
>             Project: Hive
>          Issue Type: Bug
>          Components: Import/Export
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>
> Replication is a common need in many database management systems, and it's 
> important for Hive to evolve support for such a tool as part of its 
> ecosystem. Hive already supports EXPORT and IMPORT commands, which can be 
> used to dump out tables, distcp them to another cluster, and 
> import/create from that. If we had a mechanism by which exports and imports 
> could be automated, it would establish the base on which replication can be 
> developed.
> One place where this kind of automation can be developed is with the aid of 
> the HiveMetaStoreEventHandler mechanisms, to generate notifications when 
> certain changes are committed to the metastore, and then translate those 
> notifications to export actions, distcp actions and import actions on another 
> cluster.
> Part of that already exists in the Notification system that is part of 
> hcatalog-server-extensions. Initially, this was developed to be able to 
> trigger a JMS notification, which an Oozie workflow can use to start off 
> actions keyed on the finishing of a job that used HCatalog to write to a 
> table. While this currently lives under hcatalog, the primary reason for its 
> existence has a scope well past hcatalog alone, and it can be used as-is 
> without the use of HCatalog IF/OF. This can be extended with the help of a 
> library that does the aforementioned translation. I also think that these 
> sections should live in a core hive module, rather than being tucked away 
> inside hcatalog.
> Once we have rudimentary support for table & partition replication, we can 
> then move on to further requirements of replication, such as metadata 
> replication (for example, replication of changes to roles), and/or 
> optimizing away the requirement to distcp by using webhdfs instead.
> This Story tracks all the bits that go into development of such a system - 
> I'll create multiple smaller tasks inside this as we go on.
> Please also see HIVE-10264 for documentation-related links for this, and 
> https://cwiki.apache.org/confluence/display/Hive/HiveReplicationDevelopment 
> for the associated wiki page (currently in progress).
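
To make the translation step in the description concrete, here is a minimal sketch in Java of how a single "table changed" notification might be turned into the three actions the issue mentions: an EXPORT on the source, a distcp between clusters, and an IMPORT on the target. Everything in it is an assumption for illustration: the class name, the planFor method, and the staging-directory layout are hypothetical and are not part of Hive or of the HCatalog notification system. The generated statements use only the unqualified table name, so they assume the session is already in the right database.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names come from Hive itself. */
class ReplicationPlanSketch {

    /**
     * Builds the ordered list of actions for replicating one table.
     *
     * @param table         unqualified table name (assumed to be in the current database)
     * @param sourceStaging staging directory on the source cluster (assumed layout)
     * @param targetStaging staging directory on the target cluster (assumed layout)
     */
    static List<String> planFor(String table, String sourceStaging, String targetStaging) {
        String exportDir = sourceStaging + "/" + table;
        String importDir = targetStaging + "/" + table;

        // Step 1: dump the table's data and metadata on the source cluster.
        String exportStmt = "EXPORT TABLE " + table + " TO '" + exportDir + "'";
        // Step 2: copy the dump to the target cluster.
        String copyCmd = "hadoop distcp " + exportDir + " " + importDir;
        // Step 3: create or load the table on the target cluster from the copied dump.
        String importStmt = "IMPORT TABLE " + table + " FROM '" + importDir + "'";

        return Arrays.asList(exportStmt, copyCmd, importStmt);
    }

    public static void main(String[] args) {
        // Example cluster URIs are placeholders.
        for (String step : planFor("orders",
                "hdfs://source-nn:8020/staging",
                "hdfs://target-nn:8020/staging")) {
            System.out.println(step);
        }
    }
}
```

A real implementation would receive events from a metastore listener and run these actions rather than print them; the sketch only shows the mapping from one notification to an ordered export, copy, import plan.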



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
