[ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754891#comment-16754891
 ] 

ASF GitHub Bot commented on HIVE-21029:
---------------------------------------

GitHub user sankarh opened a pull request:

    https://github.com/apache/hive/pull/523

    HIVE-21029: External table replication for existing deployments running incremental replication.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sankarh/hive HIVE-21029

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/523.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #523
    
----
commit ccf630904d75a0ff099bc24160efd5d6c03ae02f
Author: Sankar Hariappan <mailtosankarh@...>
Date:   2019-01-29T11:18:47Z

    HIVE-21029: External table replication for existing deployments running incremental replication.

----


> External table replication for existing deployments running incremental 
> replication.
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-21029
>                 URL: https://issues.apache.org/jira/browse/HIVE-21029
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.0.0, 3.1.0, 3.1.1
>            Reporter: anishek
>            Assignee: Sankar Hariappan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Existing deployments that use Hive replication do not get external tables 
> replicated. To enable external table replication, such deployments must 
> provide a specific switch that first bootstraps external tables as part of 
> Hive incremental replication, after which incremental replication takes 
> care of further changes to external tables.
> The switch will be provided by an additional Hive configuration (for example, 
> hive.repl.bootstrap.external.tables) and is to be used in the 
> {code}WITH{code} clause of the 
> {code}REPL DUMP{code} command. 
> Additionally, the existing Hive config _hive.repl.include.external.tables_ 
> must always be set to "true" in the same clause. 
> Proposed usage for enabling external table replication on an existing 
> replication policy:
> 1. Consider an ongoing repl policy <db1> in the incremental phase.
> Set hive.repl.include.external.tables=true and 
> hive.repl.bootstrap.external.tables=true in the next incremental REPL DUMP.
> - Dumps all events but skips events related to external tables.
> - Instead, combines the bootstrap dump for all external tables under the 
> "_bootstrap" directory.
> - Also includes the data locations file "_external_tables_info".
> - A LIMIT or TO clause must not be used, so that the latest events are 
> dumped before the external tables are bootstrap dumped.
> 2. REPL LOAD on this dump applies all the events first, copies the external 
> table data, and then bootstraps the external tables (metadata).
> - It is possible that the external tables (metadata) are not point-in-time 
> consistent with the rest of the tables.
> - However, they become eventually consistent when the next incremental load 
> is applied.
> - This REPL LOAD is fault tolerant and can be retried if it fails.
> 3. All future REPL DUMPs on this repl policy should set 
> hive.repl.bootstrap.external.tables=false.
> - If it is not set to false, the target might end up with an inconsistent 
> set of external tables, because bootstrap does not clean up any dropped 
> external tables.
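
The three steps in the proposed usage above could look roughly like the following sketch. It assumes the Hive 3.x REPL DUMP / REPL LOAD syntax, and that the new switch ends up with the example name given in the description (hive.repl.bootstrap.external.tables). The event IDs (100, 200) and the dump directory path are hypothetical placeholders, not values from this issue.

```sql
-- Step 1 (source cluster): the next incremental dump also bootstraps
-- external tables. No TO or LIMIT clause, so the latest events are dumped.
-- 100 is a placeholder for the last replicated event ID.
REPL DUMP db1 FROM 100
WITH ('hive.repl.include.external.tables'='true',
      'hive.repl.bootstrap.external.tables'='true');

-- Step 2 (target cluster): load the dump; per the description this can be
-- retried on failure. The path is a placeholder for the dump directory
-- returned by REPL DUMP.
REPL LOAD db1 FROM '/path/to/dump/dir';

-- Step 3 (source cluster): every later incremental dump turns the
-- bootstrap switch off again but keeps external tables included.
-- 200 is a placeholder for the next last-replicated event ID.
REPL DUMP db1 FROM 200
WITH ('hive.repl.include.external.tables'='true',
      'hive.repl.bootstrap.external.tables'='false');
```

Passing both settings in the WITH clause, rather than setting them cluster-wide, keeps the bootstrap limited to the single dump that needs it.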



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
