[ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21029:
------------------------------------
    Status: Patch Available  (was: Open)

Attached 04.patch with fixes for the review comments from Mahesh.

> External table replication for existing deployments running incremental 
> replication.
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-21029
>                 URL: https://issues.apache.org/jira/browse/HIVE-21029
>             Project: Hive
>          Issue Type: Bug
>          Components: repl
>    Affects Versions: 3.1.1, 3.1.0, 3.0.0
>            Reporter: anishek
>            Assignee: Sankar Hariappan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21029.01.patch, HIVE-21029.02.patch, 
> HIVE-21029.03.patch, HIVE-21029.04.patch
>
>
> Existing deployments that use Hive replication do not get external tables 
> replicated. To enable external table replication, such deployments must 
> provide a specific switch that first bootstraps the external tables as part 
> of Hive incremental replication, after which incremental replication takes 
> care of further changes to external tables.
> The switch will be provided by an additional Hive configuration (for example, 
> hive.repl.bootstrap.external.tables) and is to be used in the 
> {code} WITH {code} clause of the 
> {code} REPL DUMP {code} command. 
> Additionally, the existing Hive config _hive.repl.include.external.tables_ 
> must always be set to "true" in the same clause.
> Proposed usage for enabling external table replication on an existing 
> replication policy:
> 1. Consider an ongoing repl policy <db1> in the incremental phase.
> Set hive.repl.include.external.tables=true and 
> hive.repl.bootstrap.external.tables=true in the next incremental REPL DUMP.
> - Dumps all events but skips events related to external tables.
> - Instead, combines the bootstrap dump for all external tables under the 
> "_bootstrap" directory.
> - Also includes the data locations file "_external_tables_info".
> - The LIMIT or TO clause must not be used, so that the latest events are 
> dumped before the external tables are bootstrap dumped.
> 2. REPL LOAD on this dump applies all the events first, copies the external 
> table data and then bootstraps the external tables (metadata).
> - It is possible that the external tables (metadata) are not point-in-time 
> consistent with the rest of the tables.
> - However, they become eventually consistent when the next incremental load 
> is applied.
> - This REPL LOAD is fault tolerant and can be retried if it fails.
> 3. All future REPL DUMPs on this repl policy should set 
> hive.repl.bootstrap.external.tables=false.
> - If it is not set to false, the target might end up with an inconsistent 
> set of external tables, because bootstrap does not clean up any dropped 
> external tables.
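>
> The three steps above could be sketched as the following commands. This is 
> only an illustration under stated assumptions: the event IDs (100, 250) and 
> the dump directory path are hypothetical placeholders, and the REPL LOAD 
> form shown (<db> FROM '<dir>') assumes the syntax of the affected 3.x 
> releases. Keeping hive.repl.include.external.tables=true in step 3 is also 
> an assumption, made so that external tables stay part of the policy.
> {code:sql}
> -- Step 1: next incremental dump, with a one-time bootstrap of external
> -- tables. No TO or LIMIT clause, so the latest events are dumped first.
> -- The event ID 100 is a placeholder for the last replicated event.
> REPL DUMP db1 FROM 100 WITH (
>   'hive.repl.include.external.tables'='true',
>   'hive.repl.bootstrap.external.tables'='true'
> );
>
> -- Step 2: load that dump on the target. The path is a placeholder for the
> -- dump directory returned by step 1. Retry the same command on failure.
> REPL LOAD db1 FROM '/path/to/dump_dir';
>
> -- Step 3: every later incremental dump turns the bootstrap switch off.
> -- The event ID 250 is a placeholder for the last replicated event.
> REPL DUMP db1 FROM 250 WITH (
>   'hive.repl.include.external.tables'='true',
>   'hive.repl.bootstrap.external.tables'='false'
> );
> {code}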



