[ https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sankar Hariappan updated HIVE-21029:
------------------------------------
    Status: Patch Available  (was: Open)

Attached 04.patch with fix for review comments from Mahesh.

> External table replication for existing deployments running incremental
> replication.
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-21029
>                 URL: https://issues.apache.org/jira/browse/HIVE-21029
>             Project: Hive
>          Issue Type: Bug
>          Components: repl
>    Affects Versions: 3.1.1, 3.1.0, 3.0.0
>            Reporter: anishek
>            Assignee: Sankar Hariappan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21029.01.patch, HIVE-21029.02.patch,
> HIVE-21029.03.patch, HIVE-21029.04.patch
>
>
> Existing deployments using Hive replication do not get external tables
> replicated. To enable external table replication, such deployments must
> provide a specific switch that first bootstraps external tables as part of
> Hive incremental replication, after which incremental replication takes
> care of further changes to external tables.
> The switch will be provided by an additional Hive configuration (for ex:
> hive.repl.bootstrap.external.tables) and is to be used in the
> {code} WITH {code} clause of the
> {code} REPL DUMP {code} command.
> Additionally, the existing Hive config _hive.repl.include.external.tables_
> must always be set to "true" in the above clause.
> Proposed usage for enabling external table replication on an existing
> replication policy:
> 1. Consider an ongoing repl policy <db1> in the incremental phase.
> Set hive.repl.include.external.tables=true and
> hive.repl.bootstrap.external.tables=true in the next incremental REPL DUMP.
> - Dumps all events but skips events related to external tables.
> - Instead, combines the bootstrap dump for all external tables under the
> "_bootstrap" directory.
> - Also includes the data locations file "_external_tables_info".
> - The dump must not use a LIMIT or TO clause, so that the latest events are
> dumped before the external tables are bootstrap-dumped.
> 2. REPL LOAD on this dump applies all the events first, copies the external
> tables' data, and then bootstraps the external tables (metadata).
> - It is possible that the external tables (metadata) are not point-in-time
> consistent with the rest of the tables.
> - They become eventually consistent once the next incremental load is
> applied.
> - This REPL LOAD is fault tolerant and can be retried if it fails.
> 3. All future REPL DUMPs on this repl policy should set
> hive.repl.bootstrap.external.tables=false.
> - If it is not set to false, the target might end up with an inconsistent
> set of external tables, because bootstrap does not clean up dropped
> external tables.
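
As a rough illustration of the proposed usage quoted above, the three steps might look like the following in HiveQL. This is only a sketch: db1 comes from the description, while the event IDs (100, 250) and the dump directory path are made-up placeholders. hive.repl.bootstrap.external.tables is the config name the description offers as an example, so the final name may differ. The REPL LOAD ... FROM form is the Hive 3.x syntax, and keeping hive.repl.include.external.tables=true in step 3 is an assumption based on the statement that incremental replication handles later external table changes.

```sql
-- Step 1 (source cluster): next incremental dump, with external table bootstrap on.
-- 100 is a placeholder for the last replicated event ID.
-- No TO or LIMIT clause, so the latest events are dumped before the bootstrap.
REPL DUMP db1 FROM 100
WITH ('hive.repl.include.external.tables'='true',
      'hive.repl.bootstrap.external.tables'='true');

-- Step 2 (target cluster): load the dump; per the description this can be retried.
-- The path is a placeholder for the dump directory reported by REPL DUMP.
REPL LOAD db1 FROM '/path/to/dump/dir';

-- Step 3 (source cluster): every later incremental dump turns the bootstrap switch off.
-- 250 is a placeholder for the last event ID replicated by the previous cycle.
REPL DUMP db1 FROM 250
WITH ('hive.repl.include.external.tables'='true',
      'hive.repl.bootstrap.external.tables'='false');
```

Leaving the bootstrap switch on in step 3 would repeat the external table bootstrap on every cycle, which, as noted in the description, does not remove external tables that were dropped on the source.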