[ 
https://issues.apache.org/jira/browse/HIVE-21771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-21771:
---------------------------------------
    Status: Patch Available  (was: Open)

> Support partition filter (where clause) in REPL dump command (Bootstrap Dump)
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-21771
>                 URL: https://issues.apache.org/jira/browse/HIVE-21771
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21771.01.patch, HIVE-21771.02.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Bootstrap for managed table*
> Users should be able to execute REPL DUMP with a where clause that filters 
> partitions out of the dump. The format of the where clause should be 
> similar to *"REPL DUMP dbname from 10 where 't0' where key < 
> 10,'t1'* where key = 3, '(t2*)|'t3' where key > 3".* The initial version 
> will support only very basic filter conditions; more complex conditions 
> will be added as and when required.
>  * From the AST generated for the where clause, extract the table information.
>  * Generate an AST for each table.
>  * List the partitions of each table from that table's AST, using the same 
> metastore API used by select queries.
>  * During bootstrap dump, use the partition list to dump the partitions.
>  * During incremental dump, use the list to filter out events.
> In case of bootstrap dump, all the tables of the database will be scanned and:
>  * If a table is not partitioned, it will be dumped.
>  * If the key provided in the filter condition for a table is not a partition 
> column, the dump will fail.
>  * If a table is not mentioned in the where clause, all partitions of the 
> table will be dumped.
>  * Otherwise, all the partitions of the table satisfying the where clause 
> will be dumped.
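
The bootstrap rules above can be sketched as a small decision function. This is only an illustration in Python, not the patch itself: Table, TableFilter and partitions_to_dump are hypothetical names, partition values are assumed to be integers, and the order in which the rules are checked (unpartitioned first) is an assumption taken from the order of the list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A partition spec maps a partition column name to its value, e.g. {"key": 5}.
PartitionSpec = Dict[str, int]


@dataclass
class Table:  # hypothetical stand-in for a metastore table
    name: str
    partition_columns: List[str] = field(default_factory=list)
    partitions: List[PartitionSpec] = field(default_factory=list)


@dataclass
class TableFilter:  # hypothetical: one where-clause condition for one table
    key: str                          # column named in the where clause
    predicate: Callable[[int], bool]  # e.g. lambda v: v < 10


def partitions_to_dump(
    table: Table, filters: Dict[str, TableFilter]
) -> Optional[List[PartitionSpec]]:
    """Return the partitions of `table` to include in a bootstrap dump.

    None means the table is unpartitioned and is dumped as a whole.
    """
    # Rule 1: an unpartitioned table is always dumped.
    if not table.partition_columns:
        return None
    flt = filters.get(table.name)
    # Rule 3: a table not mentioned in the where clause keeps all partitions.
    if flt is None:
        return list(table.partitions)
    # Rule 2: filtering on a non-partition column fails the dump.
    if flt.key not in table.partition_columns:
        raise ValueError(
            f"{flt.key!r} is not a partition column of {table.name!r}"
        )
    # Rule 4: keep only the partitions that satisfy the where clause.
    return [p for p in table.partitions if flt.predicate(p[flt.key])]
```

For example, with a filter `key < 10` on t0, a table t0 with partitions key=5 and key=15 would contribute only key=5, while a partitioned table with no filter entry would contribute every partition.
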
> *Incremental for managed table (Not part of this patch)*
> In case of incremental dump, the events from the notification log will be 
> scanned; once the partition spec is extracted from an event, it will be 
> checked against the filter condition.
>  * If the table is not partitioned, the event will be added to the dump.
>  * If the key mentioned is not a partition column, the dump will fail.
>  * If the table is not mentioned in the filter, the event will be added to 
> the dump.
>  * If the event covers multiple partitions, the event will be added to the 
> dump. (Filtering redundant partitions out of the message will be done as 
> part of a separate task.)
>  * If the partition spec matches the filter, the event will be added to 
> the dump.
>  
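
The incremental rules can be sketched the same way. Again this is a rough Python illustration under stated assumptions, not the planned implementation: Event, TableFilter and include_event are hypothetical names, the event is assumed to carry the table's partition columns alongside its partition specs, and an event on a partitioned table with no partition spec is assumed to be kept.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:  # hypothetical stand-in for a notification-log event
    table: str
    partition_columns: List[str]
    partition_specs: List[Dict[str, int]] = field(default_factory=list)


@dataclass
class TableFilter:  # hypothetical: one where-clause condition for one table
    key: str
    predicate: Callable[[int], bool]


def include_event(event: Event, filters: Dict[str, TableFilter]) -> bool:
    """Decide whether an event is added to an incremental dump."""
    # Events on unpartitioned tables are always added.
    if not event.partition_columns:
        return True
    flt = filters.get(event.table)
    # Events on tables not mentioned in the filter are always added.
    if flt is None:
        return True
    # Filtering on a non-partition column fails the dump.
    if flt.key not in event.partition_columns:
        raise ValueError(
            f"{flt.key!r} is not a partition column of {event.table!r}"
        )
    # Multi-partition events are added whole; trimming them is a separate task.
    if len(event.partition_specs) > 1:
        return True
    # Single-partition event: add it only if its spec matches the filter.
    # (With no spec at all, all() is True, so the event is kept: an assumption.)
    return all(flt.predicate(s[flt.key]) for s in event.partition_specs)
```

Keeping multi-partition events whole errs on the side of dumping too much rather than too little, which matches the note that trimming such events is left to a separate task.
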



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
