[jira] [Updated] (HIVE-21771) Support partition filter (where clause) in REPL dump command

mahesh kumar behera (JIRA) Tue, 16 Jul 2019 21:56:13 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-21771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


mahesh kumar behera updated HIVE-21771:
---------------------------------------
    Description: 
*Bootstrap for managed table*

Users should be able to run REPL DUMP with a where clause that filters 
partitions out of the dump. The format of the where clause should be similar 
to *"REPL DUMP dbname from 10 where 't0' where key < 10, 't1' where key = 3, 
'(t2*)|'t3' where key > 3"*. The initial version will support only very basic 
filter conditions; more complex conditions will be added as and when 
required.
 * From the AST generated for the where clause, extract the table information.
 * Generate AST for each table.
 * List the partitions of each table using the AST generated for that table, 
through the same metastore API used by select queries.
 * During bootstrap load, use the partition list to dump the partitions.
 * During incremental dump, use the list to filter out events.
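
As a rough illustration of the first two steps, the sketch below splits a filter clause into one entry per table. It is not the Hive implementation, which works on the parser's AST; here a regular expression stands in for the parser, and the names TableFilter and parse_partition_filters, the integer-only literals, and the simplified table pattern 't2.*|t3' are all assumptions made for the example.

```python
import re
from typing import List, NamedTuple


class TableFilter(NamedTuple):
    """Hypothetical per-table filter extracted from the where clause."""
    table_pattern: str  # table name or regex, as quoted in the clause
    column: str         # column the condition is applied to
    op: str             # comparison operator
    value: int          # literal to compare against (integers only in this sketch)


# One entry is assumed to look like: 't0' where key < 10
_ENTRY = re.compile(
    r"'([^']+)'\s+where\s+(\w+)\s*(<=|>=|!=|<|>|=)\s*(-?\d+)",
    re.IGNORECASE,
)


def parse_partition_filters(clause: str) -> List[TableFilter]:
    """Return one TableFilter per quoted table pattern found in the clause."""
    return [
        TableFilter(table, column, op, int(value))
        for table, column, op, value in _ENTRY.findall(clause)
    ]


filters = parse_partition_filters(
    "'t0' where key < 10, 't1' where key = 3, 't2.*|t3' where key > 3"
)
for f in filters:
    print(f)
```

Each resulting entry plays the role of the per-table AST in the list above: it is what would be handed to the metastore to list the matching partitions.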

In case of bootstrap load, all the tables of the database will be scanned, and:
 * If the table is not partitioned, it will be dumped.
 * If the key provided in the filter condition for the table is not a 
partition column, the dump will fail.
 * If the table is not mentioned in the where clause, all partitions of the 
table will be dumped.
 * Otherwise, all partitions of the table that satisfy the where clause will 
be dumped.
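
The bootstrap rules above can be sketched as a single decision function. This is only an illustration under stated assumptions: the function name partitions_to_dump, the tuple layout of a filter, the use of re.fullmatch to match table patterns, and the None return value for an unpartitioned table are all hypothetical and not taken from the patch.

```python
import operator
import re

# Comparison operators assumed to be supported by the basic filter.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "=": operator.eq, "!=": operator.ne,
}


def partitions_to_dump(table, partition_cols, partitions, filters):
    """Apply the bootstrap rules to one table.

    table          -- table name
    partition_cols -- partition column names (empty if unpartitioned)
    partitions     -- list of dicts mapping partition column -> value
    filters        -- list of (table_pattern, column, op, value) tuples
    Returns None when the whole unpartitioned table is dumped, otherwise
    the list of partitions to dump.
    """
    if not partition_cols:
        return None  # unpartitioned table: dumped as a whole

    matching = [f for f in filters if re.fullmatch(f[0], table)]
    if not matching:
        return list(partitions)  # table not mentioned: dump every partition

    for _pattern, column, _op, _value in matching:
        if column not in partition_cols:
            raise ValueError(f"{column} is not a partition column of {table}")

    # Keep only the partitions that satisfy every condition for this table.
    return [
        p for p in partitions
        if all(OPS[op](p[col], val) for _pattern, col, op, val in matching)
    ]


parts = [{"key": 1}, {"key": 5}, {"key": 12}]
print(partitions_to_dump("t0", ["key"], parts, [("t0", "key", "<", 10)]))
```

In the real flow the partition listing would be pushed down to the metastore rather than filtered in memory; the in-memory filter here only makes the rules easy to read.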

*Incremental for managed table (Not part of this patch)*

In case of incremental dump, the events from the notification log will be 
scanned and, once the partition spec is extracted from an event, the partition 
spec will be checked against the filter condition.
 * If the table is not partitioned, the event will be added to the dump.
 * If the key mentioned is not a partition column, the dump will fail.
 * If the table is not mentioned in the filter, the event will be added to the 
dump.
 * If the event covers multiple partitions, the event will be added to the 
dump. (Filtering out redundant partitions from the message will be done as 
part of a separate task.)
 * If the partition spec matches the filter, the event will be added to the 
dump.
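
A minimal sketch of the incremental rules follows, again under assumptions: the event is modelled as a plain dict with "table" and "partitions" keys, include_event is a hypothetical name, and an event for a partitioned table that carries no partition spec at all is treated like a multi-partition event, a case the description does not cover.

```python
import operator
import re

# Comparison operators assumed to be supported by the basic filter.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "=": operator.eq, "!=": operator.ne,
}


def include_event(event, partition_cols, filters):
    """Decide whether a notification event is added to the incremental dump.

    event          -- dict with "table" (name) and "partitions" (list of
                      partition specs, each a dict column -> value)
    partition_cols -- partition column names of the table (empty if none)
    filters        -- list of (table_pattern, column, op, value) tuples
    """
    if not partition_cols:
        return True  # unpartitioned table: always dumped

    matching = [f for f in filters if re.fullmatch(f[0], event["table"])]
    if not matching:
        return True  # table not mentioned in the filter

    for _pattern, column, _op, _value in matching:
        if column not in partition_cols:
            raise ValueError(f"{column} is not a partition column")

    specs = event["partitions"]
    if len(specs) != 1:
        return True  # multi-partition event: kept whole (assumption for 0 specs)

    spec = specs[0]
    return all(OPS[op](spec[col], val) for _pattern, col, op, val in matching)


flt = [("t0", "key", "<", 10)]
print(include_event({"table": "t0", "partitions": [{"key": 3}]}, ["key"], flt))
print(include_event({"table": "t0", "partitions": [{"key": 30}]}, ["key"], flt))
```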

 

  was:
*Bootstrap for managed table*

User should be allowed to execute REPL DUMP with where clause. The where clause 
should support filtering out partition from dump. Format of the where clause 
should be similar to *"REPL DUMP dbname from 10 where 't0' where key < 10,'t1'* 
where key = 3, '(t2*)|'t3' where key > 3".* For initial version, very basic 
filter condition will be supported and later the complexity will be increased 
as and when required.
 * From the AST generated for the where clause, extract the table information.
 * Generate AST for each table.
 * List the partition for each table using the AST generated for each table 
using the   same metastore API used by select query.
 * During bootstrap load use the partition list to dump the partitions.
 * During incremental dump, use the list to filter out the event.

In case of bootstrap load, all the tables of the database will be scanned and
 * If table is not partitioned, then it will be dumped.
 * If key provided in the filter condition for the table is not a partition 
column, then dump will fail.
 * If table is not mentioned in the where clause, then all partitions of the 
table will be dumped.
 * All the partitioned of the table satisfying the where clause will be dumped.

*Incremental for managed table*

In case of Incremental Dump, the events from the notification log will be 
scanned and once the partition spec is extracted from the event, the partition 
spec will be filtered against the condition.
 * If table is not partitioned then the event will be added to the dump.
 * If key mentioned is not a partition column, then dump will fail.
 * If the table is not mentioned in the filter then event will be added to the 
dump.
 * If the event is multi partitioned, then the event will be added to the dump. 
(Filtering out redundant partitions from message will be done as part of 
separate task).
 * If the partition spec matches the filter, then the event will be added to 
the dump*.*

 


> Support partition filter (where clause) in REPL dump command
> ------------------------------------------------------------
>
>                 Key: HIVE-21771
>                 URL: https://issues.apache.org/jira/browse/HIVE-21771
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>             Fix For: 4.0.0
>
>
> *Bootstrap for managed table*
> Users should be able to run REPL DUMP with a where clause that filters 
> partitions out of the dump. The format of the where clause should be similar 
> to *"REPL DUMP dbname from 10 where 't0' where key < 10, 't1' where key = 3, 
> '(t2*)|'t3' where key > 3"*. The initial version will support only very 
> basic filter conditions; more complex conditions will be added as and when 
> required.
>  * From the AST generated for the where clause, extract the table information.
>  * Generate AST for each table.
>  * List the partitions of each table using the AST generated for that table, 
> through the same metastore API used by select queries.
>  * During bootstrap load, use the partition list to dump the partitions.
>  * During incremental dump, use the list to filter out events.
> In case of bootstrap load, all the tables of the database will be scanned, 
> and:
>  * If the table is not partitioned, it will be dumped.
>  * If the key provided in the filter condition for the table is not a 
> partition column, the dump will fail.
>  * If the table is not mentioned in the where clause, all partitions of the 
> table will be dumped.
>  * Otherwise, all partitions of the table that satisfy the where clause will 
> be dumped.
> *Incremental for managed table (Not part of this patch)*
> In case of incremental dump, the events from the notification log will be 
> scanned and, once the partition spec is extracted from an event, the 
> partition spec will be checked against the filter condition.
>  * If the table is not partitioned, the event will be added to the dump.
>  * If the key mentioned is not a partition column, the dump will fail.
>  * If the table is not mentioned in the filter, the event will be added to 
> the dump.
>  * If the event covers multiple partitions, the event will be added to the 
> dump. (Filtering out redundant partitions from the message will be done as 
> part of a separate task.)
>  * If the partition spec matches the filter, the event will be added to 
> the dump.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
