[ https://issues.apache.org/jira/browse/HIVE-25071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25071:
----------------------------------
    Labels: pull-request-available  (was: )

> Number of reducers limited to fixed 1 when updating/deleting
> ------------------------------------------------------------
>
>                 Key: HIVE-25071
>                 URL: https://issues.apache.org/jira/browse/HIVE-25071
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When updating/deleting bucketed tables, an extra ReduceSink operator is 
> created to enforce bucketing. Since HIVE-22538, the number of reducers in 
> these RS operators is limited to a fixed value of 1.
> This can lead to performance degradation.
> Prior to HIVE-22538, multiple reducers were available in such cases. The 
> reason for limiting the number of reducers is to ensure ascending RowId 
> order in the delete delta files produced by the update/delete statements.
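> For instance, assume t1 is a bucketed ACID table along these lines 
> (illustrative DDL, not taken from the issue):
> {code}
> CREATE TABLE t1 (a INT, b STRING)
> CLUSTERED BY (a) INTO 4 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ('transactional'='true');
> {code}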
> This is the plan of a delete statement like:
> {code}
> DELETE FROM t1 WHERE a = 1;
> {code}
> {code}
> TS[0]-FIL[8]-SEL[2]-RS[3]-SEL[4]-RS[5]-SEL[6]-FS[7]
> {code}
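> (A plan of this shape can be inspected with EXPLAIN; the exact operator 
> tree depends on the Hive version and configuration.)
> {code}
> EXPLAIN DELETE FROM t1 WHERE a = 1;
> {code}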
> RowId order is ensured by RS[3] and bucketing is enforced by RS[5]: the 
> number of reducers was limited to the number of buckets in the table or to 
> hive.exec.reducers.max. However, RS[5] does not provide any ordering, so the 
> above plan may generate unsorted delete deltas, which leads to corrupted 
> data reads.
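> The ordering requirement can be pictured with the ROW__ID virtual column of 
> ACID tables: rows written into a delete delta bucket file must appear in 
> ascending ROW__ID order. A rough illustration (field names and encoded 
> values vary by Hive version):
> {code}
> SELECT ROW__ID, a FROM t1;
> -- e.g. {"writeid":1,"bucketid":536870912,"rowid":0}   1
> --      {"writeid":1,"bucketid":536870912,"rowid":1}   1
> {code}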
> Prior to HIVE-22538, these RS operators were merged by 
> ReduceSinkDeduplication, and the resulting RS kept the ordering and enabled 
> multiple reducers. It could do so because ReduceSinkDeduplication was 
> prepared for ACID writes. This handling was removed by HIVE-22538 in order 
> to get a more generic ReduceSinkDeduplication.
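> A sketch of the kind of deduplicated plan produced before HIVE-22538, with 
> the two ReduceSink operators collapsed into a single RS that both sorts and 
> buckets (operator ids are illustrative):
> {code}
> TS[0]-FIL[8]-SEL[2]-RS[3]-SEL[4]-FS[5]
> {code}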



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
