Krisztian Kasa created HIVE-25071:
-------------------------------------

             Summary: Number of reducers limited to fixed 1 when updating/deleting
                 Key: HIVE-25071
                 URL: https://issues.apache.org/jira/browse/HIVE-25071
             Project: Hive
          Issue Type: Bug
            Reporter: Krisztian Kasa
            Assignee: Krisztian Kasa


When updating or deleting bucketed tables, an extra ReduceSink operator is created 
to enforce bucketing. After HIVE-22538, the number of reducers is limited to a 
fixed 1 in these RS operators.

This can lead to performance degradation.
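
For illustration, a bucketed ACID table that hits this path could be created like 
this (DDL details are a sketch; only the name t1 and column a come from the plan 
below):

{code}
-- Hypothetical bucketed, transactional table; an update/delete on it gets the
-- extra bucketing-enforcing ReduceSink that now runs with a single reducer.
CREATE TABLE t1 (a INT, b INT)
CLUSTERED BY (a) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

UPDATE t1 SET b = 2 WHERE a = 1;
{code}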

Prior to HIVE-22538, multiple reducers were available in such cases. The reason for 
limiting the number of reducers is to ensure ascending RowId order in the delete 
delta files produced by update/delete statements.
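
For context, the RowId in question is Hive's ROW__ID virtual column on ACID 
tables; a minimal way to inspect it, assuming the hypothetical t1 table above:

{code}
-- ROW__ID is a struct of (writeid, bucketid, rowid); delete deltas must be
-- written in ascending ROW__ID order within each bucket to read back correctly.
SELECT ROW__ID, a FROM t1;
{code}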

This is the plan of a delete statement like:

{code}
DELETE FROM t1 WHERE a = 1;
{code}
{code}
TS[0]-FIL[8]-SEL[2]-RS[3]-SEL[4]-RS[5]-SEL[6]-FS[7]
{code}
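
A plan like this can be obtained by running EXPLAIN on the statement (the actual 
output format differs; the line above is the condensed operator tree):

{code}
EXPLAIN DELETE FROM t1 WHERE a = 1;
{code}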

RowId order is ensured by RS[3] and bucketing is enforced by RS[5]; the number of 
reducers was limited to the number of buckets in the table or to 
hive.exec.reducers.max. However, RS[5] does not provide any ordering, so the plan 
above may generate unsorted delete deltas, which leads to corrupted data reads.
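
For reference, the pre-HIVE-22538 upper bound on parallelism for the 
enforce-bucketing RS was the table's bucket count or this setting (the value below 
is only an example):

{code}
-- Caps reducer parallelism; before HIVE-22538 the enforce-bucketing RS could
-- run with multiple reducers up to this bound rather than a fixed 1.
SET hive.exec.reducers.max=64;
{code}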

Prior to HIVE-22538, these RS operators were merged by ReduceSinkDeduplication, 
and the resulting RS kept the ordering and enabled multiple reducers. It could do 
so because ReduceSinkDeduplication was prepared for ACID writes. This special 
handling was removed by HIVE-22538 to make ReduceSinkDeduplication more generic.
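
For comparison, a sketch of the plan shape after the old deduplication, where 
RS[3] and RS[5] collapse into a single RS that both sorts by RowId and enforces 
bucketing (the shape is reconstructed from the description above, not an actual 
EXPLAIN dump):

{code}
TS-FIL-SEL-RS-SEL-FS
{code}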




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
