[ 
https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8151:
-----------------------------
    Status: Patch Available  (was: Open)

> Dynamic partition sort optimization inserts record wrongly to partition when 
> used with GroupBy
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8151
>                 URL: https://issues.apache.org/jira/browse/HIVE-8151
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.1, 0.14.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>            Priority: Critical
>         Attachments: HIVE-8151.1.patch
>
>
> HIVE-6455 added the dynamic partition sort optimization. It added a 
> startGroup() method to the FileSink operator that watches for changes in 
> the reduce key in order to create partition directories. This method, 
> however, is not reliable, because the key passed to startGroup() differs 
> from the key passed to processOp(): startGroup() is called with the newly 
> changed key, whereas processOp() is called with the previously aggregated 
> key. As a result, processOp() writes the last row of the previous group as 
> the first row of the next group. This happens only when the optimization is 
> used with a group by operator.
> The fix is to stop relying on startGroup() and to create the partition 
> directories in processOp() itself.
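
The key mismatch described above can be reproduced with a small standalone simulation. This is only a sketch under stated assumptions: the class DynPartSketch, its Sink type, the "part=<key>" directory naming, and the "agg(<key>)" row format are all hypothetical and are not Hive's actual FileSinkOperator code. It assumes only what the description states, namely that the sink sees startGroup() for the new key before it receives the aggregated row of the previous group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DynPartSketch {
    /** Hypothetical sink that records which rows land in which partition directory. */
    static class Sink {
        final boolean useStartGroup;
        final Map<String, List<String>> dirs = new LinkedHashMap<>();
        String currentDir;

        Sink(boolean useStartGroup) {
            this.useStartGroup = useStartGroup;
        }

        /** Behaviour described as unreliable: switch directory when the reduce key changes. */
        void startGroup(String newKey) {
            if (useStartGroup) {
                currentDir = "part=" + newKey;
            }
        }

        /** Proposed behaviour: pick the directory from the row's own partition value. */
        void processOp(String partKey, String row) {
            if (!useStartGroup) {
                currentDir = "part=" + partKey;
            }
            dirs.computeIfAbsent(currentDir, d -> new ArrayList<>()).add(row);
        }
    }

    /**
     * Simulates a group-by feeding the sink: the aggregated row of the previous
     * group is forwarded only after startGroup() has announced the next key.
     */
    static Map<String, List<String>> run(boolean useStartGroup, String[] keys) {
        Sink sink = new Sink(useStartGroup);
        String pending = null;
        for (String key : keys) {
            sink.startGroup(key);
            if (pending != null) {
                sink.processOp(pending, "agg(" + pending + ")");
            }
            pending = key;
        }
        if (pending != null) {
            sink.processOp(pending, "agg(" + pending + ")"); // flush the last group
        }
        return sink.dirs;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "c"};
        System.out.println("startGroup-driven: " + run(true, keys));
        System.out.println("processOp-driven:  " + run(false, keys));
    }
}
```

In this model the startGroup-driven sink writes agg(a) into part=b and both agg(b) and agg(c) into part=c, while the processOp-driven sink places each aggregated row in the directory for its own key.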



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
