[jira] [Commented] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy

Prasanth J (JIRA) Tue, 30 Sep 2014 17:53:16 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154132#comment-14154132
 ]


Prasanth J commented on HIVE-8151:
----------------------------------

[~wzc1989] Agreed. The explain plan of the query when the optimization is 
enabled/disabled shows different ExprNodeDesc for Select operator. As you said, 
the UDF to cast from bigint to int is lost during deduplication process. I will 
dig down further to see why colExprMap is missing and deduplication of select 
operators fails to retain the UDF. I am exactly right at this point debugging 
now. Thanks for debugging this issue and providing insights. Will post update 
soon and will also see if your suggested fix is the right way to do it.

> Dynamic partition sort optimization inserts record wrongly to partition when 
> used with GroupBy
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8151
>                 URL: https://issues.apache.org/jira/browse/HIVE-8151
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, 
> HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch, 
> HIVE-8151.8.patch
>
>
> HIVE-6455 added dynamic partition sort optimization. It added startGroup() 
> method to FileSink operator to look for changes in reduce key for creating 
> partition directories. This method however is not reliable as the key called 
> with startGroup() is different from the key called with processOp(). 
> startGroup() is called with newly changed key whereas processOp() is called 
> with previously aggregated key. This will result in processOp() writing the 
> last row of previous group as the first row of next group. This happens only 
> when used with group by operator.
> The fix is to not rely on startGroup() and do the partition directory 
> creation in processOp() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy

Reply via email to