[ 
https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154205#comment-14154205
 ] 

Prasanth J commented on HIVE-8151:
----------------------------------

[~wzc1989] You are exactly correct. The getConversionSelectOperator does not 
seem populate and pass columnExprMap in constructor or set it explicitly. As 
you had mentioned this is the reason why Select operator deduplication did not 
propagate the UDF for type conversion properly. I also verified that the test 
case passes now. I will put up another version of patch with the fix and test 
case. Thanks again for the test case and suggested fix. 

> Dynamic partition sort optimization inserts record wrongly to partition when 
> used with GroupBy
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8151
>                 URL: https://issues.apache.org/jira/browse/HIVE-8151
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, 
> HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch, 
> HIVE-8151.8.patch
>
>
> HIVE-6455 added dynamic partition sort optimization. It added startGroup() 
> method to FileSink operator to look for changes in reduce key for creating 
> partition directories. This method however is not reliable as the key called 
> with startGroup() is different from the key called with processOp(). 
> startGroup() is called with newly changed key whereas processOp() is called 
> with previously aggregated key. This will result in processOp() writing the 
> last row of previous group as the first row of next group. This happens only 
> when used with group by operator.
> The fix is to not rely on startGroup() and do the partition directory 
> creation in processOp() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to