[ 
https://issues.apache.org/jira/browse/HIVE-25449?focusedWorklogId=638639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638639
 ]

ASF GitHub Bot logged work on HIVE-25449:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Aug/21 12:57
            Start Date: 17/Aug/21 12:57
    Worklog Time Spent: 10m 
      Work Description: ShubhamChaurasia opened a new pull request #2591:
URL: https://github.com/apache/hive/pull/2591


   Please find the repro and detailed description here - 
https://issues.apache.org/jira/browse/HIVE-25449
   
   ### What changes were proposed in this pull request?
   Removed usage of SimpleDateFormat which was using local timezone parsing 
scalar dates. 
   
   
   ### Why are the changes needed?
   This fixes the bug described in the Jira and datediff() now gives correct 
values.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Manual testing and added two new unit tests - testDateDiffColScalarWithTz() 
and testDateDiffScalarColWithTz()
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 638639)
    Remaining Estimate: 0h
            Time Spent: 10m

> datediff() gives wrong output when run in a tez task with some non-UTC 
> timezone
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-25449
>                 URL: https://issues.apache.org/jira/browse/HIVE-25449
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Repro (thanks Qiaosong Dong) - 
> Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}}
> {code}
> create external table test_dt(id string, dt date);
> insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07');
> select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt 
> on dt1.id = dt.id;
> +------+
> | _c0  |
> +------+
> | 6    |
> | 7    |
> +------+
> {code}
> Expected output - 
> {code}
> +------+
> | _c0  |
> +------+
> | 5    |
> | 6    |
> +------+
> {code}
> *Cause*
> This happens because in {{VectorUDFDateDiffColScalar}} class  
> 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to 
> parse the date strings which interprets it to be in local timezone.
> 2. For first column we get a column vector which represents the date as epoch 
> day. This is always in UTC.
> *Solution*
> We need to check other variants of datediff UDFs as well and change the 
> parsing mechanism to always interpret date strings in UTC. 
>  
> I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue.
> {code}
> -          date.setTime(formatter.parse(new String(bytesValue, 
> "UTF-8")).getTime());
> -          baseDate = DateWritableV2.dateToDays(date);
> +          org.apache.hadoop.hive.common.type.Date hiveDate
> +              = org.apache.hadoop.hive.common.type.Date.valueOf(new 
> String(bytesValue, "UTF-8"));
> +          date.setTime(hiveDate.toEpochMilli());
> +          baseDate = hiveDate.toEpochDay();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to