[ 
https://issues.apache.org/jira/browse/HIVE-25449?focusedWorklogId=640598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640598
 ]

ASF GitHub Bot logged work on HIVE-25449:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Aug/21 07:09
            Start Date: 23/Aug/21 07:09
    Worklog Time Spent: 10m 
      Work Description: abstractdog commented on pull request #2591:
URL: https://github.com/apache/hive/pull/2591#issuecomment-903502845


   +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 640598)
    Time Spent: 1h  (was: 50m)

> datediff() gives wrong output when run in a tez task with some non-UTC 
> timezone
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-25449
>                 URL: https://issues.apache.org/jira/browse/HIVE-25449
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Repro (thanks Qiaosong Dong) - 
> Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}}
> {code}
> create external table test_dt(id string, dt date);
> insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07');
> select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt 
> on dt1.id = dt.id;
> +------+
> | _c0  |
> +------+
> | 6    |
> | 7    |
> +------+
> {code}
> Expected output - 
> {code}
> +------+
> | _c0  |
> +------+
> | 5    |
> | 6    |
> +------+
> {code}
> *Cause*
> This happens because in {{VectorUDFDateDiffColScalar}} class  
> 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to 
> parse the date strings which interprets it to be in local timezone.
> 2. For first column we get a column vector which represents the date as epoch 
> day. This is always in UTC.
> *Solution*
> We need to check other variants of datediff UDFs as well and change the 
> parsing mechanism to always interpret date strings in UTC. 
>  
> I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue.
> {code}
> -          date.setTime(formatter.parse(new String(bytesValue, 
> "UTF-8")).getTime());
> -          baseDate = DateWritableV2.dateToDays(date);
> +          org.apache.hadoop.hive.common.type.Date hiveDate
> +              = org.apache.hadoop.hive.common.type.Date.valueOf(new 
> String(bytesValue, "UTF-8"));
> +          date.setTime(hiveDate.toEpochMilli());
> +          baseDate = hiveDate.toEpochDay();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to