Shubham Chaurasia created HIVE-25449:
----------------------------------------

             Summary: datediff() gives wrong output when we set 
tez.task.launch.cmd-opts to some non UTC timezone
                 Key: HIVE-25449
                 URL: https://issues.apache.org/jira/browse/HIVE-25449
             Project: Hive
          Issue Type: Bug
          Components: UDF
            Reporter: Shubham Chaurasia
            Assignee: Shubham Chaurasia


Repro (thanks Qiaosong Dong) - 
{code}
create external table test_dt(id string, dt date);
insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07');

select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt on 
dt1.id = dt.id;
+------+
| _c0  |
+------+
| 6    |
| 7    |
+------+
{code}

Expected output - 
{code}
+------+
| _c0  |
+------+
| 5    |
| 6    |
+------+
{code}

*Cause*

This happens because in {{VectorUDFDateDiffColScalar}} class  
1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to parse 
the date strings which interprets it to be local timezone.

2. For first column we get a column vector which represents the date as epoch 
day. This is always in UTC.

*Solution*

We need to check other variants of datediff UDFs as well and change the parsing 
mechanism to always interpret date strings in UTC. 
 
I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue.
{code}
-          date.setTime(formatter.parse(new String(bytesValue, 
"UTF-8")).getTime());
-          baseDate = DateWritableV2.dateToDays(date);
+          org.apache.hadoop.hive.common.type.Date hiveDate
+              = org.apache.hadoop.hive.common.type.Date.valueOf(new 
String(bytesValue, "UTF-8"));
+          date.setTime(hiveDate.toEpochMilli());
+          baseDate = hiveDate.toEpochDay();
{code}










--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to