Shubham Chaurasia created HIVE-25449: ----------------------------------------
Summary: datediff() gives wrong output when we set tez.task.launch.cmd-opts to some non UTC timezone Key: HIVE-25449 URL: https://issues.apache.org/jira/browse/HIVE-25449 Project: Hive Issue Type: Bug Components: UDF Reporter: Shubham Chaurasia Assignee: Shubham Chaurasia Repro (thanks Qiaosong Dong) - {code} create external table test_dt(id string, dt date); insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt on dt1.id = dt.id; +------+ | _c0 | +------+ | 6 | | 7 | +------+ {code} Expected output - {code} +------+ | _c0 | +------+ | 5 | | 6 | +------+ {code} *Cause* This happens because in {{VectorUDFDateDiffColScalar}} class 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to parse the date strings which interprets it to be local timezone. 2. For first column we get a column vector which represents the date as epoch day. This is always in UTC. *Solution* We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC. I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. {code} - date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime()); - baseDate = DateWritableV2.dateToDays(date); + org.apache.hadoop.hive.common.type.Date hiveDate + = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8")); + date.setTime(hiveDate.toEpochMilli()); + baseDate = hiveDate.toEpochDay(); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)