Riju Trivedi created HIVE-28075:
-----------------------------------

             Summary: Vectorized DayOFWeek returns inconsistent results for non-UTC timezones.
                 Key: HIVE-28075
                 URL: https://issues.apache.org/jira/browse/HIVE-28075
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 4.0.0-beta-1
            Reporter: Riju Trivedi
            Assignee: Riju Trivedi
Simple reproduction:
{code:java}
--! qt:timezone:Asia/Shanghai
CREATE EXTERNAL TABLE dayOfWeek_test(
  `fund_code` string,
  `test_date` string
);

INSERT INTO dayOfWeek_test(fund_code, test_date) values ('SEC016210079', '2023-04-13');

SELECT fund_code, test_date, dayofweek(test_date) AS SR,
  CASE WHEN dayofweek(test_date) = 1 THEN 7 ELSE dayofweek(test_date) - 1 END AS week_day
FROM dayOfWeek_test;

Result:
SEC016210079 2023-04-13 4 3

Expected Result:
SEC016210079 2023-04-13 5 4
{code}

The issue occurs only on the vectorized path and only with non-UTC timezones. The non-vectorized path uses _DateTimeFormatter_, whereas the vectorized path uses _SimpleDateFormat_ together with a calendar initialized with the UTC timezone. As a result, the date is parsed as midnight in the local timezone and then read back in UTC, which can change the calendar date and therefore the dayofweek() result. For Asia/Shanghai (UTC+8), 2023-04-13 00:00 local time is 2023-04-12 16:00 UTC, a Wednesday, so dayofweek() returns 4 instead of 5.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFDayOfWeekString.java#L59]
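To illustrate the mechanism outside Hive, here is a minimal standalone Java sketch. It is not the Hive code: the class and method names (`DayOfWeekTimezoneDemo`, `dayOfWeekUtcCalendar`, `dayOfWeekLocalDate`) are hypothetical, and it assumes, per the description above, that the vectorized path parses the string as local midnight in the session timezone and then reads the day of week from a `Calendar` fixed to UTC. The actual `VectorUDFDayOfWeekString` implementation may differ in detail.

{code:java}
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class DayOfWeekTimezoneDemo {

    // Assumed vectorized-style behavior: parse in the session timezone,
    // then read DAY_OF_WEEK from a Calendar initialized with UTC.
    static int dayOfWeekUtcCalendar(String date, TimeZone sessionZone) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setTimeZone(sessionZone); // string becomes local midnight in sessionZone
        Date parsed;
        try {
            parsed = format.parse(date);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad date: " + date, e);
        }
        Calendar utc = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        utc.setTime(parsed); // same instant, but fields are computed in UTC
        // Calendar.DAY_OF_WEEK uses Sunday = 1 ... Saturday = 7, like Hive's dayofweek().
        return utc.get(Calendar.DAY_OF_WEEK);
    }

    // Timezone-independent variant: treat the string as a plain calendar date.
    static int dayOfWeekLocalDate(String date) {
        // DayOfWeek.getValue() is Monday = 1 ... Sunday = 7; remap to Sunday = 1 ... Saturday = 7.
        int isoDay = LocalDate.parse(date).getDayOfWeek().getValue();
        return isoDay % 7 + 1;
    }

    public static void main(String[] args) {
        TimeZone shanghai = TimeZone.getTimeZone("Asia/Shanghai");
        String date = "2023-04-13";
        System.out.println("UTC calendar: " + dayOfWeekUtcCalendar(date, shanghai));
        System.out.println("LocalDate:    " + dayOfWeekLocalDate(date));
    }
}
{code}

Run as-is, it prints 4 for the UTC-calendar variant and 5 for the `LocalDate` variant, matching the actual and expected SR values in the reproduction. The `LocalDate` variant has no time-of-day component, so no timezone conversion can shift the date; it is shown as one possible direction, not necessarily the fix adopted for HIVE-28075.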