Riju Trivedi created HIVE-28075:
-----------------------------------

             Summary: Vectorized DayOFWeek returns inconsistent results for 
non-UTC timezones.
                 Key: HIVE-28075
                 URL: https://issues.apache.org/jira/browse/HIVE-28075
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 4.0.0-beta-1
            Reporter: Riju Trivedi
            Assignee: Riju Trivedi


A simple way to reproduce the problem:
{code:sql}
--! qt:timezone:Asia/Shanghai

CREATE EXTERNAL TABLE dayOfWeek_test(
`fund_code` string,
`test_date` string
);

INSERT INTO dayOfWeek_test(fund_code,test_date)
values('SEC016210079','2023-04-13');

SELECT fund_code,
 test_date,
 dayofweek(test_date) AS SR,
 CASE
     WHEN dayofweek(test_date) = 1 THEN 7
     ELSE dayofweek(test_date) - 1
 END AS week_day
FROM dayOfWeek_test; 

-- Actual result:
-- SEC016210079    2023-04-13    4    3

-- Expected result:
-- SEC016210079    2023-04-13    5    4

{code}
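The issue occurs only in the vectorized path with non-UTC timezones. The 
non-vectorized path uses _DateTimeFormatter_, whereas the vectorized path uses 
_SimpleDateFormat_ and a calendar initialized with the UTC timezone. Hence, the 
date parsed in the local time zone is converted to UTC, which can shift it to 
the previous calendar day and change the dayOfWeek() result. In the example 
above, midnight on 2023-04-13 in Asia/Shanghai (UTC+8) is 16:00 on 2023-04-12 
in UTC, so Wednesday (4) is returned instead of Thursday (5).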

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFDayOfWeekString.java#L59]
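
The mismatch can be reproduced outside Hive with plain JDK classes. The following is a minimal sketch, not the Hive implementation: the class name _DayOfWeekSketch_ and both method names are hypothetical, and the session time zone is passed in explicitly as an assumption about how the vectorized path ends up parsing in local time.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

// Illustrative sketch only; names are hypothetical and this is not Hive code.
class DayOfWeekSketch {

    // Mimics the reported behaviour: parse the string in the session time zone,
    // then read DAY_OF_WEEK from a calendar fixed to UTC.
    static int viaUtcCalendar(String date, String sessionZone) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setTimeZone(TimeZone.getTimeZone(sessionZone));
        Date parsed;
        try {
            parsed = format.parse(date); // midnight in sessionZone
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + date, e);
        }
        Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        calendar.setTime(parsed);
        return calendar.get(Calendar.DAY_OF_WEEK); // 1 = Sunday ... 7 = Saturday
    }

    // Zone-independent alternative: a LocalDate has no instant to shift.
    static int viaLocalDate(String date) {
        int isoDay = LocalDate.parse(date).getDayOfWeek().getValue(); // 1 = Monday ... 7 = Sunday
        return isoDay % 7 + 1; // remap to 1 = Sunday ... 7 = Saturday
    }

    public static void main(String[] args) {
        System.out.println(viaUtcCalendar("2023-04-13", "Asia/Shanghai")); // 4 (Wednesday)
        System.out.println(viaLocalDate("2023-04-13"));                    // 5 (Thursday)
    }
}
```

With "UTC" passed as the session zone, both methods return 5, which matches the observation that only non-UTC timezones are affected.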
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)