[ 
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27772:
----------------------------------
    Labels: pull-request-available  (was: )

> Hive UNIX_TIMESTAMP()should return null for invalid dates
> ---------------------------------------------------------
>
>                 Key: HIVE-27772
>                 URL: https://issues.apache.org/jira/browse/HIVE-27772
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Simhadri Govindappa
>            Assignee: Simhadri Govindappa
>            Priority: Major
>              Labels: pull-request-available
>
> For invalid dates such as 2001-02-31, 2023-04-31 etc, UNIX_TIMESTAMP() is 
> giving out the timestamp value as the last valid date, rather than NULL. 
> (e.g. UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which 
> converts to '2001-02-28'. However, for calendar days larger than 31, e.g. 
> 2001-02-32, or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result.
> In Spark and mysql, UNIX_TIMESTMAP for these invalid dates are all NULL (or 
> 0).
>  
> {noformat}
> 6: jdbc:hive2://localhost:10001/> select month, datetimestamp, 
> unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from 
> datetimetable;
> INFO  : Compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as 
> timestampCol from datetimetable
> INFO  : No Stats for default@datetimetable, Columns: month, datetimestamp
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month, 
> type:string, comment:null), FieldSchema(name:datetimestamp, type:string, 
> comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.102 seconds
> INFO  : Operation QUERY obtained 0 locks
> INFO  : Executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62): 
> select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as 
> timestampCol from datetimetable
> INFO  : Completed executing 
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62); 
> Time taken: 0.0 seconds
> +--------+----------------+---------------+
> | month  | datetimestamp  | timestampcol  |
> +--------+----------------+---------------+
> | Feb    | 2001-02-28     | 983318400     |
> | Feb    | 2001-02-29     | 983318400     |
> | Feb    | 2001-02-30     | 983318400     |
> | Feb    | 2001-02-31     | 983318400     |
> | Feb    | 2001-02-32     | NULL          |
> +--------+----------------+---------------+
> 5 rows selected (0.131 seconds){noformat}
>  
>  
> It looks like 
> [InstantDateTimeFormatter.java#L52|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/InstantDateTimeFormatter.java#L52]
>   by default, the formatter has the SMART resolver style.
> According to java jdk : 
> https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103
>  
> {noformat}
>   /**
>      * Style to resolve dates and times strictly.
>      * <p>
>      * Using strict resolution will ensure that all parsed values are within
>      * the outer range of valid values for the field. Individual fields may
>      * be further processed for strictness.
>      * <p>
>      * For example, resolving year-month and day-of-month in the ISO calendar
>      * system using strict mode will ensure that the day-of-month is valid
>      * for the year-month, rejecting invalid values.
>      */
>     STRICT,
>     /**
>      * Style to resolve dates and times in a smart, or intelligent, manner.
>      * <p>
>      * Using smart resolution will perform the sensible default for each
>      * field, which may be the same as strict, the same as lenient, or a third
>      * behavior. Individual fields will interpret this differently.
>      * <p>
>      * For example, resolving year-month and day-of-month in the ISO calendar
>      * system using smart mode will ensure that the day-of-month is from
>      * 1 to 31, converting any value beyond the last valid day-of-month to be
>      * the last valid day-of-month.
>      */
>     SMART,{noformat}
>  
>  
> Therefore, we should set the resolverStyle to STRICT to reject invalid date 
> values.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to