[ https://issues.apache.org/jira/browse/SPARK-31447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083672#comment-17083672 ]

Sathyaprakash Govindasamy commented on SPARK-31447:
---------------------------------------------------

I am proposing that SubtractTimestamps populate the months and days fields of the 
CalendarInterval as well. Also, in ExtractIntervalDays, when computing the number 
of days, we need to read the months property as well to calculate the total 
number of days.
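
Roughly, the first change could look like the sketch below (illustrative only; it assumes the MICROS_PER_DAY constant from DateTimeConstants and splits the raw microsecond difference into whole days plus a sub-day remainder):
{code:java}
// Illustrative sketch, not the final patch: carry whole days in the days
// field and keep only the sub-day remainder in microseconds.
val diff = end.asInstanceOf[Long] - start.asInstanceOf[Long]
val days = Math.toIntExact(diff / DateTimeConstants.MICROS_PER_DAY)
val micros = diff % DateTimeConstants.MICROS_PER_DAY
new CalendarInterval(0, days, micros)
{code}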

For the calendar interval _new CalendarInterval(months=2, days=4, 
microseconds=6000)_, the getDays function in IntervalUtils should return ((2 * 
DAYS_PER_MONTH) + 4) instead of 4.
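
A minimal sketch of the getDays side (DAYS_PER_MONTH is a placeholder conversion factor for this example, not necessarily a constant Spark already defines):
{code:java}
// Illustrative sketch: fold the months component into the day count.
// DAYS_PER_MONTH is an assumed conversion factor, used only for illustration.
def getDays(interval: CalendarInterval): Int = {
  Math.toIntExact(interval.months.toLong * DAYS_PER_MONTH + interval.days)
}
{code}
With the example interval above, this would return 2 * DAYS_PER_MONTH + 4 instead of 4.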

I will raise a PR for this.

> DATE_PART functions produces incorrect result
> ---------------------------------------------
>
>                 Key: SPARK-31447
>                 URL: https://issues.apache.org/jira/browse/SPARK-31447
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3
>            Reporter: Sathyaprakash Govindasamy
>            Priority: Major
>
> Spark does not extract the correct date part from a calendar interval. Below is one 
> example, extracting the day from a calendar interval:
> {code:java}
> spark.sql("SELECT EXTRACT(DAY FROM (cast('2020-01-15 00:00:00' as timestamp) - cast('2020-01-01 00:00:00' as timestamp)))").show{code}
> +------------------------------------------------------------------------------------------------------------------------+
> |date_part('DAY', subtracttimestamps(CAST('2020-01-15 00:00:00' AS TIMESTAMP), CAST('2020-01-01 00:00:00' AS TIMESTAMP)))|
> +------------------------------------------------------------------------------------------------------------------------+
> | 0|
> +------------------------------------------------------------------------------------------------------------------------+
> Actual output: 0 days
> Correct output: 14 days
> This is because the SubtractTimestamps expression calculates the difference and 
> populates only the microseconds field; the months and days fields are set to zero.
> {code:java}
> new CalendarInterval(months=0, days=0, microseconds=end.asInstanceOf[Long] - start.asInstanceOf[Long]){code}
> https://github.com/apache/spark/blob/2c5d489679ba3814973680d65853877664bcd931/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L2211
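> For the query above, the 14-day difference becomes 14 * 24 * 60 * 60 * 1,000,000 = 1,209,600,000,000 microseconds, all of it stored in the microseconds field while months and days stay 0.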
> But the ExtractIntervalDays expression retrieves the day count only from the days 
> field of the CalendarInterval, and therefore returns zero.
> {code:java}
> def getDays(interval: CalendarInterval): Int = {
>   interval.days
> }{code}
> https://github.com/apache/spark/blob/2c5d489679ba3814973680d65853877664bcd931/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala#L73


