[ 
https://issues.apache.org/jira/browse/SPARK-31056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-31056.
----------------------------------
    Resolution: Won't Fix

> Add CalendarIntervals division
> ------------------------------
>
>                 Key: SPARK-31056
>                 URL: https://issues.apache.org/jira/browse/SPARK-31056
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Enrico Minack
>            Priority: Major
>
> {{CalendarInterval}} should be allowed for division. The {{CalendarInterval}} 
> consists of three time components: {{months}}, {{days}} and {{microseconds}}. 
> The division can only be defined between intervals that have a single 
> non-zero time component, while both intervals have the same non-zero time 
> component. Otherwise the division expression would be ambiguous.
> This allows to evaluate the magnitude of {{CalendarInterval}} in SQL 
> expressions:
> {code}
> Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 
> 13:30:25")))
>   .toDF("start", "end")
>   .withColumn("interval", $"end" - $"start")
>   .withColumn("interval [h]", $"interval" / lit("1 
> hour").cast(CalendarIntervalType))
>   .withColumn("rate [€/h]", lit(1.45))
>   .withColumn("price [€]", $"interval [h]" * $"rate [€/h]")
>   .show(false)
> +-------------------+-------------------+-----------------------------+------------------+----------+------------------+
> |start              |end                |interval                     
> |interval [h]      |rate [€/h]|price [€]         |
> +-------------------+-------------------+-----------------------------+------------------+----------+------------------+
> |2020-02-01 12:00:00|2020-02-01 13:30:25|1 hours 30 minutes 25 
> seconds|1.5069444444444444|1.45      |2.1850694444444443|
> +-------------------+-------------------+-----------------------------+------------------+----------+------------------+
> {code}
> The currently available approach is
> {code}
> Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 
> 13:30:25")))
>   .toDF("start", "end")
>   .withColumn("interval [s]", unix_timestamp($"end") - 
> unix_timestamp($"start"))
>   .withColumn("interval [h]", $"interval [s]" / 3600)
>   .withColumn("rate [€/h]", lit(1.45))
>   .withColumn("price [€]", $"interval [h]" * $"rate [€/h]")
>   .show(false)
> {code}
> Going through {{unix_timestamp}} is a hack and it pollutes the SQL query with 
> unrelated semantics (unix timestamp is completely irrelevant for this 
> computation). It is merely there because there is currently no way to access 
> the length of an {{CalendarInterval}}. Dividing an interval by another 
> interval provides means to measure the length in an arbitrary unit (minutes, 
> hours, quarter hours).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to