[ https://issues.apache.org/jira/browse/SPARK-31056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-31056. ---------------------------------- Resolution: Won't Fix > Add CalendarIntervals division > ------------------------------ > > Key: SPARK-31056 > URL: https://issues.apache.org/jira/browse/SPARK-31056 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.1.0 > Reporter: Enrico Minack > Priority: Major > > {{CalendarInterval}} should be allowed for division. The {{CalendarInterval}} > consists of three time components: {{months}}, {{days}} and {{microseconds}}. > The division can only be defined between intervals that have a single > non-zero time component, while both intervals have the same non-zero time > component. Otherwise the division expression would be ambiguous. > This allows to evaluate the magnitude of {{CalendarInterval}} in SQL > expressions: > {code} > Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 > 13:30:25"))) > .toDF("start", "end") > .withColumn("interval", $"end" - $"start") > .withColumn("interval [h]", $"interval" / lit("1 > hour").cast(CalendarIntervalType)) > .withColumn("rate [€/h]", lit(1.45)) > .withColumn("price [€]", $"interval [h]" * $"rate [€/h]") > .show(false) > +-------------------+-------------------+-----------------------------+------------------+----------+------------------+ > |start |end |interval > |interval [h] |rate [€/h]|price [€] | > +-------------------+-------------------+-----------------------------+------------------+----------+------------------+ > |2020-02-01 12:00:00|2020-02-01 13:30:25|1 hours 30 minutes 25 > seconds|1.5069444444444444|1.45 |2.1850694444444443| > +-------------------+-------------------+-----------------------------+------------------+----------+------------------+ > {code} > The currently available approach is > {code} > Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 > 13:30:25"))) > .toDF("start", "end") > .withColumn("interval [s]", unix_timestamp($"end") - > unix_timestamp($"start")) > .withColumn("interval [h]", $"interval [s]" / 3600) > .withColumn("rate [€/h]", lit(1.45)) > .withColumn("price [€]", $"interval [h]" * $"rate [€/h]") > .show(false) > {code} > Going through {{unix_timestamp}} is a hack and it pollutes the SQL query with > unrelated semantics (unix timestamp is completely irrelevant for this > computation). It is merely there because there is currently no way to access > the length of an {{CalendarInterval}}. Dividing an interval by another > interval provides means to measure the length in an arbitrary unit (minutes, > hours, quarter hours). -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org