There is another feature missing for CalendarInterval, which is related to comparability: measuring the length of an interval.

It would be nice if you could access the length of an interval; then you could compute something like this:

Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 13:30:25")))
  .toDF("start", "end")
  .withColumn("interval", $"end" - $"start")
  .withColumn("interval [h]", *INTERVAL LENGTH IN HOURS*)
  .withColumn("rate [€/h]", lit(1.45))
  .withColumn("price [€]", $"interval [h]" * $"rate [€/h]")
  .show(false)

+-------------------+-------------------+-----------------------------+------------------+----------+------------------+
|start              |end                |interval                     |interval [h]      |rate [€/h]|price [€]         |
+-------------------+-------------------+-----------------------------+------------------+----------+------------------+
|2020-02-01 12:00:00|2020-02-01 13:30:25|1 hours 30 minutes 25 seconds|1.5069444444444444|1.45      |2.1850694444444443|
+-------------------+-------------------+-----------------------------+------------------+----------+------------------+


The length of an interval can be measured by dividing it by the length of your measuring unit, e.g. "1 hour":

||$"interval" / lit("1 hour").cast(CalendarIntervalType)| |


Which brings us to CalendarInterval division: https://github.com/apache/spark/pull/27805
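
Until such a division exists, a minimal workaround sketch for the *INTERVAL LENGTH IN HOURS* placeholder above is to skip the interval and derive the hours directly from the two timestamps (this assumes spark.implicits._ is in scope, as in the examples above, and relies on casting a timestamp to double yielding epoch seconds including the fractional part):

import java.sql.Timestamp
import org.apache.spark.sql.functions.lit

Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 13:30:25")))
  .toDF("start", "end")
  // timestamp -> double is seconds since the epoch, so the difference is the length in seconds
  .withColumn("interval [h]", ($"end".cast("double") - $"start".cast("double")) / 3600)
  .withColumn("rate [€/h]", lit(1.45))
  .withColumn("price [€]", $"interval [h]" * $"rate [€/h]")
  .show(false)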

Enrico


On 11.02.20 21:09, Enrico Minack wrote:
I compute the difference of two timestamps and compare it with a constant interval:

Seq(("2019-01-02 12:00:00", "2019-01-02 13:30:00"))
  .toDF("start", "end")
.select($"start".cast(TimestampType), $"end".cast(TimestampType))
  .select($"start", $"end", ($"end" - $"start").as("diff"))
  .where($"diff" < lit("INTERVAL 2 HOUR").cast(CalendarIntervalType))
  .show

Coming from two timestamps, the interval should only carry a correct time component (hours down to milliseconds), so comparing it with the "right kinds of intervals" should always be correct.
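
Until interval comparison is back, a workaround sketch is to compare the difference in seconds rather than as a CalendarInterval (same spark-shell imports and implicits as above assumed):

Seq(("2019-01-02 12:00:00", "2019-01-02 13:30:00"))
  .toDF("start", "end")
  .select($"start".cast(TimestampType), $"end".cast(TimestampType))
  // compare the difference in seconds instead of as an interval
  .withColumn("diff [s]", $"end".cast("double") - $"start".cast("double"))
  .where($"diff [s]" < 2 * 3600)
  .show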

Enrico


On 11.02.20 17:06, Wenchen Fan wrote:
What's your use case to compare intervals? It's tricky in Spark as there is only one interval type and you can't really compare one month with 30 days.
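
A quick illustration of that ambiguity (a minimal sketch, assuming the spark-shell implicits): adding "1 month" versus "30 days" to different start dates orders the two results differently, so neither interval is generally smaller than the other.

import org.apache.spark.sql.functions.{add_months, date_add}

// 1 month is 31 days from 2020-01-01 but only 29 days from 2020-02-01 (leap year),
// so "1 month" is neither always larger nor always smaller than "30 days"
Seq("2020-01-01", "2020-02-01").toDF("d")
  .select($"d".cast("date").as("d"))
  .select($"d", add_months($"d", 1).as("plus 1 month"), date_add($"d", 30).as("plus 30 days"))
  .show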

On Wed, Feb 12, 2020 at 12:01 AM Enrico Minack <m...@enrico.minack.dev> wrote:

    Hi Devs,

    I would like to know what the current roadmap is for making CalendarInterval
    comparable and orderable again (SPARK-29679, SPARK-29385, #26337).

    With #27262, this got reverted, but SPARK-30551 does not mention how to go
    forward in this matter. I have found SPARK-28494, but this seems to be stale.

    While I find it useful to compare such intervals, I cannot find a way to work
    around the missing comparability. Is there a way to get, e.g., the seconds
    that an interval represents, to be able to compare intervals? In
    org.apache.spark.sql.catalyst.util.IntervalUtils there are methods like
    getEpoch or getDuration, but I cannot see them exposed to SQL or in the
    org.apache.spark.sql.functions package.
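
    The closest sketch I have is a driver-side approximation over the public
    fields of org.apache.spark.unsafe.types.CalendarInterval (months, days,
    microseconds in Spark 3.0); the days-per-month factor is an assumption and
    exactly the ambiguity that makes intervals hard to compare in general:

    import org.apache.spark.unsafe.types.CalendarInterval

    // rough approximation: months and days have no fixed length in seconds,
    // so a days-per-month assumption (here 31) is required
    def approxSeconds(i: CalendarInterval, daysPerMonth: Int = 31): Long =
      (i.months.toLong * daysPerMonth + i.days) * 24L * 3600 + i.microseconds / 1000000L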

    Thanks for the insights,
    Enrico




