I compute the difference of two timestamps and compare it with a
constant interval:
Seq(("2019-01-02 12:00:00", "2019-01-02 13:30:00"))
.toDF("start", "end")
.select($"start".cast(TimestampType), $"end".cast(TimestampType))
.select($"start", $"end", ($"end" - $"start").as("diff"))
.where($"diff" < lit("INTERVAL 2 HOUR").cast(CalendarIntervalType))
.show
Since the interval comes from a timestamp difference, it only carries a
time component (no months), so comparing it with the "right kind" of
interval should always be well-defined.
Enrico
On 11.02.20 at 17:06, Wenchen Fan wrote:
What's your use case to compare intervals? It's tricky in Spark as
there is only one interval type and you can't really compare one month
with 30 days.
On Wed, Feb 12, 2020 at 12:01 AM Enrico Minack <m...@enrico.minack.dev> wrote:
Hi Devs,
I would like to know the current roadmap for making CalendarInterval
comparable and orderable again (SPARK-29679, SPARK-29385, #26337).
This was reverted with #27262, but SPARK-30551 does not mention how to
move forward on this. I have found SPARK-28494, but it seems to be stale.
While I find it useful to compare such intervals, I cannot find a way to
work around the missing comparability. Is there a way to get, e.g., the
seconds that an interval represents, so that intervals can be compared?
org.apache.spark.sql.catalyst.util.IntervalUtils has methods like
getEpoch or getDuration, but I cannot see them being exposed to SQL or
in the org.apache.spark.sql.functions package.
Thanks for the insights,
Enrico