Hi Sparkers! (maybe Sparkles?) I just wanted to bring up the apparently "controversial" Calendar Interval topic.
I worked on https://issues.apache.org/jira/browse/SPARK-24702 and https://github.com/apache/spark/pull/21706. The user reported unexpected behaviour: they weren't able to cast to the Calendar Interval type from SQL.

In the current version of Spark, the following works:

    scala> spark.sql("SELECT 'interval 1 hour' as a").select(col("a").cast("calendarinterval")).show()
    +----------------+
    |               a|
    +----------------+
    |interval 1 hours|
    +----------------+

While the following doesn't:

    spark.sql("SELECT CALENDARINTERVAL('interval 1 hour') as a").show()

Since the DataFrame API equivalent of the SQL worked, I thought adding the SQL function would be an easy decision (to make the two APIs consistent). However, I got push-back on the PR on the basis that "we do not plan to expose Calendar Interval as a public type".

Should there be a consensus on either removing CalendarIntervalType from the public DataFrame API, OR making the SQL side consistent with it?

--
Best regards,
Daniel Mateus Pires
Data Engineer @ Hudson's Bay Company
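
P.S. For anyone who wants to reproduce this quickly, here's a minimal sketch you can paste into spark-shell (assuming a current 2.x build, where `spark` is already in scope; based on the behaviour described above, the last statement is expected to throw an AnalysisException for an undefined function):

    import org.apache.spark.sql.functions.col

    // Works: Column.cast accepts the type name string "calendarinterval",
    // so the DataFrame API exposes CalendarIntervalType as a cast target
    spark.sql("SELECT 'interval 1 hour' AS a")
      .select(col("a").cast("calendarinterval"))
      .show()

    // Fails: no CALENDARINTERVAL constructor function is registered on the SQL side
    spark.sql("SELECT CALENDARINTERVAL('interval 1 hour') AS a").show()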