[GitHub] [spark] MrPowers commented on a change in pull request #31073: [SPARK-33995][SQL] Expose make_interval as a Scala function
MrPowers commented on a change in pull request #31073:
URL: https://github.com/apache/spark/pull/31073#discussion_r558776410

## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala

```diff
@@ -2841,6 +2841,31 @@ object functions {
   // DateTime functions
   //

+  /**
+   * Creates a datetime interval
+   *
+   * @param years Number of years
+   * @param months Number of months
+   * @param weeks Number of weeks
+   * @param days Number of days
+   * @param hours Number of hours
+   * @param mins Number of mins
+   * @param secs Number of secs
+   * @return A datetime interval
+   * @group datetime_funcs
+   * @since 3.2.0
+   */
+  def make_interval(
+      years: Column = lit(0),
```

Review comment:

@zero323 - that's a good point. I plan on creating additional PRs for some of the other functions that are in the SQL API but not yet in the Scala/Python APIs, and going forward I'll add the functions for all the APIs in a single PR. Do you have any additional feedback for this PR, or do you think it's ready to be merged?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

--
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MrPowers commented on a change in pull request #31073: [SPARK-33995][SQL] Expose make_interval as a Scala function
MrPowers commented on a change in pull request #31073:
URL: https://github.com/apache/spark/pull/31073#discussion_r557557211

## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala

```diff
@@ -2841,6 +2841,31 @@ object functions {
   // DateTime functions
   //

+  /**
+   * Creates a datetime interval
```

Review comment:

Sure, added `(Scala-specific)`.
[GitHub] [spark] MrPowers commented on a change in pull request #31073: [SPARK-33995][SQL] Expose make_interval as a Scala function
MrPowers commented on a change in pull request #31073:
URL: https://github.com/apache/spark/pull/31073#discussion_r557556637

## File path: sql/core/src/test/java/test/org/apache/spark/sql/JavaDateFunctionsSuite.java

```java
package test.org.apache.spark.sql;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.test.TestSparkSession;
import org.apache.spark.sql.types.StructType;
import org.junit.After;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;

import java.sql.Date;
import java.util.*;

import static org.apache.spark.sql.types.DataTypes.*;
import static org.apache.spark.sql.functions.*;

public class JavaDateFunctionsSuite {
  private transient TestSparkSession spark;

  @Before
  public void setUp() {
    spark = new TestSparkSession();
  }

  @After
  public void tearDown() {
    spark.stop();
    spark = null;
  }

  @Test
  public void makeIntervalWorksWithJava() {
    Column twoYears = make_interval(lit(2), lit(0), lit(0), lit(0), lit(0), lit(0), lit(0));
    List<Row> rows = Arrays.asList(
      RowFactory.create(Date.valueOf("2014-06-30"), Date.valueOf("2016-06-30")),
      RowFactory.create(Date.valueOf("2015-05-01"), Date.valueOf("2017-05-01")),
      RowFactory.create(Date.valueOf("2018-12-30"), Date.valueOf("2020-12-30")));
    StructType schema = createStructType(Arrays.asList(
      createStructField("some_date", DateType, false),
      createStructField("expected", DateType, false)));
    Dataset<Row> df = spark.createDataFrame(rows, schema)
      .withColumn("plus_two_years", col("some_date").plus(twoYears));
    Assert.assertTrue(Arrays.equals(
      (Row[]) df.select(df.col("plus_two_years")).collect(),
      (Row[]) df.select(df.col("expected")).collect()));
  }
}
```

Review comment:

Good catch, updated the spacing of this file.
[GitHub] [spark] MrPowers commented on a change in pull request #31073: [SPARK-33995][SQL] Expose make_interval as a Scala function
MrPowers commented on a change in pull request #31073:
URL: https://github.com/apache/spark/pull/31073#discussion_r554457442

## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala

```diff
@@ -2841,6 +2841,31 @@ object functions {
   // DateTime functions
   //

+  /**
+   * Creates a datetime interval
+   *
+   * @param years Number of years
+   * @param months Number of months
+   * @param weeks Number of weeks
+   * @param days Number of days
+   * @param hours Number of hours
+   * @param mins Number of mins
+   * @param secs Number of secs
+   * @return A datetime interval
+   * @group datetime_funcs
+   * @since 3.2.0
+   */
+  def make_interval(
```

Review comment:

@zero323 - are you OK if I create a separate JIRA and do a separate pull request to add this function to PySpark & R? It seems @HyukjinKwon is open to doing everything in one PR or having separate PRs, while @jaceklaskowski prefers one PR per language. Just let me know how you'd like me to proceed to keep this moving forward, thanks!

## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala

```diff
+  def make_interval(
+      years: Column = lit(0),
```

Review comment:

@MaxGekk @HyukjinKwon - Here's where we currently stand on the default-arguments issue:

* Default arguments are not used by other functions in `org.apache.spark.sql.functions`, so exposing a function with default parameter values isn't consistent with what's been done in the past.
* The default arguments make `make_interval` better for Scala users because they allow for this nice syntax: `make_interval(hours = lit(2))`.
* Java users can use the `make_interval` function that's defined with default arguments, [as illustrated in this test](https://github.com/apache/spark/pull/31073/commits/c9492a99ffa8a3ab1380ebde21c1ea200d02c187), as long as they supply all seven arguments.

There are good arguments both for and against default arguments in `make_interval`. Can you please let me know your updated thoughts on whether we should use the default values? Thank you.
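[Editor's note] The trade-off in the bullets above can be seen in a small, Spark-free Scala sketch. The `makeInterval` helper below is a hypothetical stand-in for the real function (plain `Int` instead of `Column`), shown only to illustrate how Scala default and named arguments behave for Java callers:

```scala
// Minimal sketch of the default-argument trade-off, assuming a toy
// stand-in for make_interval. Not Spark's actual implementation.
object IntervalSketch {
  def makeInterval(
      years: Int = 0,
      months: Int = 0,
      weeks: Int = 0,
      days: Int = 0,
      hours: Int = 0,
      mins: Int = 0,
      secs: Int = 0): String =
    s"$years years $months months $weeks weeks $days days " +
      s"$hours hours $mins mins $secs secs"

  def demo(): Unit = {
    // Scala callers can name just the argument they care about:
    println(IntervalSketch.makeInterval(hours = 2))
    // Java callers never see Scala defaults, so they must pass every
    // argument positionally, e.g. IntervalSketch.makeInterval(0, 0, 0, 0, 2, 0, 0):
    println(IntervalSketch.makeInterval(0, 0, 0, 0, 2, 0, 0))
  }
}
```

Both calls produce the same interval string, which is why the seven-`lit(0)` form works from Java while Scala keeps the terse named-argument form.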
[GitHub] [spark] MrPowers commented on a change in pull request #31073: [SPARK-33995][SQL] Expose make_interval as a Scala function
MrPowers commented on a change in pull request #31073:
URL: https://github.com/apache/spark/pull/31073#discussion_r553619800

## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala

```diff
@@ -2841,6 +2841,31 @@ object functions {
   // DateTime functions
   //

+  /**
+   * Creates a datetime interval
+   *
+   * @param years Number of years
+   * @param months Number of months
+   * @param weeks Number of weeks
+   * @param days Number of days
+   * @param hours Number of hours
+   * @param mins Number of mins
+   * @param secs Number of secs
+   * @return A datetime interval
+   * @group datetime_funcs
+   * @since 3.2.0
+   */
+  def make_interval(
+      years: Column = lit(0),
```

Review comment:

@MaxGekk - I [added a test](https://github.com/apache/spark/pull/31073/commits/c9492a99ffa8a3ab1380ebde21c1ea200d02c187) to demonstrate that `make_interval` can be called from the Java API (with the Scala method that has default values) when all seven arguments are passed to the method. For example, this Java code works:

```java
Column twoYears = make_interval(lit(2), lit(0), lit(0), lit(0), lit(0), lit(0), lit(0));
Dataset<Row> df = spark.createDataFrame(rows, schema)
  .withColumn("plus_two_years", col("some_date").plus(twoYears));
```

Someone with Java experience should check my result. If the current implementation with default parameters works with the Java API, then I think it's OK to keep it, especially because it allows for much more elegant Scala code.
[GitHub] [spark] MrPowers commented on a change in pull request #31073: [SPARK-33995][SQL] Expose make_interval as a Scala function
MrPowers commented on a change in pull request #31073:
URL: https://github.com/apache/spark/pull/31073#discussion_r553431862

## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala

```diff
@@ -2841,6 +2841,31 @@ object functions {
   // DateTime functions
   //

+  /**
+   * Creates a datetime interval
+   *
+   * @param years Number of years
+   * @param months Number of months
+   * @param weeks Number of weeks
+   * @param days Number of days
+   * @param hours Number of hours
+   * @param mins Number of mins
+   * @param secs Number of secs
+   * @return A datetime interval
+   * @group datetime_funcs
+   * @since 3.2.0
+   */
+  def make_interval(
+      years: Column = lit(0),
```

Review comment:

@HyukjinKwon - thank you for the code review. The default arguments allow for this nice syntax:

```scala
df.withColumn("plus_2_hours", col("first_datetime") + make_interval(hours = lit(2)))
```

Without the default arguments, we'd need to invoke the method with a lot of arguments:

```scala
val twoHourInterval = make_interval(lit(0), lit(0), lit(0), lit(0), lit(2), lit(0), lit(0))
df.withColumn("plus_2_hours", col("first_datetime") + twoHourInterval)
```

We could overload the method and allow for both syntaxes (using a little trick with the last argument, which is optional):

```scala
def make_interval(
    years: Column,
    months: Column,
    weeks: Column,
    days: Column,
    hours: Column,
    mins: Column,
    secs: Column,
    failOnError: Boolean): Column = withExpr {
  MakeInterval(
    years.expr, months.expr, weeks.expr, days.expr,
    hours.expr, mins.expr, secs.expr, failOnError)
}
```

This would allow Scala users to either list all eight arguments manually (e.g. `make_interval(lit(0), lit(0), lit(0), lit(0), lit(2), lit(0), lit(0), true)`) or use the clean default syntax (e.g. `make_interval(hours = lit(2))`). Would this give Java users a usable interface? I'm also fine with removing the default values if that's what's necessary to make this method compatible with Java.

@MaxGekk - open to your thoughts too, because I know you were originally in favor [of the arguments-with-default-values approach](https://issues.apache.org/jira/browse/SPARK-33995?focusedCommentId=17258718&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17258718).
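[Editor's note] The overload trick described in the comment above can be checked with plain Scala, since the language allows default arguments on at most one alternative of an overloaded method. The sketch below uses illustrative names and `Int` parameters rather than Spark's actual `Column`-based API:

```scala
// Sketch of the overload alternative, assuming toy stand-ins for
// Spark's make_interval. One full-arity method serves Java callers;
// the default-argument variant serves Scala callers.
object OverloadSketch {
  // Full-arity overload: callable from Java with all arguments supplied.
  def makeInterval(
      years: Int, months: Int, weeks: Int, days: Int,
      hours: Int, mins: Int, secs: Int,
      failOnError: Boolean): String =
    s"$years-$months-$weeks-$days $hours:$mins:$secs (failOnError=$failOnError)"

  // Default-argument overload: only one alternative may declare defaults,
  // so this one delegates to the full-arity method.
  def makeInterval(
      years: Int = 0, months: Int = 0, weeks: Int = 0, days: Int = 0,
      hours: Int = 0, mins: Int = 0, secs: Int = 0): String =
    makeInterval(years, months, weeks, days, hours, mins, secs, failOnError = true)
}
```

With this shape, `OverloadSketch.makeInterval(hours = 2)` resolves to the defaulted overload, while an eight-argument positional call resolves to the full-arity one, mirroring the dual Scala/Java interface proposed in the comment.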