[GitHub] [spark] MrPowers commented on a change in pull request #31073: [SPARK-33995][SQL] Expose make_interval as a Scala function

2021-01-15 Thread GitBox


MrPowers commented on a change in pull request #31073:
URL: https://github.com/apache/spark/pull/31073#discussion_r558776410



##
File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
##
@@ -2841,6 +2841,31 @@ object functions {
   // DateTime functions
   //////////////////////////////////////////////////////////////////////////////////////////////
 
+  /**
+   * Creates a datetime interval
+   *
+   * @param years Number of years
+   * @param months Number of months
+   * @param weeks Number of weeks
+   * @param days Number of days
+   * @param hours Number of hours
+   * @param mins Number of mins
+   * @param secs Number of secs
+   * @return A datetime interval
+   * @group datetime_funcs
+   * @since 3.2.0
+   */
+  def make_interval(
+  years: Column = lit(0),

Review comment:
   @zero323 - that's a good point.  I plan on creating additional PRs for some of the other functions that are in the SQL API but aren't in the Scala/Python APIs yet, and going forward I'll add functions for all the APIs in a single PR.
   
   Do you have any additional feedback for this PR, or do you think it's good to merge?






[GitHub] [spark] MrPowers commented on a change in pull request #31073: [SPARK-33995][SQL] Expose make_interval as a Scala function

2021-01-14 Thread GitBox


MrPowers commented on a change in pull request #31073:
URL: https://github.com/apache/spark/pull/31073#discussion_r557557211



##
File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
##
@@ -2841,6 +2841,31 @@ object functions {
   // DateTime functions
   //////////////////////////////////////////////////////////////////////////////////////////////
 
+  /**
+   * Creates a datetime interval

Review comment:
   Sure, added `(Scala-specific)`.
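   
   For reference, a sketch of what the annotated doc header could look like (wording assumed, not quoted from the commit):
   
   ```scala
   /**
    * (Scala-specific) Creates a datetime interval.
    *
    * @group datetime_funcs
    * @since 3.2.0
    */
   ```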






[GitHub] [spark] MrPowers commented on a change in pull request #31073: [SPARK-33995][SQL] Expose make_interval as a Scala function

2021-01-14 Thread GitBox


MrPowers commented on a change in pull request #31073:
URL: https://github.com/apache/spark/pull/31073#discussion_r557556637



##
File path: sql/core/src/test/java/test/org/apache/spark/sql/JavaDateFunctionsSuite.java
##
@@ -0,0 +1,50 @@
+package test.org.apache.spark.sql;
+
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.StructType;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.sql.Date;
+import java.util.*;
+
+import static org.apache.spark.sql.types.DataTypes.*;
+import static org.apache.spark.sql.functions.*;
+
+public class JavaDateFunctionsSuite {
+  private transient TestSparkSession spark;
+
+  @Before
+  public void setUp() {
+    spark = new TestSparkSession();
+  }
+
+  @After
+  public void tearDown() {
+    spark.stop();
+    spark = null;
+  }
+
+  @Test
+  public void makeIntervalWorksWithJava() {
+    Column twoYears = make_interval(lit(2), lit(0), lit(0), lit(0), lit(0), lit(0), lit(0));
+    List<Row> rows = Arrays.asList(
+      RowFactory.create(Date.valueOf("2014-06-30"), Date.valueOf("2016-06-30")),
+      RowFactory.create(Date.valueOf("2015-05-01"), Date.valueOf("2017-05-01")),
+      RowFactory.create(Date.valueOf("2018-12-30"), Date.valueOf("2020-12-30")));
+    StructType schema = createStructType(Arrays.asList(
+      createStructField("some_date", DateType, false),
+      createStructField("expected", DateType, false)));
+    Dataset<Row> df = spark.createDataFrame(rows, schema)
+      .withColumn("plus_two_years", col("some_date").plus(twoYears));
+    Assert.assertTrue(Arrays.equals(
+      (Row[]) df.select(df.col("plus_two_years")).collect(),
+      (Row[]) df.select(df.col("expected")).collect()));
+  }
+}

Review comment:
   Good catch, updated the spacing of this file.






[GitHub] [spark] MrPowers commented on a change in pull request #31073: [SPARK-33995][SQL] Expose make_interval as a Scala function

2021-01-09 Thread GitBox


MrPowers commented on a change in pull request #31073:
URL: https://github.com/apache/spark/pull/31073#discussion_r554457442



##
File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
##
@@ -2841,6 +2841,31 @@ object functions {
   // DateTime functions
   //////////////////////////////////////////////////////////////////////////////////////////////
 
+  /**
+   * Creates a datetime interval
+   *
+   * @param years Number of years
+   * @param months Number of months
+   * @param weeks Number of weeks
+   * @param days Number of days
+   * @param hours Number of hours
+   * @param mins Number of mins
+   * @param secs Number of secs
+   * @return A datetime interval
+   * @group datetime_funcs
+   * @since 3.2.0
+   */
+  def make_interval(

Review comment:
   @zero323 - are you OK if I create a separate JIRA and do a separate pull request to add this function to PySpark & R?  It seems like @HyukjinKwon is open to doing everything in one PR or to having separate PRs, while @jaceklaskowski prefers one PR per language.  Just let me know how you'd like me to proceed to keep this moving forward, thanks!

##
File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
##
@@ -2841,6 +2841,31 @@ object functions {
   // DateTime functions
   //////////////////////////////////////////////////////////////////////////////////////////////
 
+  /**
+   * Creates a datetime interval
+   *
+   * @param years Number of years
+   * @param months Number of months
+   * @param weeks Number of weeks
+   * @param days Number of days
+   * @param hours Number of hours
+   * @param mins Number of mins
+   * @param secs Number of secs
+   * @return A datetime interval
+   * @group datetime_funcs
+   * @since 3.2.0
+   */
+  def make_interval(
+  years: Column = lit(0),

Review comment:
   @MaxGekk @HyukjinKwon - Here's where we currently stand on the default arguments issue:
   
   * Default arguments are not used by other functions in `org.apache.spark.sql.functions`, so exposing a function with default param values isn't consistent with what's been done in the past.
   * The default arguments make `make_interval` nicer for Scala users because they allow the compact syntax `make_interval(hours = lit(2))`.
   * Java users can call the `make_interval` function that's defined with default arguments, [as illustrated in this test](https://github.com/apache/spark/pull/31073/commits/c9492a99ffa8a3ab1380ebde21c1ea200d02c187), as long as they supply all seven arguments; see the sketch below.
   
   There are good arguments both for and against default arguments in `make_interval`.  Can you please let me know your updated thoughts on whether we should use the default values?  Thank you.
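   
   To make the trade-off concrete, a minimal sketch of the two call styles under the defaulted signature in this PR (the `plusTwoHours` helper and the `some_ts` column are hypothetical):
   
   ```scala
   import org.apache.spark.sql.DataFrame
   import org.apache.spark.sql.functions._
   
   // Scala-only sugar: default values plus named arguments.
   val twoHours = make_interval(hours = lit(2))
   
   // The shape Java callers must use: all seven arguments, positionally.
   val twoYears = make_interval(lit(2), lit(0), lit(0), lit(0), lit(0), lit(0), lit(0))
   
   // Hypothetical usage: add the two-hour interval to a timestamp column.
   def plusTwoHours(df: DataFrame): DataFrame =
     df.withColumn("plus_two_hours", col("some_ts") + twoHours)
   ```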







[GitHub] [spark] MrPowers commented on a change in pull request #31073: [SPARK-33995][SQL] Expose make_interval as a Scala function

2021-01-07 Thread GitBox


MrPowers commented on a change in pull request #31073:
URL: https://github.com/apache/spark/pull/31073#discussion_r553619800



##
File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
##
@@ -2841,6 +2841,31 @@ object functions {
   // DateTime functions
   //////////////////////////////////////////////////////////////////////////////////////////////
 
+  /**
+   * Creates a datetime interval
+   *
+   * @param years Number of years
+   * @param months Number of months
+   * @param weeks Number of weeks
+   * @param days Number of days
+   * @param hours Number of hours
+   * @param mins Number of mins
+   * @param secs Number of secs
+   * @return A datetime interval
+   * @group datetime_funcs
+   * @since 3.2.0
+   */
+  def make_interval(
+  years: Column = lit(0),

Review comment:
   @MaxGekk - I [added a test](https://github.com/apache/spark/pull/31073/commits/c9492a99ffa8a3ab1380ebde21c1ea200d02c187) to demonstrate that `make_interval` can be called from the Java API (via the Scala method that has default values) when all 7 arguments are passed to the method.  For example, this Java code works:
   
   ```java
   Column twoYears = make_interval(lit(2), lit(0), lit(0), lit(0), lit(0), lit(0), lit(0));
   Dataset<Row> df = spark.createDataFrame(rows, schema)
       .withColumn("plus_two_years", col("some_date").plus(twoYears));
   ```
   
   Someone with Java experience should check my result.  If the current implementation with default parameters works with the Java API, then I think it's OK to keep it, especially because it allows for much more elegant Scala code.
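   
   For background on why all seven arguments are needed from Java: the Scala compiler turns each default parameter value into a synthetic accessor method (named along the lines of `make_interval$default$1`), and Java has no named-argument syntax to pair with them, so in practice a Java caller supplies every argument explicitly.  A small Scala sketch of the equivalence (a sketch only, using the public `Column.expr` field):
   
   ```scala
   import org.apache.spark.sql.Column
   import org.apache.spark.sql.functions._
   
   // The Java-compatible fully-explicit call and the Scala named-argument
   // call should build semantically equivalent interval expressions.
   val fromJavaShape: Column = make_interval(lit(2), lit(0), lit(0), lit(0), lit(0), lit(0), lit(0))
   val fromScalaShape: Column = make_interval(years = lit(2))
   assert(fromJavaShape.expr.semanticEquals(fromScalaShape.expr))
   ```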






[GitHub] [spark] MrPowers commented on a change in pull request #31073: [SPARK-33995][SQL] Expose make_interval as a Scala function

2021-01-07 Thread GitBox


MrPowers commented on a change in pull request #31073:
URL: https://github.com/apache/spark/pull/31073#discussion_r553431862



##
File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
##
@@ -2841,6 +2841,31 @@ object functions {
   // DateTime functions
   //////////////////////////////////////////////////////////////////////////////////////////////
 
+  /**
+   * Creates a datetime interval
+   *
+   * @param years Number of years
+   * @param months Number of months
+   * @param weeks Number of weeks
+   * @param days Number of days
+   * @param hours Number of hours
+   * @param mins Number of mins
+   * @param secs Number of secs
+   * @return A datetime interval
+   * @group datetime_funcs
+   * @since 3.2.0
+   */
+  def make_interval(
+  years: Column = lit(0),

Review comment:
   @HyukjinKwon - thank you for the code review.  The default arguments allow for this nice syntax:
   
   ```scala
   df.withColumn("plus_2_hours", col("first_datetime") + make_interval(hours = lit(2)))
   ```
   
   Without the default arguments, we'll need to invoke the method with a lot of arguments:
   
   ```scala
   val twoHourInterval = make_interval(lit(0), lit(0), lit(0), lit(0), lit(2), lit(0), lit(0))
   df.withColumn("plus_2_hours", col("first_datetime") + twoHourInterval)
   ```
   
   We could overload the method and allow for both syntaxes (using a little trick with the extra last argument).
   
   ```scala
   def make_interval(
       years: Column,
       months: Column,
       weeks: Column,
       days: Column,
       hours: Column,
       mins: Column,
       secs: Column,
       failOnError: Boolean): Column = withExpr {
     MakeInterval(
       years.expr, months.expr, weeks.expr, days.expr, hours.expr, mins.expr,
       secs.expr, failOnError)
   }
   ```
   
   This would allow Scala users to either manually list all 8 arguments (e.g. `make_interval(lit(0), lit(0), lit(0), lit(0), lit(2), lit(0), lit(0), true)`) or use the clean default syntax (e.g. `make_interval(hours = lit(2))`).  Would this give the Java users a usable interface?
   
   I'm also fine removing the default values if that's what's necessary to make this method compatible with Java.  @MaxGekk - open to your thoughts too, because I know you were originally in favor [of the arguments with default values approach](https://issues.apache.org/jira/browse/SPARK-33995?focusedCommentId=17258718&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17258718).
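   
   For completeness, a minimal sketch of how the two overloads could sit side by side (assuming the existing `MakeInterval` expression and `withExpr` helper; the `failOnError` value passed by the short form is an illustrative choice, not a decision):
   
   ```scala
   // Sketch only: a defaulted 7-arg overload for Scala ergonomics that
   // delegates to the explicit 8-arg overload shown above.
   def make_interval(
       years: Column = lit(0),
       months: Column = lit(0),
       weeks: Column = lit(0),
       days: Column = lit(0),
       hours: Column = lit(0),
       mins: Column = lit(0),
       secs: Column = lit(0)): Column =
     make_interval(years, months, weeks, days, hours, mins, secs, failOnError = true)
   ```
   
   Scala allows this pairing because only one of the overloaded alternatives carries default arguments, and the extra `failOnError` parameter keeps the two signatures distinct.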







