[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-12-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23178#discussion_r238113456
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType
  * @since 1.3.0
  */
 @Stable
--- End diff --

Got it. Thank you, @HyukjinKwon , @srowen , @cloud-fan .


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23178


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23178#discussion_r238083055
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType
  * @since 1.3.0
  */
 @Stable
--- End diff --

It's not a new API anyway, it will be weird to change since to 3.0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-12-01 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23178#discussion_r238062995
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType
  * @since 1.3.0
  */
 @Stable
--- End diff --

I'd go ahead and leave the Since version. The API is essentially unchanged, 
though there are some marginal breaking compile time changes. But same is true 
of many things we are changing in 3.0. I've tagged the JIRA with 
`release-notes` and will add a blurb about the change.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-12-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23178#discussion_r238057361
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType
  * @since 1.3.0
  */
 @Stable
--- End diff --

yea actually I was wondering about the same thing.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-11-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23178#discussion_r238007506
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType
  * @since 1.3.0
  */
 @Stable
--- End diff --

Is it better to change it to `@Stable` with `@since 3.0.0`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-11-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23178#discussion_r238007219
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType
  * @since 1.3.0
  */
 @Stable
--- End diff --

I'm +1 for this PR, but I'm just wondering if this `@Stable` tag with 
`@since 1.3.0` tag is valid or not here.
Previous case class was stable until 2.4.x and new trait will be stable 
since 3.0. But, the stability is broken at 3.0.0 once. Did I understand 
correctly?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23178#discussion_r237522717
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -38,114 +38,108 @@ import org.apache.spark.sql.types.DataType
  * @since 1.3.0
  */
 @Stable
-case class UserDefinedFunction protected[sql] (
-f: AnyRef,
-dataType: DataType,
-inputTypes: Option[Seq[DataType]]) {
-
-  private var _nameOption: Option[String] = None
-  private var _nullable: Boolean = true
-  private var _deterministic: Boolean = true
-
-  // This is a `var` instead of in the constructor for backward 
compatibility of this case class.
-  // TODO: revisit this case class in Spark 3.0, and narrow down the 
public surface.
-  private[sql] var nullableTypes: Option[Seq[Boolean]] = None
+trait UserDefinedFunction {
--- End diff --

good idea! though I'm not sure if `sealed` works for Java.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-11-29 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23178#discussion_r237521153
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -38,114 +38,108 @@ import org.apache.spark.sql.types.DataType
  * @since 1.3.0
  */
 @Stable
-case class UserDefinedFunction protected[sql] (
-f: AnyRef,
-dataType: DataType,
-inputTypes: Option[Seq[DataType]]) {
-
-  private var _nameOption: Option[String] = None
-  private var _nullable: Boolean = true
-  private var _deterministic: Boolean = true
-
-  // This is a `var` instead of in the constructor for backward 
compatibility of this case class.
-  // TODO: revisit this case class in Spark 3.0, and narrow down the 
public surface.
-  private[sql] var nullableTypes: Option[Seq[Boolean]] = None
+trait UserDefinedFunction {
 
   /**
* Returns true when the UDF can return a nullable value.
*
* @since 2.3.0
*/
-  def nullable: Boolean = _nullable
+  def nullable: Boolean
 
   /**
* Returns true iff the UDF is deterministic, i.e. the UDF produces the 
same output given the same
* input.
*
* @since 2.3.0
*/
-  def deterministic: Boolean = _deterministic
+  def deterministic: Boolean
 
   /**
* Returns an expression that invokes the UDF, using the given arguments.
*
* @since 1.3.0
*/
   @scala.annotation.varargs
-  def apply(exprs: Column*): Column = {
+  def apply(exprs: Column*): Column
+
+  /**
+   * Updates UserDefinedFunction with a given name.
+   *
+   * @since 2.3.0
+   */
+  def withName(name: String): UserDefinedFunction
+
+  /**
+   * Updates UserDefinedFunction to non-nullable.
+   *
+   * @since 2.3.0
+   */
+  def asNonNullable(): UserDefinedFunction
+
+  /**
+   * Updates UserDefinedFunction to nondeterministic.
+   *
+   * @since 2.3.0
+   */
+  def asNondeterministic(): UserDefinedFunction
+}
+
+private[sql] case class SparkUserDefinedFunction(
+f: AnyRef,
+dataType: DataType,
+inputTypes: Option[Seq[DataType]],
+nullableTypes: Option[Seq[Boolean]],
+name: Option[String] = None,
+nullable: Boolean = true,
+deterministic: Boolean = true) extends UserDefinedFunction {
+
+  @scala.annotation.varargs
+  override def apply(exprs: Column*): Column = {
 // TODO: make sure this class is only instantiated through 
`SparkUserDefinedFunction.create()`
 // and `nullableTypes` is always set.
-if (nullableTypes.isEmpty) {
-  nullableTypes = Some(ScalaReflection.getParameterTypeNullability(f))
-}
 if (inputTypes.isDefined) {
   assert(inputTypes.get.length == nullableTypes.get.length)
 }
 
+val inputsNullSafe = if (nullableTypes.isEmpty) {
--- End diff --

You can use `getOrElse` here and even inline this into the call below, but 
I don't really care.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-11-29 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23178#discussion_r237522078
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -38,114 +38,108 @@ import org.apache.spark.sql.types.DataType
  * @since 1.3.0
  */
 @Stable
-case class UserDefinedFunction protected[sql] (
-f: AnyRef,
-dataType: DataType,
-inputTypes: Option[Seq[DataType]]) {
-
-  private var _nameOption: Option[String] = None
-  private var _nullable: Boolean = true
-  private var _deterministic: Boolean = true
-
-  // This is a `var` instead of in the constructor for backward 
compatibility of this case class.
-  // TODO: revisit this case class in Spark 3.0, and narrow down the 
public surface.
-  private[sql] var nullableTypes: Option[Seq[Boolean]] = None
+trait UserDefinedFunction {
--- End diff --

Should we make this `sealed`? I'm not sure. Would any user ever extend this 
meaningfully? I kind of worry someone will start doing so; maybe they already 
subclass it in some cases though. Elsewhere it might help the compiler 
understand in `match` statements that there is only ever one type of UDF class 
to match on.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-11-29 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/23178

[SPARK-26216][SQL] Do not use case class as public API (UserDefinedFunction)

## What changes were proposed in this pull request?

It's a bad idea to use case class as public API, as it has a very wide 
surface. For example, the `copy` method, its fields, the companion object, etc.

For a particular case, `UserDefinedFunction`. It has a private constructor, 
and I believe we only want users to access a few methods:`apply`, `nullable`, 
`asNonNullable`, etc.

However, all its fields, and `copy` method, and the companion object are 
public unexpectedly. As a result, we made many tricks to work around the binary 
compatibility issues.

This PR proposes to only make interfaces public, and hide implementations 
behind with a private class. Now `UserDefinedFunction` is a pure trait, and the 
concrete implementation is `SparkUserDefinedFunction`, which is private.

This is the first PR to go with this direction. If it's accepted, I'll 
create a umbrella JIRA and fix all the public case classes.

## How was this patch tested?

existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark udf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23178.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23178


commit 700334f3b14cfe88d6141c8a99ec339ec7a16afc
Author: Wenchen Fan 
Date:   2018-11-29T13:38:51Z

Do not use case class as public API (UserDefinedFunction)




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org