[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23178#discussion_r238113456 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType * @since 1.3.0 */ @Stable --- End diff -- Got it. Thank you, @HyukjinKwon , @srowen , @cloud-fan . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23178 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23178#discussion_r238083055 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType * @since 1.3.0 */ @Stable --- End diff -- It's not a new API anyway, it will be weird to change since to 3.0. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23178#discussion_r238062995 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType * @since 1.3.0 */ @Stable --- End diff -- I'd go ahead and leave the Since version. The API is essentially unchanged, though there are some marginal breaking compile time changes. But same is true of many things we are changing in 3.0. I've tagged the JIRA with `release-notes` and will add a blurb about the change. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23178#discussion_r238057361 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType * @since 1.3.0 */ @Stable --- End diff -- yea actually I was wondering about the same thing. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23178#discussion_r238007506 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType * @since 1.3.0 */ @Stable --- End diff -- Is it better to change it to `@Stable` with `@since 3.0.0`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23178#discussion_r238007219 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType * @since 1.3.0 */ @Stable --- End diff -- I'm +1 for this PR, but I'm just wondering if this `@Stable` tag with `@since 1.3.0` tag is valid or not here. Previous case class was stable until 2.4.x and new trait will be stable since 3.0. But, the stability is broken at 3.0.0 once. Did I understand correctly? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23178#discussion_r237522717 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -38,114 +38,108 @@ import org.apache.spark.sql.types.DataType * @since 1.3.0 */ @Stable -case class UserDefinedFunction protected[sql] ( -f: AnyRef, -dataType: DataType, -inputTypes: Option[Seq[DataType]]) { - - private var _nameOption: Option[String] = None - private var _nullable: Boolean = true - private var _deterministic: Boolean = true - - // This is a `var` instead of in the constructor for backward compatibility of this case class. - // TODO: revisit this case class in Spark 3.0, and narrow down the public surface. - private[sql] var nullableTypes: Option[Seq[Boolean]] = None +trait UserDefinedFunction { --- End diff -- good idea! though I'm not sure if `sealed` works for Java. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23178#discussion_r237521153 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -38,114 +38,108 @@ import org.apache.spark.sql.types.DataType * @since 1.3.0 */ @Stable -case class UserDefinedFunction protected[sql] ( -f: AnyRef, -dataType: DataType, -inputTypes: Option[Seq[DataType]]) { - - private var _nameOption: Option[String] = None - private var _nullable: Boolean = true - private var _deterministic: Boolean = true - - // This is a `var` instead of in the constructor for backward compatibility of this case class. - // TODO: revisit this case class in Spark 3.0, and narrow down the public surface. - private[sql] var nullableTypes: Option[Seq[Boolean]] = None +trait UserDefinedFunction { /** * Returns true when the UDF can return a nullable value. * * @since 2.3.0 */ - def nullable: Boolean = _nullable + def nullable: Boolean /** * Returns true iff the UDF is deterministic, i.e. the UDF produces the same output given the same * input. * * @since 2.3.0 */ - def deterministic: Boolean = _deterministic + def deterministic: Boolean /** * Returns an expression that invokes the UDF, using the given arguments. * * @since 1.3.0 */ @scala.annotation.varargs - def apply(exprs: Column*): Column = { + def apply(exprs: Column*): Column + + /** + * Updates UserDefinedFunction with a given name. + * + * @since 2.3.0 + */ + def withName(name: String): UserDefinedFunction + + /** + * Updates UserDefinedFunction to non-nullable. + * + * @since 2.3.0 + */ + def asNonNullable(): UserDefinedFunction + + /** + * Updates UserDefinedFunction to nondeterministic. + * + * @since 2.3.0 + */ + def asNondeterministic(): UserDefinedFunction +} + +private[sql] case class SparkUserDefinedFunction( +f: AnyRef, +dataType: DataType, +inputTypes: Option[Seq[DataType]], +nullableTypes: Option[Seq[Boolean]], +name: Option[String] = None, +nullable: Boolean = true, +deterministic: Boolean = true) extends UserDefinedFunction { + + @scala.annotation.varargs + override def apply(exprs: Column*): Column = { // TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()` // and `nullableTypes` is always set. -if (nullableTypes.isEmpty) { - nullableTypes = Some(ScalaReflection.getParameterTypeNullability(f)) -} if (inputTypes.isDefined) { assert(inputTypes.get.length == nullableTypes.get.length) } +val inputsNullSafe = if (nullableTypes.isEmpty) { --- End diff -- You can use `getOrElse` here and even inline this into the call below, but I don't really care. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23178#discussion_r237522078 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -38,114 +38,108 @@ import org.apache.spark.sql.types.DataType * @since 1.3.0 */ @Stable -case class UserDefinedFunction protected[sql] ( -f: AnyRef, -dataType: DataType, -inputTypes: Option[Seq[DataType]]) { - - private var _nameOption: Option[String] = None - private var _nullable: Boolean = true - private var _deterministic: Boolean = true - - // This is a `var` instead of in the constructor for backward compatibility of this case class. - // TODO: revisit this case class in Spark 3.0, and narrow down the public surface. - private[sql] var nullableTypes: Option[Seq[Boolean]] = None +trait UserDefinedFunction { --- End diff -- Should we make this `sealed`? I'm not sure. Would any user ever extend this meaningfully? I kind of worry someone will start doing so; maybe they already subclass it in some cases though. Elsewhere it might help the compiler understand in `match` statements that there is only ever one type of UDF class to match on. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/23178 [SPARK-26216][SQL] Do not use case class as public API (UserDefinedFunction) ## What changes were proposed in this pull request? It's a bad idea to use case class as public API, as it has a very wide surface. For example, the `copy` method, its fields, the companion object, etc. For a particular case, `UserDefinedFunction`. It has a private constructor, and I believe we only want users to access a few methods:`apply`, `nullable`, `asNonNullable`, etc. However, all its fields, and `copy` method, and the companion object are public unexpectedly. As a result, we made many tricks to work around the binary compatibility issues. This PR proposes to only make interfaces public, and hide implementations behind with a private class. Now `UserDefinedFunction` is a pure trait, and the concrete implementation is `SparkUserDefinedFunction`, which is private. This is the first PR to go with this direction. If it's accepted, I'll create a umbrella JIRA and fix all the public case classes. ## How was this patch tested? existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark udf Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23178.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23178 commit 700334f3b14cfe88d6141c8a99ec339ec7a16afc Author: Wenchen Fan Date: 2018-11-29T13:38:51Z Do not use case class as public API (UserDefinedFunction) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org