[GitHub] spark issue #20141: [SPARK-22945][SQL] add java UDF APIs in the functions ob...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20141 **[Test build #85658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85658/testReport)** for PR 20141 at commit [`711c027`](https://github.com/apache/spark/commit/711c0278a6d77472761414692c987d256293467e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Merged build finished. Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85669/ Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85669/testReport)** for PR 20148 at commit [`93e1d64`](https://github.com/apache/spark/commit/93e1d648d56079dcc8bd28bafa7dad4cd9d2dfeb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85656/ Test FAILed.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Merged build finished. Test FAILed.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20146 **[Test build #85656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85656/testReport)** for PR 20146 at commit [`bb990f1`](https://github.com/apache/spark/commit/bb990f1a6511d8ce20f4fff254dfe0ff43262a10). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85668/ Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Merged build finished. Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85668/testReport)** for PR 20148 at commit [`92251aa`](https://github.com/apache/spark/commit/92251aa5d518083ae4016f307b48a31d67b87067). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85669/testReport)** for PR 20148 at commit [`93e1d64`](https://github.com/apache/spark/commit/93e1d648d56079dcc8bd28bafa7dad4cd9d2dfeb).
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85667/ Test PASSed.
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159592335 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactorySuite.scala --- @@ -172,10 +172,8 @@ class ExecutorPodFactorySuite extends SparkFunSuite with BeforeAndAfter with Bef "1", "dummy", "dummy", Seq[(String, String)](), driverPod, Map[String, Int]()) assert(executor.getSpec.getInitContainers.size() === 1) - assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0).getName - === "secret1-volume") -assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0) - .getMountPath === "/var/secret1") +assert(SecretVolumeUtils.containerHasVolume( --- End diff -- Done. https://github.com/apache/spark/pull/20148/commits/93e1d648d56079dcc8bd28bafa7dad4cd9d2dfeb. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Merged build finished. Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85667/testReport)** for PR 20148 at commit [`2e6f7db`](https://github.com/apache/spark/commit/2e6f7db4efd9b819eee4c5d6d0152638ba0931d1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user hex108 commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159591760 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactorySuite.scala --- @@ -172,10 +172,8 @@ class ExecutorPodFactorySuite extends SparkFunSuite with BeforeAndAfter with Bef "1", "dummy", "dummy", Seq[(String, String)](), driverPod, Map[String, Int]()) assert(executor.getSpec.getInitContainers.size() === 1) - assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0).getName - === "secret1-volume") -assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0) - .getMountPath === "/var/secret1") +assert(SecretVolumeUtils.containerHasVolume( --- End diff -- Thanks! We also need to check the number of volumes in the pod spec. + // check volume mounted. + assert(executor.getSpec.getVolumes.size() === 1) + assert(executor.getSpec.getVolumes.get(0).getSecret.getSecretName === "secret1")
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85668/testReport)** for PR 20148 at commit [`92251aa`](https://github.com/apache/spark/commit/92251aa5d518083ae4016f307b48a31d67b87067).
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159590877 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactorySuite.scala --- @@ -172,10 +172,8 @@ class ExecutorPodFactorySuite extends SparkFunSuite with BeforeAndAfter with Bef "1", "dummy", "dummy", Seq[(String, String)](), driverPod, Map[String, Int]()) assert(executor.getSpec.getInitContainers.size() === 1) - assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0).getName - === "secret1-volume") -assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0) - .getMountPath === "/var/secret1") +assert(SecretVolumeUtils.containerHasVolume( --- End diff -- Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159590636 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -203,9 +203,26 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String // UDFToString private[this] def castToString(from: DataType): Any => Any = from match { case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes) +case StringType => buildCast[UTF8String](_, identity) case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d))) --- End diff -- yea, good idea. So, I'll add the builder class. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85667/testReport)** for PR 20148 at commit [`2e6f7db`](https://github.com/apache/spark/commit/2e6f7db4efd9b819eee4c5d6d0152638ba0931d1).
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user hex108 commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159590343 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactorySuite.scala --- @@ -172,10 +172,8 @@ class ExecutorPodFactorySuite extends SparkFunSuite with BeforeAndAfter with Bef "1", "dummy", "dummy", Seq[(String, String)](), driverPod, Map[String, Int]()) assert(executor.getSpec.getInitContainers.size() === 1) - assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0).getName - === "secret1-volume") -assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0) - .getMountPath === "/var/secret1") +assert(SecretVolumeUtils.containerHasVolume( --- End diff -- It might be better to change `None` at line 168 to `Some(secretBootstrap)` and to check the number of volumes, to avoid a regression.
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159590247 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -608,6 +667,20 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String val tz = ctx.addReferenceObj("timeZone", timeZone) (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));""" + case ar: ArrayType => +(c, evPrim, evNull) => { + val bufferTerm = ctx.freshName("bufferTerm") + val bufferClass = classOf[StringBuffer].getName + val writeArrayToBuffer = codegenWriteArrayToBuffer(ar, ctx) + s""" + |$bufferClass $bufferTerm = new $bufferClass(); + |if (!$evNull) { --- End diff -- ok. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159590150 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/MountSecretsBootstrap.scala --- @@ -28,20 +28,26 @@ private[spark] class MountSecretsBootstrap(secretNamesToMountPaths: Map[String, * * @param pod the pod into which the secret volumes are being added. * @param container the container into which the secret volumes are being mounted. + * @param addNewVolumes whether to add new secret volumes for the secrets. * @return the updated pod and container with the secrets mounted. */ - def mountSecrets(pod: Pod, container: Container): (Pod, Container) = { + def mountSecrets( --- End diff -- Good point. Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20147 Merged build finished. Test PASSed.
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20147 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85659/ Test PASSed.
[GitHub] spark issue #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader's pare...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20145 **[Test build #85666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85666/testReport)** for PR 20145 at commit [`fea2490`](https://github.com/apache/spark/commit/fea2490027c611fb9be61ea33ad4de962ccf2ffa).
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20147 **[Test build #85659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85659/testReport)** for PR 20147 at commit [`c5e835e`](https://github.com/apache/spark/commit/c5e835eb317d0b6970daef0ffc1214576874e8c6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159589454 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -608,6 +667,20 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String val tz = ctx.addReferenceObj("timeZone", timeZone) (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));""" + case ar: ArrayType => +(c, evPrim, evNull) => { + val bufferTerm = ctx.freshName("bufferTerm") + val bufferClass = classOf[StringBuffer].getName --- End diff -- Since `StringBuffer` is used in `stringExpressions` and `regexExpressions`, I just used this along with them. But, I'll update. (BTW, any reason to use `StringBuffer` in `stringExpressions` and `regexExpressions`?) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20145#discussion_r159589347 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala --- @@ -42,4 +47,41 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton assert(hiveConf("foo") === "bar") } } + + test("ChildFirstURLClassLoader's parent is null") { --- End diff -- I think we can remove it. The test case below would fail without your PR, which already proves your patch works.
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159589202 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -203,9 +203,26 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String // UDFToString private[this] def castToString(from: DataType): Any => Any = from match { case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes) +case StringType => buildCast[UTF8String](_, identity) case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d))) --- End diff -- I think this can also simplify the codegen part, as we have some duplicated code to avoid this conversion. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159589126 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -203,9 +203,26 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String // UDFToString private[this] def castToString(from: DataType): Any => Any = from match { case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes) +case StringType => buildCast[UTF8String](_, identity) case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d))) --- End diff -- we may convert a string to `UTF8String` and then convert it back, which is inefficient. I think we should create a special `StringBuilder` for `UTF8String`, e.g. ``` class UTF8StringBuilder { public void append(UTF8String str) } ```
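The round trip cloud-fan describes is `UTF8String -> String -> StringBuilder -> String -> UTF8String`. A minimal sketch of the proposed builder, with `byte[]` standing in for Spark's `UTF8String` (an assumption for self-containment; the real class wraps raw UTF-8 bytes), so appends copy bytes directly instead of decoding and re-encoding:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: byte[] stands in for Spark's UTF8String, which wraps
// raw UTF-8 bytes. Appends copy bytes directly, with no String decode/encode.
class UTF8StringBuilder {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // Append an already-encoded UTF-8 value without decoding it.
    public void append(byte[] utf8Bytes) {
        buffer.write(utf8Bytes, 0, utf8Bytes.length);
    }

    // Convenience overload for separator literals such as "[" and ", ".
    public void append(String s) {
        append(s.getBytes(StandardCharsets.UTF_8));
    }

    // In Spark this would return a UTF8String wrapping the buffer directly.
    public byte[] build() {
        return buffer.toByteArray();
    }
}
```

The point of the specialization is that element values that are already UTF-8 encoded never pass through `java.lang.String` on their way into the result.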
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159588871 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/MountSecretsBootstrap.scala --- @@ -28,20 +28,26 @@ private[spark] class MountSecretsBootstrap(secretNamesToMountPaths: Map[String, * * @param pod the pod into which the secret volumes are being added. * @param container the container into which the secret volumes are being mounted. + * @param addNewVolumes whether to add new secret volumes for the secrets. * @return the updated pod and container with the secrets mounted. */ - def mountSecrets(pod: Pod, container: Container): (Pod, Container) = { + def mountSecrets( --- End diff -- Can we separate this method into `addSecretVolumes` and `mountSecrets`? The former would just need the pod argument, and the latter would take the container as argument. That way, we could probably separate this out better and it would make it more readable than the branching. WDYT? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
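A rough shape of the split foxish suggests, separating the pod-scoped step from the container-scoped one. The `Pod` and `Container` types below are simplified stand-ins (plain name lists) for the fabric8 Kubernetes client models the real code uses, so this is a sketch of the structure, not the actual implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the fabric8 Pod/Container models: just name lists.
class Pod { final List<String> volumes = new ArrayList<>(); }
class Container { final List<String> volumeMounts = new ArrayList<>(); }

class MountSecretsBootstrap {
    private final Map<String, String> secretNamesToMountPaths;

    MountSecretsBootstrap(Map<String, String> secretNamesToMountPaths) {
        this.secretNamesToMountPaths = secretNamesToMountPaths;
    }

    // Pod-scoped step: add one secret volume per secret. Called once per pod,
    // so mounting into a second container cannot duplicate the volumes.
    Pod addSecretVolumes(Pod pod) {
        for (String name : secretNamesToMountPaths.keySet()) {
            pod.volumes.add(name + "-volume");
        }
        return pod;
    }

    // Container-scoped step: mount the already-added volumes into a container.
    // Safe to call for both the main container and the init-container.
    Container mountSecrets(Container container) {
        for (String name : secretNamesToMountPaths.keySet()) {
            container.volumeMounts.add(name + "-volume");
        }
        return container;
    }
}
```

With this split, the duplicated-volume bug cannot recur by construction: callers invoke `addSecretVolumes` once per pod and `mountSecrets` once per container, instead of passing a boolean flag through one combined method.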
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Merged build finished. Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85664/ Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85664/testReport)** for PR 20148 at commit [`9be26a8`](https://github.com/apache/spark/commit/9be26a857164e815594941adff6a8d2001d69001). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20149: [SPARK-22771][SQL] Add a missing return statement in Con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20149 **[Test build #85665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85665/testReport)** for PR 20149 at commit [`2b385ac`](https://github.com/apache/spark/commit/2b385ac66e33bb18b651ddd6489bd691074960a6).
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159588468 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/MountSecretsBootstrap.scala --- @@ -28,20 +28,26 @@ private[spark] class MountSecretsBootstrap(secretNamesToMountPaths: Map[String, * * @param pod the pod into which the secret volumes are being added. * @param container the container into which the secret volumes are being mounted. + * @param addNewVolumes whether to add new secret volumes for the secrets. --- End diff -- Agreed. I didn't separate it out because we will touch this code as part of refactoring the steps code anyway. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159588521 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -608,6 +667,20 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String val tz = ctx.addReferenceObj("timeZone", timeZone) (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));""" + case ar: ArrayType => +(c, evPrim, evNull) => { + val bufferTerm = ctx.freshName("bufferTerm") + val bufferClass = classOf[StringBuffer].getName --- End diff -- why not use `StringBuilder`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159588472 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -608,6 +667,20 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String val tz = ctx.addReferenceObj("timeZone", timeZone) (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));""" + case ar: ArrayType => +(c, evPrim, evNull) => { + val bufferTerm = ctx.freshName("bufferTerm") + val bufferClass = classOf[StringBuffer].getName + val writeArrayToBuffer = codegenWriteArrayToBuffer(ar, ctx) + s""" + |$bufferClass $bufferTerm = new $bufferClass(); + |if (!$evNull) { --- End diff -- looking at the document of `CastFunction`, the `evNull` is actually `resultIsNull`, it's not about input but about result, so we should not check it here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20149: [SPARK-22771][SQL] Add a missing return statement in Con...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20149 cc: @gatorsmile
[GitHub] spark pull request #20149: [SPARK-22771][SQL] Add a missing return statement...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/20149 [SPARK-22771][SQL] Add a missing return statement in Concat.checkInputDataTypes ## What changes were proposed in this pull request? This PR is a follow-up that fixes a bug left in #19977. ## How was this patch tested? Added tests in `StringExpressionsSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark SPARK-22771-FOLLOWUP Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20149.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20149 commit 2b385ac66e33bb18b651ddd6489bd691074960a6 Author: Takeshi Yamamuro Date: 2018-01-04T06:43:50Z Fix a bug
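The PR title names the fix: a return statement was missing from `Concat.checkInputDataTypes`. As a hypothetical illustration of this class of bug (the actual Concat code is not shown in the PR description, so the names and result strings below are made up): a check that builds a failure result inside a loop but never returns it falls through to the success path, silently accepting bad input.

```java
// Hypothetical illustration, not the actual Concat code.
class ConcatCheckSketch {
    static String checkInputDataTypesBuggy(String[] inputTypes) {
        for (String t : inputTypes) {
            if (!t.equals("string")) {
                String failure = "TypeCheckFailure: expected string, got " + t;
                // Bug: `failure` is computed but there is no return statement,
                // so the loop finishes and the method reports success anyway.
            }
        }
        return "TypeCheckSuccess";
    }

    // Fixed: return the failure as soon as a mismatched input is found.
    static String checkInputDataTypesFixed(String[] inputTypes) {
        for (String t : inputTypes) {
            if (!t.equals("string")) {
                return "TypeCheckFailure: expected string, got " + t;
            }
        }
        return "TypeCheckSuccess";
    }
}
```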
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159588348 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/MountSecretsBootstrap.scala --- @@ -28,20 +28,26 @@ private[spark] class MountSecretsBootstrap(secretNamesToMountPaths: Map[String, * * @param pod the pod into which the secret volumes are being added. * @param container the container into which the secret volumes are being mounted. + * @param addNewVolumes whether to add new secret volumes for the secrets. --- End diff -- I think this problem arose because we're conflating two things here - adding secret volumes (which are pod-scoped) and adding volume-mounts (which are container-scoped). I think we should separate these out. The branching may work for now, but we should have a future work item to separate these out. cc/ @mccheah --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/20078 @felixcheung Have you thought about the initial number of executors? By default, Spark on YARN starts with 2 executors. How can you make sure these 2 executors have enough cores for the receivers at the beginning?
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159588015 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -203,9 +203,26 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String // UDFToString private[this] def castToString(from: DataType): Any => Any = from match { case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes) +case StringType => buildCast[UTF8String](_, identity) case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d))) case TimestampType => buildCast[Long](_, t => UTF8String.fromString(DateTimeUtils.timestampToString(t, timeZone))) +case ar: ArrayType => + buildCast[ArrayData](_, array => { +val res = new StringBuilder +res.append("[") +if (array.numElements > 0) { --- End diff -- yea, ok --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159587926 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -203,9 +203,26 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String // UDFToString private[this] def castToString(from: DataType): Any => Any = from match { case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes) +case StringType => buildCast[UTF8String](_, identity) case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d))) case TimestampType => buildCast[Long](_, t => UTF8String.fromString(DateTimeUtils.timestampToString(t, timeZone))) +case ar: ArrayType => + buildCast[ArrayData](_, array => { +val res = new StringBuilder +res.append("[") +if (array.numElements > 0) { --- End diff -- Actually I prefer your previous code style ``` val toStringFunc = castToString(ar.elementType) if (array.numElements > 0) { res.append(toStringFunc(array.get(i, et))) } var i = 1 while (i < array.numElements) { res.append(", ") res.append(element...) } ```
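The code style cloud-fan prefers — append element 0 outside the loop, then loop from index 1 prepending the separator — can be sketched on a plain Scala array. This is a standalone illustration, not the actual `Cast.castToString` code, which operates on `ArrayData` rather than a JVM array:

```scala
// Builds "[e1, e2, ...]" the way the review suggests: handle element 0
// outside the loop so the separator logic needs no per-iteration branch.
def arrayToString(arr: Array[Int], elemToString: Int => String): String = {
  val res = new StringBuilder
  res.append("[")
  if (arr.length > 0) {
    res.append(elemToString(arr(0)))
    var i = 1
    while (i < arr.length) {
      res.append(", ")
      res.append(elemToString(arr(i)))
      i += 1
    }
  }
  res.append("]")
  res.toString
}
```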
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85664/testReport)** for PR 20148 at commit [`9be26a8`](https://github.com/apache/spark/commit/9be26a857164e815594941adff6a8d2001d69001).
[GitHub] spark pull request #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/20145#discussion_r159587396 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala --- @@ -42,4 +47,41 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton assert(hiveConf("foo") === "bar") } } + + test("ChildFirstURLClassLoader's parent is null") { --- End diff -- It just shows that if we don't handle the child-first classloader's parent, we get caught in an exception. If it bothers you, I can delete it.
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
GitHub user liyinan926 opened a pull request: https://github.com/apache/spark/pull/20148 [SPARK-22953][K8S] Avoids adding duplicated secret volumes when init-container is used ## What changes were proposed in this pull request? User-specified secrets are mounted into both the main container and init-container (when it is used) in a Spark driver/executor pod, using the `MountSecretsBootstrap`. Because `MountSecretsBootstrap` always adds new secret volumes for the secrets to the pod, the same secret volumes get added twice: once when mounting the secrets to the main container, and again when mounting the secrets to the init-container. This PR fixes the issue by introducing a boolean flag to `MountSecretsBootstrap.mountSecrets` that tells whether to add new secret volumes. The flag is set to `true` when mounting secrets to the main container and `false` when mounting to the init-container. Ref: https://github.com/apache-spark-on-k8s/spark/issues/594. ## How was this patch tested? Unit tested. @vanzin @jiangxb1987 @ueshin This is to fix a critical bug in 2.3. PTAL. Thanks! @hex108 @foxish @kimoonkim You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyinan926/spark-k8s branch-2.3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20148.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20148 commit 9be26a857164e815594941adff6a8d2001d69001 Author: Yinan Li Date: 2018-01-04T06:20:01Z [SPARK-22953][K8S] Avoids adding duplicated secret volumes when init-container is used
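The flag-based fix this PR describes can be sketched with a simplified model. The case classes below are stand-ins for the real Kubernetes builder classes; only the `addNewVolumes` flag mirrors the actual change to `MountSecretsBootstrap.mountSecrets`:

```scala
case class Volume(name: String)
case class Container(name: String, mounts: List[String])
case class Pod(volumes: List[Volume])

// Sketch of the fix: volumes are pod-scoped, so they are only added when
// addNewVolumes is true (the main-container call); mounts are
// container-scoped and are always added.
def mountSecrets(
    pod: Pod,
    container: Container,
    secretNamesToMountPaths: Map[String, String],
    addNewVolumes: Boolean): (Pod, Container) = {
  val newPod = if (addNewVolumes) {
    pod.copy(volumes =
      pod.volumes ++ secretNamesToMountPaths.keys.map(n => Volume(s"$n-volume")))
  } else {
    pod
  }
  val newContainer =
    container.copy(mounts = container.mounts ++ secretNamesToMountPaths.values)
  (newPod, newContainer)
}
```

Calling this twice — `true` for the main container, `false` for the init-container — mounts the secret in both containers while adding each pod-level volume exactly once.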
[GitHub] spark issue #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader's pare...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20145 **[Test build #85663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85663/testReport)** for PR 20145 at commit [`e1a9f2e`](https://github.com/apache/spark/commit/e1a9f2e12d17aeb3f7d20ed6695424c2c2765d16).
[GitHub] spark pull request #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20145#discussion_r159586966 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala --- @@ -42,4 +47,41 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton assert(hiveConf("foo") === "bar") } } + + test("ChildFirstURLClassLoader's parent is null") { --- End diff -- what is this test trying to test?
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20078 hmm, I didn't know that was changed, actually (SPARK-13723). But it seems to me `spark.streaming.dynamicAllocation.minExecutors` is still a valid approach. Matching the non-streaming behavior would be a bigger change, which I agree we probably shouldn't take at this point.
[GitHub] spark pull request #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/20145#discussion_r159585279 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala --- @@ -42,4 +47,29 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton assert(hiveConf("foo") === "bar") } } + + test("ChildFirstURLClassLoader's parent is null") { +val conf = new SparkConf +val contextClassLoader = Thread.currentThread().getContextClassLoader +val loader = new FakeChildFirstURLClassLoader(Array(), contextClassLoader) +Thread.currentThread().setContextClassLoader(loader) +intercept[IllegalArgumentException]( + HiveUtils.newClientForMetadata(conf, SparkHadoopUtil.newConfiguration(conf))) +Thread.currentThread().setContextClassLoader(contextClassLoader) --- End diff -- ok
[GitHub] spark pull request #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/20145#discussion_r159585110 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala --- @@ -42,4 +47,29 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton assert(hiveConf("foo") === "bar") } } + + test("ChildFirstURLClassLoader's parent is null") { +val conf = new SparkConf +val contextClassLoader = Thread.currentThread().getContextClassLoader +val loader = new FakeChildFirstURLClassLoader(Array(), contextClassLoader) +Thread.currentThread().setContextClassLoader(loader) +intercept[IllegalArgumentException]( + HiveUtils.newClientForMetadata(conf, SparkHadoopUtil.newConfiguration(conf))) +Thread.currentThread().setContextClassLoader(contextClassLoader) + } + + test("ChildFirstURLClassLoader's parent is null, get spark classloader instead") { +val conf = new SparkConf +val contextClassLoader = Thread.currentThread().getContextClassLoader +val loader = new ChildFirstURLClassLoader(Array(), contextClassLoader) +Thread.currentThread().setContextClassLoader(loader) +HiveUtils.newClientForMetadata(conf, SparkHadoopUtil.newConfiguration(conf)) +Thread.currentThread().setContextClassLoader(contextClassLoader) + } } + +/** + * A Fake [[ChildFirstURLClassLoader]] used for test + */ +private[spark] class FakeChildFirstURLClassLoader(urls: Array[URL], parent: ClassLoader) --- End diff -- inheritance would make the case match
[GitHub] spark issue #20135: [SPARK-22937][SQL] SQL elt output binary for binary inpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20135 **[Test build #85661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85661/testReport)** for PR 20135 at commit [`1ac512b`](https://github.com/apache/spark/commit/1ac512b26c2726904dbffdb447f538edfd686535).
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20078 Originally in Spark dynamic allocation, "spark.executor.instances" and the dynamic allocation confs could not co-exist: if "spark.executor.instances" was set, dynamic allocation would not be enabled. But this behavior changed after 2.0. I think for streaming dynamic allocation we'd better keep it consistent with Spark dynamic allocation.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #85662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85662/testReport)** for PR 16578 at commit [`42067c7`](https://github.com/apache/spark/commit/42067c7c91c3fc72d57050d501bf39f1fd777bae).
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20078 Not commenting on this change specifically, but I've used streaming dynamic allocation quite a bit back in the day. In this case, though, I think the simpler fix is to set `spark.streaming.dynamicAllocation.minExecutors` instead of `num-executors`, as pointed out above.
[GitHub] spark issue #20024: [SPARK-22825][SQL] Fix incorrect results of Casting Arra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20024 **[Test build #85660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85660/testReport)** for PR 20024 at commit [`1d623f8`](https://github.com/apache/spark/commit/1d623f8b0a53d2152b942ce3adf631a8c3dd1a12).
[GitHub] spark issue #20024: [SPARK-22825][SQL] Fix incorrect results of Casting Arra...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20024 ok, all the comments addressed.
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159584638 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -608,6 +723,22 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String val tz = ctx.addReferenceObj("timeZone", timeZone) (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));""" + case ar: ArrayType => +val bufferClass = classOf[StringBuffer].getName +val buffer = ctx.addMutableState(bufferClass, "buffer", v => s"$v = new $bufferClass();") +val writeArrayToBuffer = codegenWriteArrayToBuffer(ar, buffer, ctx) +val arrayToStringCode = + s""" + |if (!${ev.isNull}) { --- End diff -- ya, I missed. Dropped `ev` now.
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159584656 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -597,7 +605,114 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String """ } - private[this] def castToStringCode(from: DataType, ctx: CodegenContext): CastFunction = { + private[this] def writeElemToBufferCode( + dataType: DataType, + buffer: String, + elemTerm: String, + ctx: CodegenContext): String = dataType match { +case BinaryType => s"$buffer.append(new String($elemTerm))" +case StringType => s"$buffer.append(new String($elemTerm.getBytes()))" +case DateType => s"""$buffer.append( + org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($elemTerm))""" +case TimestampType => s"""$buffer.append( + org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($elemTerm))""" +case map: MapType => s"${codegenWriteMapToBuffer(map, buffer, ctx)}($elemTerm)" +case ar: ArrayType => s"${codegenWriteArrayToBuffer(ar, buffer, ctx)}($elemTerm)" +case st: StructType => s"${codegenWriteStructToBuffer(st, buffer, ctx)}($elemTerm)" --- End diff -- ok, dropped.
[GitHub] spark issue #20144: [SPARK-21475][CORE][2nd attempt] Change to use NIO's Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20144 Merged build finished. Test PASSed.
[GitHub] spark issue #20144: [SPARK-21475][CORE][2nd attempt] Change to use NIO's Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20144 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85651/ Test PASSed.
[GitHub] spark pull request #20146: [SPARK-11215][ML] Add multiple columns support to...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20146#discussion_r159584417 --- Diff: R/pkg/tests/fulltests/test_mllib_classification.R --- @@ -348,12 +348,12 @@ test_that("spark.mlp", { # Test random seed # default seed - model <- spark.mlp(df, label ~ features, layers = c(4, 5, 4, 3), maxIter = 10) + model <- spark.mlp(df, label ~ features, layers = c(4, 5, 4, 3), maxIter = 100) --- End diff -- can you check whether the run time increases significantly? This was an issue before - see SPARK-21693
[GitHub] spark issue #20144: [SPARK-21475][CORE][2nd attempt] Change to use NIO's Fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20144 **[Test build #85651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85651/testReport)** for PR 20144 at commit [`277b801`](https://github.com/apache/spark/commit/277b801b3ccee95d8f30c02b1ba8693ab39f302a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20129: [SPARK-22933][SPARKR] R Structured Streaming API ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20129
[GitHub] spark issue #20129: [SPARK-22933][SPARKR] R Structured Streaming API for wit...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20129 merged to master/2.3
[GitHub] spark issue #20024: [SPARK-22825][SQL] Fix incorrect results of Casting Arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20024 Merged build finished. Test PASSed.
[GitHub] spark issue #20024: [SPARK-22825][SQL] Fix incorrect results of Casting Arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20024 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85652/ Test PASSed.
[GitHub] spark issue #20024: [SPARK-22825][SQL] Fix incorrect results of Casting Arra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20024 **[Test build #85652 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85652/testReport)** for PR 20024 at commit [`a46a9a7`](https://github.com/apache/spark/commit/a46a9a76487151677d011ded7c06bee9d46c35d1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20137#discussion_r159582683 --- Diff: python/pyspark/sql/catalog.py --- @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()): >>> _ = spark.udf.register("stringLengthInt", len, IntegerType()) >>> spark.sql("SELECT stringLengthInt('test')").collect() [Row(stringLengthInt(test)=4)] + +>>> import random +>>> from pyspark.sql.functions import udf +>>> from pyspark.sql.types import IntegerType, StringType +>>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic() +>>> newRandom_udf = spark.catalog.registerFunction("random_udf", random_udf, StringType()) +>>> spark.sql("SELECT random_udf()").collect() # doctest: +SKIP +[Row(random_udf()=u'82')] +>>> spark.range(1).select(newRandom_udf()).collect() # doctest: +SKIP +[Row(random_udf()=u'62')] """ -udf = UserDefinedFunction(f, returnType=returnType, name=name, - evalType=PythonEvalType.SQL_BATCHED_UDF) + +# This is to check whether the input function is a wrapped/native UserDefinedFunction +if hasattr(f, 'asNondeterministic'): +udf = UserDefinedFunction(f.func, returnType=returnType, name=name, + evalType=PythonEvalType.SQL_BATCHED_UDF, --- End diff -- seems we can support it by just changing `evalType=PythonEvalType.SQL_BATCHED_UDF` to `evalType=f.evalType`
[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20137#discussion_r159582570 --- Diff: python/pyspark/sql/catalog.py --- @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()): >>> _ = spark.udf.register("stringLengthInt", len, IntegerType()) >>> spark.sql("SELECT stringLengthInt('test')").collect() [Row(stringLengthInt(test)=4)] + +>>> import random +>>> from pyspark.sql.functions import udf +>>> from pyspark.sql.types import IntegerType, StringType +>>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic() +>>> newRandom_udf = spark.catalog.registerFunction("random_udf", random_udf, StringType()) +>>> spark.sql("SELECT random_udf()").collect() # doctest: +SKIP +[Row(random_udf()=u'82')] +>>> spark.range(1).select(newRandom_udf()).collect() # doctest: +SKIP +[Row(random_udf()=u'62')] """ -udf = UserDefinedFunction(f, returnType=returnType, name=name, - evalType=PythonEvalType.SQL_BATCHED_UDF) + +# This is to check whether the input function is a wrapped/native UserDefinedFunction +if hasattr(f, 'asNondeterministic'): +udf = UserDefinedFunction(f.func, returnType=returnType, name=name, + evalType=PythonEvalType.SQL_BATCHED_UDF, --- End diff -- cc @ueshin @icexelloss, shall we support registering pandas UDFs here too?
[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20137 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85657/ Test PASSed.
[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20137 **[Test build #85657 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85657/testReport)** for PR 20137 at commit [`2482e6b`](https://github.com/apache/spark/commit/2482e6bcdaf92a78ae6b043a859e10140a273a18). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20137 Merged build finished. Test PASSed.
[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20139
[GitHub] spark pull request #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSui...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20147#discussion_r159582192 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala --- @@ -85,6 +90,43 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { new File(tmpDataDir, name).getCanonicalPath } + private def getFileFromUrl(urlString: String, targetDir: String, filename: String): Boolean = { +val conf = new SparkConf +val securityManager = new SecurityManager(conf) +val hadoopConf = new Configuration + +val outDir = new File(targetDir) +if (!outDir.exists()) { + outDir.mkdirs() +} + +try { + val result = Utils.doFetchFile(urlString, outDir, filename, conf, securityManager, hadoopConf) --- End diff -- `Utils.doFetchFile` is a little overkill for this case; maybe we can just implement a simple downloading function in Java.
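A minimal downloading function along the lines cloud-fan suggests could look like the sketch below, using only `java.net` and `java.nio`. This is an illustration, not the suite's actual code; the real `Utils.doFetchFile` additionally handles security managers, Hadoop filesystems, and caching:

```scala
import java.io.File
import java.net.URL
import java.nio.file.{Files, StandardCopyOption}

// Fetches urlString into targetDir/filename and returns the resulting file.
// Works for any URL scheme the JDK supports (http, https, file, ...).
def downloadFile(urlString: String, targetDir: File, filename: String): File = {
  if (!targetDir.exists()) targetDir.mkdirs()
  val target = new File(targetDir, filename)
  val in = new URL(urlString).openStream()
  try {
    Files.copy(in, target.toPath, StandardCopyOption.REPLACE_EXISTING)
  } finally {
    in.close()
  }
  target
}
```

A robust version for CI would also want a read timeout and redirect handling, which the review thread mentions testing.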
[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20139 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85648/ Test PASSed.
[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20139 Merged build finished. Test PASSed.
[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20139 Thanks! Merged to master/2.3
[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20139 **[Test build #85648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85648/testReport)** for PR 20139 at commit [`df75bff`](https://github.com/apache/spark/commit/df75bffa1c27c2b70495a1b2c8988f3a1c979918). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20147 **[Test build #85659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85659/testReport)** for PR 20147 at commit [`c5e835e`](https://github.com/apache/spark/commit/c5e835eb317d0b6970daef0ffc1214576874e8c6).
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20147 ok to test
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user VigneshMohan1 commented on the issue: https://github.com/apache/spark/pull/16578 @JoshRosen Can we get this PR into 2.3.0? A lot of people are interested in it, and it will boost performance when reading nested Parquet fields.
[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20139 LGTM
[GitHub] spark issue #20141: [SPARK-22945][SQL] add java UDF APIs in the functions ob...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20141 **[Test build #85658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85658/testReport)** for PR 20141 at commit [`711c027`](https://github.com/apache/spark/commit/711c0278a6d77472761414692c987d256293467e).
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20147 OK to test
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20147 ok to test
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user Gauravshah commented on the issue: https://github.com/apache/spark/pull/16578 @marmbrus can we start the review process, so that it can make it into the next release?
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20147 Can one of the admins verify this patch?
[GitHub] spark pull request #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSui...
GitHub user bersprockets opened a pull request: https://github.com/apache/spark/pull/20147 [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite should succeed on platforms that don't have wget ## What changes were proposed in this pull request? Modified HiveExternalCatalogVersionsSuite.scala to use Utils.doFetchFile to download different versions of Spark binaries rather than launching wget as an external process. On platforms that don't have wget installed, this suite fails with an error. @cloud-fan : would you like to check this change? ## How was this patch tested? 1) test-only of HiveExternalCatalogVersionsSuite on several platforms. Tested bad mirror, read timeout, and redirects. 2) ./dev/run-tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/bersprockets/spark SPARK-22940-alt Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20147.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20147 commit 5fe679f3271692729e8b9a2a9351725a80fcf8f6 Author: Bruce Robbins Date: 2018-01-03T22:32:30Z Initial commit commit 92e82d484a30a171141d2cdd1f800254fe33ceaf Author: Bruce Robbins Date: 2018-01-03T23:12:09Z Remove test log messages commit c5e835eb317d0b6970daef0ffc1214576874e8c6 Author: Bruce Robbins Date: 2018-01-04T02:39:03Z The 2 new methods should be private
[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20137 **[Test build #85657 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85657/testReport)** for PR 20137 at commit [`2482e6b`](https://github.com/apache/spark/commit/2482e6bcdaf92a78ae6b043a859e10140a273a18).
[GitHub] spark pull request #20146: [SPARK-11215][ML] Add multiple columns support to...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20146#discussion_r159580805

--- Diff: R/pkg/tests/fulltests/test_mllib_classification.R ---
@@ -313,7 +313,7 @@ test_that("spark.mlp", {
   # Test predict method
--- End diff --

Actually I think we may remove this `Test predict method` test. It seems to me that, with `tol = 0.5`, the prediction may not be very meaningful.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user mallman commented on the issue:

    https://github.com/apache/spark/pull/16578

> However, I am -1 on merging a change this large after branch cut.

It's disappointing, but I agree we can't merge a change this large after a branch cut. It will have to wait for 2.3.1 at the earliest or the next major release.
[GitHub] spark issue #20141: [SPARK-22945][SQL] add java UDF APIs in the functions ob...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20141 LGTM
[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20137 LGTM
[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159580038

--- Diff: python/pyspark/sql/catalog.py ---
@@ -227,15 +227,15 @@ def dropGlobalTempView(self, viewName):
     @ignore_unicode_prefix
     @since(2.0)
     def registerFunction(self, name, f, returnType=StringType()):
-        """Registers a python function (including lambda function) as a UDF
+        """Registers a Python function (including lambda function) or a wrapped/native UDF
--- End diff --

SGTM
[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159579886

--- Diff: python/pyspark/sql/catalog.py ---
@@ -227,15 +227,15 @@ def dropGlobalTempView(self, viewName):
     @ignore_unicode_prefix
     @since(2.0)
     def registerFunction(self, name, f, returnType=StringType()):
-        """Registers a python function (including lambda function) as a UDF
+        """Registers a Python function (including lambda function) or a wrapped/native UDF
--- End diff --

BTW, to be honest, I remember giving several quick tries at the time to get rid of the wrapper while keeping the docstring, but I failed to find a good alternative. Might be good to try if there is a clever way to get rid of the wrapper but keep the doc.
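The wrapper-vs-docstring tension discussed above has a standard Python pattern: `functools.wraps` copies `__doc__`, `__name__`, and other metadata from the wrapped function onto the wrapper. A minimal sketch follows; `make_udf_wrapper` and its `is_udf` attribute are illustrative stand-ins, not PySpark's actual `UserDefinedFunction` internals:

```python
import functools


def make_udf_wrapper(func):
    """Wrap func (e.g. to attach UDF metadata) without losing its docstring."""
    @functools.wraps(func)  # copies __doc__, __name__, __module__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        # A real PySpark wrapper would build a Column expression here;
        # this sketch just forwards the call.
        return func(*args, **kwargs)
    wrapper.is_udf = True  # illustrative metadata, not a real PySpark attribute
    return wrapper


def to_upper(s):
    """Return s upper-cased."""
    return s.upper()


upper_udf = make_udf_wrapper(to_upper)
```

With `functools.wraps`, `help(upper_udf)` shows the original docstring, so a wrapper does not have to be removed to keep the documentation intact.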
[GitHub] spark pull request #20141: [SPARK-22945][SQL] add java UDF APIs in the funct...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20141#discussion_r159579701

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3461,13 +3485,172 @@ object functions {
     if (nullable) udf else udf.asNonNullable()
   }

+  //
+  // Java UDF functions
+  //
+
+  /**
+   * Defines a Java UDF0 instance as user-defined function (UDF).
+   * The caller must specify the output data type, and there is no automatic input type coercion.
+   * By default the returned UDF is deterministic, to change it to nondeterministic, call the
+   * API `UserDefinedFunction.asNondeterministic()`.
+   *
+   * @group udf_funcs
+   * @since 2.3.0
+   */
+  def udf(f: UDF0[_], returnType: DataType): UserDefinedFunction = {
+    val func = f.asInstanceOf[UDF0[Any]].call()
+    UserDefinedFunction(() => func, returnType, inputTypes = None)
+  }
+
+  /**
+   * Defines a Java UDF1 instance as user-defined function (UDF).
+   * The caller must specify the output data type, and there is no automatic input type coercion.
+   * By default the returned UDF is deterministic, to change it to nondeterministic, call the
--- End diff --

Nit: `, to` -> `. To`
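One subtlety worth noting in the quoted `UDF0` overload: `f.asInstanceOf[UDF0[Any]].call()` is evaluated once, eagerly, and the captured result is wrapped in a thunk, so every later invocation of the thunk returns the same cached value rather than re-running the UDF body. A Python analogue of that capture, with illustrative names only:

```python
def adapt_udf0(udf0_call):
    """Mimic the quoted Scala snippet: evaluate the zero-arg function once,
    then wrap the captured result in a thunk."""
    func = udf0_call()   # evaluated eagerly, exactly once
    return lambda: func  # every later invocation returns the cached value


# A zero-arg function with an observable side effect, to show the behavior.
calls = {"n": 0}


def side_effecting():
    calls["n"] += 1
    return calls["n"]


thunk = adapt_udf0(side_effecting)
```

Calling `thunk()` repeatedly always yields the value produced at definition time, which is one reason the surrounding docs stress the deterministic/nondeterministic distinction.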
[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159579617

--- Diff: python/pyspark/sql/catalog.py ---
@@ -227,15 +227,15 @@ def dropGlobalTempView(self, viewName):
     @ignore_unicode_prefix
     @since(2.0)
     def registerFunction(self, name, f, returnType=StringType()):
-        """Registers a python function (including lambda function) as a UDF
+        """Registers a Python function (including lambda function) or a wrapped/native UDF
--- End diff --

Another idea just in case it helps:

```
Registers a Python function as a UDF or a user defined function.
```