[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20649 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88166/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20649 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20649 **[Test build #88166 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88166/testReport)** for PR 20649 at commit [`78e037d`](https://github.com/apache/spark/commit/78e037d06d2972062cac14fe8b503e0802a4c32e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r173692751 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1192,10 +1193,18 @@ class Analyzer( * @see https://issues.apache.org/jira/browse/SPARK-19737 */ object LookupFunctions extends Rule[LogicalPlan] { -override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions { - case f: UnresolvedFunction if !catalog.functionExists(f.name) => -withPosition(f) { - throw new NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName) +override def apply(plan: LogicalPlan): LogicalPlan = { + val catalogFunctionNameSet = new mutable.HashSet[FunctionIdentifier]() +plan.transformAllExpressions { --- End diff -- changed, thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20782: [SPARK-23627][SQL] Provide isEmpty in DataSet
Github user goungoun commented on the issue: https://github.com/apache/spark/pull/20782 As unnecessary information is included, I closed this pull request. Please refer request #20800 instead of #20782. I am sorry for your inconvenience. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20799 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20799 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88168/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20799 **[Test build #88168 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88168/testReport)** for PR 20799 at commit [`afd88e1`](https://github.com/apache/spark/commit/afd88e1a6585851f307413fd39a72167b9d8224c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in DataSet
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20800 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in DataSet
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20800 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20800: isEmpty in Dataset and its testSuite
GitHub user goungoun opened a pull request: https://github.com/apache/spark/pull/20800 isEmpty in Dataset and its testSuite ## What changes were proposed in this pull request? This PR adds isEmpty() in DataSet ## How was this patch tested? Unit tests added Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/goungoun/spark SPARK-23627 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20800.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20800 commit d3fc620e49ae07f6ebc0ee20976c12da49f2e0af Author: Goun NaDate: 2018-03-12T02:57:19Z isEmpty in Dataset and its testSuite --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20796: [SPARK-23649][SQL] Prevent crashes on schema inferring o...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20796 Could you add more tests for all the invalid range `253-255` in `UTF8StringSuite`? @HyukjinKwon Could you trigger this test? Also, kindly pinging cuz this is probably your domain? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20799 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1459/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20799 **[Test build #88168 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88168/testReport)** for PR 20799 at commit [`afd88e1`](https://github.com/apache/spark/commit/afd88e1a6585851f307413fd39a72167b9d8224c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20799 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20799: [SPARK-23635][YARN] AM env variable should not ov...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/20799 [SPARK-23635][YARN] AM env variable should not overwrite same name env variable set through spark.executorEnv. ## What changes were proposed in this pull request? In the current Spark on YARN code, AM always will copy and overwrite its env variables to executors, so we cannot set different values for executors. To reproduce issue, user could start spark-shell like: ``` ./bin/spark-shell --master yarn-client --conf spark.executorEnv.SPARK_ABC=executor_val --conf spark.yarn.appMasterEnv.SPARK_ABC=am_val ``` Then check executor env variables by ``` sc.parallelize(1 to 1).flatMap \{ i => sys.env.toSeq }.collect.foreach(println) ``` We will always get `am_val` instead of `executor_val`. So we should not let AM to overwrite specifically set executor env variables. ## How was this patch tested? Added UT and tested in local cluster. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/apache-spark SPARK-23635 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20799.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20799 commit afd88e1a6585851f307413fd39a72167b9d8224c Author: jerryshaoDate: 2018-03-12T03:21:40Z AM env variable should not overwrite same name env variable set through spark.executorEnv. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1458/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20705 **[Test build #88167 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88167/testReport)** for PR 20705 at commit [`159489c`](https://github.com/apache/spark/commit/159489c884cd381584ade0734d6e215a2a67103c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20705 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20705 Retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20757: [SPARK-23595][SQL] ValidateExternalType should su...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20757#discussion_r173683323 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -1440,7 +1463,7 @@ case class ValidateExternalType(child: Expression, expected: DataType) Seq(classOf[java.math.BigDecimal], classOf[scala.math.BigDecimal], classOf[Decimal]) .map(cls => s"$obj instanceof ${cls.getName}").mkString(" || ") case _: ArrayType => -s"$obj instanceof ${classOf[Seq[_]].getName} || $obj.getClass().isArray()" +s"$obj.getClass().isArray() || $obj instanceof ${classOf[Seq[_]].getName}" --- End diff -- Based on the @rednaxelafx 's comment, I changed this codegen, too. https://github.com/apache/spark/pull/20757#discussion_r173006281 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20649 **[Test build #88166 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88166/testReport)** for PR 20649 at commit [`78e037d`](https://github.com/apache/spark/commit/78e037d06d2972062cac14fe8b503e0802a4c32e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...
Github user xysun commented on the issue: https://github.com/apache/spark/pull/20649 Latest code review fixes pushed. cc @HyukjinKwon --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r173682841 --- Diff: python/pyspark/sql/dataframe.py --- @@ -437,10 +437,11 @@ def hint(self, name, *parameters): if not isinstance(name, str): raise TypeError("name should be provided as str, got {0}".format(type(name))) +allowed_types = (basestring, list, float, int) --- End diff -- Should it be `basestring` or `str` as previously here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20649: [SPARK-23462][SQL] improve missing field error me...
Github user xysun commented on a diff in the pull request: https://github.com/apache/spark/pull/20649#discussion_r173682838 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala --- @@ -271,7 +271,9 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru */ def apply(name: String): StructField = { nameToField.getOrElse(name, - throw new IllegalArgumentException(s"""Field "$name" does not exist.""")) + throw new IllegalArgumentException( +s"""Field "$name" does not exist. + |Available fields: ${fieldNamesSet.mkString(", ")}""".stripMargin)) --- End diff -- done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20797: [SPARK-23583][SQL] Invoke should support interpre...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20797#discussion_r173682156 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -266,8 +266,26 @@ case class Invoke( override def nullable: Boolean = targetObject.nullable || needNullCheck || returnNullable override def children: Seq[Expression] = targetObject +: arguments - override def eval(input: InternalRow): Any = -throw new UnsupportedOperationException("Only code-generated evaluation is supported.") + override def eval(input: InternalRow): Any = { +val obj = targetObject.eval(input) +val args = arguments.map(e => e.eval(input).asInstanceOf[Object]) +val argClasses = CallMethodViaReflection.expressionJavaClasses(arguments) +val method = obj.getClass.getDeclaredMethod(functionName, argClasses : _*) --- End diff -- We should also check if `obj` is `null` too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20797: [SPARK-23583][SQL] Invoke should support interpre...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20797#discussion_r173682087 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -266,8 +266,26 @@ case class Invoke( override def nullable: Boolean = targetObject.nullable || needNullCheck || returnNullable override def children: Seq[Expression] = targetObject +: arguments - override def eval(input: InternalRow): Any = -throw new UnsupportedOperationException("Only code-generated evaluation is supported.") + override def eval(input: InternalRow): Any = { +val obj = targetObject.eval(input) +val args = arguments.map(e => e.eval(input).asInstanceOf[Object]) +val argClasses = CallMethodViaReflection.expressionJavaClasses(arguments) +val method = obj.getClass.getDeclaredMethod(functionName, argClasses : _*) +if (needNullCheck && args.exists(_ == null)) { + // return null if one of arguments is null + null +} else { + val ret = method.invoke(obj, args: _*) + + if (CodeGenerator.defaultValue(dataType) == "null") { +ret + } else { +// cast a primitive value using Boxed class +val boxedClass = CallMethodViaReflection.typeBoxedJavaMapping(dataType) +boxedClass.cast(ret) --- End diff -- Do we need to do explicitly cast? Doesn't Scala cast it if it's needed? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20797 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20797 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88164/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20797 **[Test build #88164 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88164/testReport)** for PR 20797 at commit [`3dbea12`](https://github.com/apache/spark/commit/3dbea1252f8e43bf0f9a29d515b17935b58582e8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20798 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20798 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88165/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20798 **[Test build #88165 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88165/testReport)** for PR 20798 at commit [`5ec810a`](https://github.com/apache/spark/commit/5ec810a7c36691df1877ffc11e6f06392d438485). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20798 **[Test build #88165 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88165/testReport)** for PR 20798 at commit [`5ec810a`](https://github.com/apache/spark/commit/5ec810a7c36691df1877ffc11e6f06392d438485). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20798 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20763 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20763 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88163/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20763 **[Test build #88163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88163/testReport)** for PR 20763 at commit [`c0ac5ef`](https://github.com/apache/spark/commit/c0ac5ef3a1f00eee44dd50be925f983be852fe96). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20798 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20798 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...
Github user mstewart141 commented on the issue: https://github.com/apache/spark/pull/20798 [WIP] cc @HyukjinKwon ð i'd love to run tests here to make sure i haven't broken something. i will update pr with new tests once i set up testing better on my local box. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20798: [SPARK-23645][PYTHON] Allow python udfs to be cal...
GitHub user mstewart141 opened a pull request: https://github.com/apache/spark/pull/20798 [SPARK-23645][PYTHON] Allow python udfs to be called with keyword arguments ## [WIP] ## What changes were proposed in this pull request? Currently one can not pass keyword arguments in python UDFs. This patch allows keyword arguments to be mixed arbitrarily with positional arguments, as seen in normal python functions. UDFs accepting an arbitrary (undefined) number of columns are a different matter, and not addressed here. ## How was this patch tested? I will add unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mstewart141/spark udfkw Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20798.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20798 commit 5ec810a7c36691df1877ffc11e6f06392d438485 Author: Michael (Stu) StewartDate: 2018-03-11T20:38:29Z [SPARK-23645][PYTHON] Allow python udfs to be called with keyword arguments --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20331 Please don't format [this comment](https://github.com/apache/spark/pull/20331#pullrequestreview-90358792) during next update, @gatorsmile . Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20797 **[Test build #88164 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88164/testReport)** for PR 20797 at commit [`3dbea12`](https://github.com/apache/spark/commit/3dbea1252f8e43bf0f9a29d515b17935b58582e8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20797 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Thank you so much for review and approval, @HyukjinKwon ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20797 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1457/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20687#discussion_r173667671 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala --- @@ -22,54 +22,34 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.catalyst.rules.Rule /** -* push down operations into [[CreateNamedStructLike]]. -*/ -object SimplifyCreateStructOps extends Rule[LogicalPlan] { - override def apply(plan: LogicalPlan): LogicalPlan = { -plan.transformExpressionsUp { - // push down field extraction + * Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions. + */ +object SimplifyExtractValueOps extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p => +p.transformExpressionsUp { --- End diff -- Sorry for late response, @gatorsmile . These are expression-level optimization rules. If the original expressions exists in `SELECT`, `GROUP BY`, and `HAVING`, those are simplified in the same way together. Do you have any concerning cases? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20797: [SPARK-23583][SQL] Invoke should support interpre...
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/20797 [SPARK-23583][SQL] Invoke should support interpreted execution ## What changes were proposed in this pull request? This pr added interpreted execution for `Invoke`. ## How was this patch tested? Added tests in `ObjectExpressionsSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kiszk/spark SPARK-28583 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20797.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20797 commit 3dbea1252f8e43bf0f9a29d515b17935b58582e8 Author: Kazuaki IshizakiDate: 2018-03-11T20:07:32Z initial commit --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20763 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1456/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20763 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20763 **[Test build #88163 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88163/testReport)** for PR 20763 at commit [`c0ac5ef`](https://github.com/apache/spark/commit/c0ac5ef3a1f00eee44dd50be925f983be852fe96). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20763 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r173665102 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverConfigPropertiesStep.scala --- @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.submit.steps + +import java.io.StringWriter +import java.util.Properties + +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec + +/** + * Create a config map with the driver configuration and attach it to the pod. This needs to + * come at the end of the driver configuration so that all modifications to the Spark config + * are reflected in the generated config map. + */ +private[spark] class DriverConfigPropertiesStep(resourceNamePrefix: String) +extends DriverConfigurationStep { + + override def configureDriver(spec: KubernetesDriverSpec): KubernetesDriverSpec = { +val configMapName = s"$resourceNamePrefix-driver-conf-map" +val configMap = buildConfigMap(configMapName, spec.driverSparkConf) --- End diff -- I will also include `--conf spark.driver.extraJavaOptions=` in the spark-submit launched in the driver --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20763 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20763 **[Test build #88162 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88162/testReport)** for PR 20763 at commit [`c0ac5ef`](https://github.com/apache/spark/commit/c0ac5ef3a1f00eee44dd50be925f983be852fe96). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20763 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88162/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20683: [SPARK-8605] Exclude files in StreamingContext. textFile...
Github user ConcurrencyPractitioner commented on the issue: https://github.com/apache/spark/pull/20683 @jerryshao In Spark Streaming, I think ```.tmp``` is used as a suffix to indicate that the object was a file, although I do not know if this is universal. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r173663339 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverConfigPropertiesStep.scala --- @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.submit.steps + +import java.io.StringWriter +import java.util.Properties + +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec + +/** + * Create a config map with the driver configuration and attach it to the pod. This needs to + * come at the end of the driver configuration so that all modifications to the Spark config + * are reflected in the generated config map. + */ +private[spark] class DriverConfigPropertiesStep(resourceNamePrefix: String) +extends DriverConfigurationStep { + + override def configureDriver(spec: KubernetesDriverSpec): KubernetesDriverSpec = { +val configMapName = s"$resourceNamePrefix-driver-conf-map" +val configMap = buildConfigMap(configMapName, spec.driverSparkConf) --- End diff -- I agree in that it should be in any order. I will restructure it as such. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20796: [SPARK-23649][SQL] Prevent crashes on schema inferring o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20796 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20796: [SPARK-23649][SQL] Prevent crashes on schema inferring o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20796 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r173662663 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1192,10 +1193,18 @@ class Analyzer( * @see https://issues.apache.org/jira/browse/SPARK-19737 */ object LookupFunctions extends Rule[LogicalPlan] { -override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions { - case f: UnresolvedFunction if !catalog.functionExists(f.name) => -withPosition(f) { - throw new NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName) +override def apply(plan: LogicalPlan): LogicalPlan = { + val catalogFunctionNameSet = new mutable.HashSet[FunctionIdentifier]() +plan.transformAllExpressions { + case f: UnresolvedFunction if catalogFunctionNameSet.contains(f.name) => f + case f: UnresolvedFunction if catalog.functionExists(f.name) => +catalogFunctionNameSet.add(f.name) --- End diff -- sure, I will do that. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20796: [SPARK-23649][SQL] Prevent crashes on schema infe...
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/20796 [SPARK-23649][SQL] Prevent crashes on schema inferring of CSV containing wrong UTF-8 chars ## What changes were proposed in this pull request? The mapping of UTF-8 char's first byte to char's size doesn't cover whole range 0-255. It is defined only for 0-253: https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java#L60-L65 https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java#L190 If the first byte of a char is 253-255, IndexOutOfBoundsException is thrown. Besides of that values for 244-252 are not correct according to recent unicode standard for UTF-8: http://www.unicode.org/versions/Unicode10.0.0/UnicodeStandard-10.0.pdf As a consequence of the exception above, the length of input string in UTF-8 encoding cannot be calculated if the string contains chars started from 253 code. It is visible on user's side as for example crashing of schema inferring of csv file which contains such chars but the file can be read if the schema is specified explicitly or if the mode set to multiline. The proposed changes build correct mapping of first byte of UTF-8 char to its size (now it covers all cases) and skip disallowed chars (counts it as one octet). ## How was this patch tested? Added a test and a file with a char which is disallowed in UTF-8 - 0xFF. You can merge this pull request into a Git repository by running: $ git pull https://github.com/MaxGekk/spark-1 skip-wrong-utf8-chars Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20796.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20796 commit 6d6b3ca8eedc7bf6381cb8d746fe8a3ee0a281b2 Author: Maxim GekkDate: 2018-02-27T17:41:01Z Test: skip bytes disallowed in UTF-8 commit 0f474a033f152340519b10676026b80b7593c2f5 Author: Maxim Gekk Date: 2018-02-27T17:43:53Z Making correct map of first byte to char size in UTF-8 and skip bytes disallowed in UTF-8 commit 2ee661618a40544a8cbf3ac0794a1a6b86b6b62d Author: Maxim Gekk Date: 2018-02-28T13:26:21Z Check inferred schema and returned bad string commit d6c5f02ea1a08513a54ea9f3b30986dd92188b3e Author: Maxim Gekk Date: 2018-02-28T14:04:22Z The test csv was simplified --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r173660285 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1192,10 +1193,18 @@ class Analyzer( * @see https://issues.apache.org/jira/browse/SPARK-19737 */ object LookupFunctions extends Rule[LogicalPlan] { -override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions { - case f: UnresolvedFunction if !catalog.functionExists(f.name) => -withPosition(f) { - throw new NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName) +override def apply(plan: LogicalPlan): LogicalPlan = { + val catalogFunctionNameSet = new mutable.HashSet[FunctionIdentifier]() +plan.transformAllExpressions { + case f: UnresolvedFunction if catalogFunctionNameSet.contains(f.name) => f + case f: UnresolvedFunction if catalog.functionExists(f.name) => +catalogFunctionNameSet.add(f.name) --- End diff -- Normalize the name before adding it to the cache? This can cover more cases --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user kevinyu98 commented on the issue: https://github.com/apache/spark/pull/20795 @viirya Thanks a lot. I will create a new test file LookupFunctionsSuite under sql/catalyst/analysis. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20763 **[Test build #88162 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88162/testReport)** for PR 20763 at commit [`c0ac5ef`](https://github.com/apache/spark/commit/c0ac5ef3a1f00eee44dd50be925f983be852fe96). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20763 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1455/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20763 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20763 test this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20793: [WIP][SPARK-23643] Shrinking the buffer in hashSeed up t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20793 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88160/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20793: [WIP][SPARK-23643] Shrinking the buffer in hashSeed up t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20793 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20793: [WIP][SPARK-23643] Shrinking the buffer in hashSeed up t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20793 **[Test build #88160 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88160/testReport)** for PR 20793 at commit [`177afcc`](https://github.com/apache/spark/commit/177afcc4277b604b783aef40d86d93d6a9add6fc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20795 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88161/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20795 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20795 **[Test build #88161 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88161/testReport)** for PR 20795 at commit [`701100c`](https://github.com/apache/spark/commit/701100c11126d7437dc03ef20b484e84e2f9cb2a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20795 I've added a test like this: ```scala class LookupFunctionsSuite extends PlanTest { test("SPARK-23486: LookupFunctions should not check the same function name more than once") { val externalCatalog = new CustomInMemoryCatalog val analyzer = { val conf = new SQLConf() val catalog = new SessionCatalog(externalCatalog, FunctionRegistry.builtin, conf) catalog.createDatabase( CatalogDatabase("default", "", new URI("loc"), Map.empty), ignoreIfExists = false) new Analyzer(catalog, conf) } val unresolvedFunc = UnresolvedFunction("func", Seq.empty, false) val plan = Project( Seq(Alias(unresolvedFunc, "call1")(), Alias(unresolvedFunc, "call2")()), table("TaBlE")) analyzer.LookupFunctions.apply(plan) assert(externalCatalog.getFunctionExistsCalledTimes == 1) } } class CustomInMemoryCatalog extends InMemoryCatalog { private var functionExistsCalledTimes: Int = 0 override def functionExists(db: String, funcName: String): Boolean = synchronized { functionExistsCalledTimes = functionExistsCalledTimes + 1 true } def getFunctionExistsCalledTimes: Int = functionExistsCalledTimes } ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r173655309 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1192,10 +1193,18 @@ class Analyzer( * @see https://issues.apache.org/jira/browse/SPARK-19737 */ object LookupFunctions extends Rule[LogicalPlan] { -override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions { - case f: UnresolvedFunction if !catalog.functionExists(f.name) => -withPosition(f) { - throw new NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName) +override def apply(plan: LogicalPlan): LogicalPlan = { + val catalogFunctionNameSet = new mutable.HashSet[FunctionIdentifier]() +plan.transformAllExpressions { --- End diff -- style: indent. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hint in p...
Github user DylanGuedes commented on the issue: https://github.com/apache/spark/pull/20788 Hi, I added two subtasks (one for Python and one for R) [here](https://issues.apache.org/jira/browse/SPARK-21030) Should I close this PR and open a new one or it is ok to just rename it to the new JIRA id? @felixcheung I don't know a good way to test it, do you have any suggestion? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20795 **[Test build #88161 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88161/testReport)** for PR 20795 at commit [`701100c`](https://github.com/apache/spark/commit/701100c11126d7437dc03ef20b484e84e2f9cb2a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20795 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20795 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20795 test this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20795 ok to test (seems uppercase not working somehow with Jenkins .. ) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20793 **[Test build #88160 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88160/testReport)** for PR 20793 at commit [`177afcc`](https://github.com/apache/spark/commit/177afcc4277b604b783aef40d86d93d6a9add6fc). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20795 Ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20793 The question is that existing output of pseudo random/sample is guaranteed by public API. It seems it doesn't. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20793 At least some tests expect that particular values would be result of sample/random: https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L550-L564 . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org