[GitHub] [spark] SparkQA removed a comment on issue #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base
SparkQA removed a comment on issue #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base URL: https://github.com/apache/spark/pull/25101#issuecomment-513040922 **[Test build #107868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107868/testReport)** for PR 25101 at commit [`6e01a86`](https://github.com/apache/spark/commit/6e01a86f6968ea40be0577b920921aeb12d8da89). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base
AmplabJenkins commented on issue #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base URL: https://github.com/apache/spark/pull/25101#issuecomment-513070068 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107868/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base
AmplabJenkins commented on issue #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base URL: https://github.com/apache/spark/pull/25101#issuecomment-513070066 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base
SparkQA commented on issue #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base URL: https://github.com/apache/spark/pull/25101#issuecomment-513069930 **[Test build #107868 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107868/testReport)** for PR 25101 at commit [`6e01a86`](https://github.com/apache/spark/commit/6e01a86f6968ea40be0577b920921aeb12d8da89). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] maropu commented on issue #25114: [SPARK-28349][SQL] Add FALSE and SETMINUS to ansiNonReserved
maropu commented on issue #25114: [SPARK-28349][SQL] Add FALSE and SETMINUS to ansiNonReserved URL: https://github.com/apache/spark/pull/25114#issuecomment-513067079 I don't think we need to do so, because we basically follow SQL-2011 now.
[GitHub] [spark] viirya commented on a change in pull request #25196: [SPARK-28279][SQL][PYTHON][TESTS] Convert and port 'group-analytics.sql' into UDF test base
viirya commented on a change in pull request #25196: [SPARK-28279][SQL][PYTHON][TESTS] Convert and port 'group-analytics.sql' into UDF test base URL: https://github.com/apache/spark/pull/25196#discussion_r305182448 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-group-analytics.sql ## @@ -0,0 +1,62 @@ +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES Review comment: This doesn't include a comment like `-- This test file was converted from ...`?
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r305181626 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ## @@ -484,6 +484,8 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { """.stripMargin) .load(testFile("test-data/postgresql/tenk.data")) .createOrReplaceTempView("tenk1") + + session.sql("select * from tenk1 where stringu1 like stringu2 escape '\"'") Review comment: This is not correct; I meant we need tests in `resources/sql-tests/inputs/`. Or, to check that the parser handles the new token `ESCAPE`, it's ok to add tests in `ExpressionParserSuite`.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25020: [SPARK-28220][SQL] Fix foldable join condition not pushed down when parent filter is wholly pushed down
HyukjinKwon commented on a change in pull request #25020: [SPARK-28220][SQL] Fix foldable join condition not pushed down when parent filter is wholly pushed down URL: https://github.com/apache/spark/pull/25020#discussion_r305181588 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -1241,7 +1241,10 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { if (others.nonEmpty) { Filter(others.reduceLeft(And), join) } else { -join +// Add Literal(true) filter conditions to avoid those join plan whose parent filter is +// totally pushed down being skipped during rule transform down. Also these useless +// filter will be pruned in rule `PruneFilters` later. +Filter(Literal(true), join) Review comment: Can we please avoid such a hacky, wide-reaching change to fix a corner case?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25020: [SPARK-28220][SQL] Fix foldable join condition not pushed down when parent filter is wholly pushed down
HyukjinKwon commented on a change in pull request #25020: [SPARK-28220][SQL] Fix foldable join condition not pushed down when parent filter is wholly pushed down URL: https://github.com/apache/spark/pull/25020#discussion_r305181588 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -1241,7 +1241,10 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { if (others.nonEmpty) { Filter(others.reduceLeft(And), join) } else { -join +// Add Literal(true) filter conditions to avoid those join plan whose parent filter is +// totally pushed down being skipped during rule transform down. Also these useless +// filter will be pruned in rule `PruneFilters` later. +Filter(Literal(true), join) Review comment: Can we please avoid such hacky, wide-reaching changes to fix a corner case?
[GitHub] [spark] beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#discussion_r305181122 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/BooleanTestSuite.scala ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.expressions.BooleanTest._ +import org.apache.spark.sql.types._ + +class BooleanTestSuite extends SparkFunSuite with ExpressionEvalHelper { Review comment: OK.
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r305180881 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -1240,7 +1240,14 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging case SqlBaseParser.IN => invertIfNotDefined(In(e, ctx.expression.asScala.map(expression))) case SqlBaseParser.LIKE => -invertIfNotDefined(Like(e, expression(ctx.pattern))) +val escapeOpt = Option(ctx.escapeChar).map(string).map { str => + if (str.length > 1) { +throw new ParseException("Invalid escape string." + + "Escape string must be empty or one character.", ctx) Review comment: Any test for this exception?
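To illustrate the kind of test being asked for, here is a minimal Python sketch. It assumes a hypothetical `validate_escape` helper and `ParseError` class that mirror the quoted Scala check (an ESCAPE string must be empty or one character). Neither is Spark's API. The test pins down the error text, which is the kind of check that would also catch a small defect visible in the diff: the two concatenated Scala literals have no space between them, so the message would read "Invalid escape string.Escape string must ...".

```python
# Hypothetical sketch mirroring the check in the quoted AstBuilder diff;
# not Spark's parser API.
class ParseError(Exception):
    pass


def validate_escape(escape):
    """Return the escape string if it is empty or one character, else raise."""
    if len(escape) > 1:
        # Unlike the quoted Scala concatenation, this keeps a space
        # between the two sentences of the message.
        raise ParseError(
            "Invalid escape string. "
            "Escape string must be empty or one character.")
    return escape


def test_rejects_multi_char_escape():
    try:
        validate_escape("ab")
    except ParseError as e:
        assert "must be empty or one character" in str(e)
    else:
        raise AssertionError("expected ParseError for a two-character escape")


test_rejects_multi_char_escape()
```

In Spark itself, such a test would more likely go in `ExpressionParserSuite` or a `resources/sql-tests/inputs/` file, as suggested in the earlier review comment on this PR.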
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25019: [SPARK-28195][SQL] Fix CheckAnalysis not working for InsertIntoDataSourceDirCommand and report misleading error message
HyukjinKwon commented on a change in pull request #25019: [SPARK-28195][SQL] Fix CheckAnalysis not working for InsertIntoDataSourceDirCommand and report misleading error message URL: https://github.com/apache/spark/pull/25019#discussion_r305181012 ## File path: sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala ## @@ -188,6 +189,15 @@ abstract class BaseSessionStateBuilder( customCheckRules override protected def lookupCatalog(name: String): CatalogPlugin = session.catalog(name) + +override def checkAnalysis(plan: LogicalPlan): Unit = { + // We should check it's innerChildren for InsertIntoDataSourceDirCommand + val planToCheck = plan match { +case e: InsertIntoDataSourceDirCommand => e.query +case _ => plan + } + super.checkAnalysis(planToCheck) Review comment: I think you should investigate it yourself. I don't like the current approach either. Apparently the issue is minor, but the fix is pretty invasive. It shouldn't need to touch the SessionStateBuilder side at all.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType
AmplabJenkins removed a comment on issue #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType URL: https://github.com/apache/spark/pull/25198#issuecomment-513063619 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType
AmplabJenkins commented on issue #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType URL: https://github.com/apache/spark/pull/25198#issuecomment-513064114 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType
AmplabJenkins removed a comment on issue #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType URL: https://github.com/apache/spark/pull/25198#issuecomment-513063493 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType
AmplabJenkins commented on issue #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType URL: https://github.com/apache/spark/pull/25198#issuecomment-513063619 Can one of the admins verify this patch?
[GitHub] [spark] Deegue commented on issue #23593: [SPARK-26667][DOC] Add `Scanning Input Table` to Performance Tuning Guide
Deegue commented on issue #23593: [SPARK-26667][DOC] Add `Scanning Input Table` to Performance Tuning Guide URL: https://github.com/apache/spark/pull/23593#issuecomment-513063621 I rewrote the doc and updated the screenshot. Could you please review again? @srowen @dongjoon-hyun I think it's better than before.
[GitHub] [spark] AmplabJenkins commented on issue #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType
AmplabJenkins commented on issue #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType URL: https://github.com/apache/spark/pull/25198#issuecomment-513063493 Can one of the admins verify this patch?
[GitHub] [spark] huaxingao commented on issue #25103: [SPARK-28285][SQL][PYTHON][TESTS] Convert and port 'outer-join.sql' into UDF test base
huaxingao commented on issue #25103: [SPARK-28285][SQL][PYTHON][TESTS] Convert and port 'outer-join.sql' into UDF test base URL: https://github.com/apache/spark/pull/25103#issuecomment-513062510 diff updated. @HyukjinKwon
[GitHub] [spark] ulysses-you commented on issue #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType
ulysses-you commented on issue #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType URL: https://github.com/apache/spark/pull/25198#issuecomment-513062313 cc @cloud-fan
[GitHub] [spark] ulysses-you opened a new pull request #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType
ulysses-you opened a new pull request #25198: [SPARK-28443][SQL] Spark sql add exception when create field type NullType URL: https://github.com/apache/spark/pull/25198 ## What changes were proposed in this pull request? For more detail, see [PR](https://github.com/apache/spark/pull/25085). This PR is for discussing the details of `add exception when create field type NullType`. ## How was this patch tested? UT
[GitHub] [spark] LiShuMing commented on a change in pull request #25192: [SPARK-28436][SQL] Throw better exception when datasource's schema i…
LiShuMing commented on a change in pull request #25192: [SPARK-28436][SQL] Throw better exception when datasource's schema i… URL: https://github.com/apache/spark/pull/25192#discussion_r305179382 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -339,7 +339,8 @@ case class DataSource( val baseRelation = dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions) if (baseRelation.schema != schema) { - throw new AnalysisException(s"$className does not allow user-specified schemas.") + throw new AnalysisException(s"$className does not allow user-specified schemas, " + + s"source schema: ${baseRelation.schema}, user-specific schema: ${schema}") Review comment: Thanks for your replies; I will fix it later.
[GitHub] [spark] SparkQA commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base
SparkQA commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base URL: https://github.com/apache/spark/pull/25127#issuecomment-513062238 **[Test build #107876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107876/testReport)** for PR 25127 at commit [`e70e147`](https://github.com/apache/spark/commit/e70e147873f7b180a5551c2b2f047019c3bb6d79).
[GitHub] [spark] LiShuMing commented on a change in pull request #25192: [SPARK-28436][SQL] Throw better exception when datasource's schema i…
LiShuMing commented on a change in pull request #25192: [SPARK-28436][SQL] Throw better exception when datasource's schema i… URL: https://github.com/apache/spark/pull/25192#discussion_r305179301 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -339,7 +339,8 @@ case class DataSource( val baseRelation = dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions) if (baseRelation.schema != schema) { - throw new AnalysisException(s"$className does not allow user-specified schemas.") + throw new AnalysisException(s"$className does not allow user-specified schemas, " + + s"source schema: ${baseRelation.schema}, user-specific schema: ${schema}") Review comment: OK~
[GitHub] [spark] AmplabJenkins removed a comment on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base
AmplabJenkins removed a comment on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base URL: https://github.com/apache/spark/pull/25127#issuecomment-513061671 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base
AmplabJenkins removed a comment on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base URL: https://github.com/apache/spark/pull/25127#issuecomment-513061673 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12992/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base
AmplabJenkins commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base URL: https://github.com/apache/spark/pull/25127#issuecomment-513061673 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12992/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base
AmplabJenkins commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base URL: https://github.com/apache/spark/pull/25127#issuecomment-513061671 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25192: [SPARK-28436][SQL] Throw better exception when datasource's schema i…
HyukjinKwon commented on a change in pull request #25192: [SPARK-28436][SQL] Throw better exception when datasource's schema i… URL: https://github.com/apache/spark/pull/25192#discussion_r305178777 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -339,7 +339,8 @@ case class DataSource( val baseRelation = dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions) if (baseRelation.schema != schema) { - throw new AnalysisException(s"$className does not allow user-specified schemas.") + throw new AnalysisException(s"$className does not allow user-specified schemas, " + + s"source schema: ${baseRelation.schema}, user-specific schema: ${schema}") Review comment: I would print `schema.catalogString` for a better string representation.
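Spark's `StructType.catalogString` renders a schema compactly, for example `struct<a:int,b:string>`. The default `toString` instead produces the more verbose `StructType(StructField(...), ...)` form. The Python sketch below only imitates that compact rendering to show what the suggested error message might look like. The field lists are modeled as plain `(name, type_name)` tuples, and `catalog_string`, `schema_mismatch_message` and `MyRelationProvider` are hypothetical names, not Spark code.

```python
# Hypothetical imitation of a catalogString-style rendering; schemas are
# modeled as lists of (field_name, type_name) pairs, not Spark StructTypes.
def catalog_string(fields):
    """Render [(name, type_name), ...] as struct<name:type,...>."""
    inner = ",".join(f"{name}:{type_name}" for name, type_name in fields)
    return f"struct<{inner}>"


def schema_mismatch_message(class_name, source_fields, user_fields):
    """Build an error message that shows both schemas compactly."""
    return (f"{class_name} does not allow user-specified schemas, "
            f"source schema: {catalog_string(source_fields)}, "
            f"user-specified schema: {catalog_string(user_fields)}")


# Usage with a made-up relation provider name:
msg = schema_mismatch_message(
    "MyRelationProvider",
    [("a", "int"), ("b", "string")],
    [("a", "string")],
)
print(msg)
```

The compact form keeps a mismatch readable on one line even when the schemas are wide. That is the main argument for `catalogString` over the default representation.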
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r305178415 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ## @@ -39,27 +39,36 @@ object StringUtils extends Logging { * throw an [[AnalysisException]]. * * @param pattern the SQL pattern to convert + * @param escapeStr the escape string contains one character. * @return the equivalent Java regular expression of the pattern */ - def escapeLikeRegex(pattern: String): String = { + def escapeLikeRegex(pattern: String, escapeStr: String): String = { +val escapeChar = escapeStr.charAt(0) val in = pattern.toIterator val out = new StringBuilder() def fail(message: String) = throw new AnalysisException( s"the pattern '$pattern' is invalid, $message") while (in.hasNext) { - in.next match { -case '\\' if in.hasNext => + val cur = in.next + if (cur == escapeChar) { Review comment: Any reason to rewrite this part from pattern-matching to if-then?
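The quoted `escapeLikeRegex` turns a SQL LIKE pattern into a regular expression, now with a configurable escape character. The Python sketch below approximates that translation under a few assumptions. `_` maps to any single character and `%` to any run of characters. The escape character followed by `_`, `%` or itself yields that literal character. Any other use of the escape character is rejected. All other characters are matched literally. The names `like_to_regex` and `sql_like` are hypothetical, and the error texts are not Spark's.

```python
import re


def like_to_regex(pattern, escape_char="\\"):
    """Translate a SQL LIKE pattern into a Python regex string (sketch)."""
    out = []
    chars = iter(pattern)
    for cur in chars:
        if cur == escape_char:
            # Consume the character that follows the escape character.
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError(f"the pattern '{pattern}' is invalid: "
                                 "it ends with the escape character")
            if nxt in ("_", "%") or nxt == escape_char:
                out.append(re.escape(nxt))  # escaped wildcard -> literal
            else:
                raise ValueError(f"the pattern '{pattern}' is invalid: "
                                 f"the escape character precedes '{nxt}'")
        elif cur == "_":
            out.append(".")   # exactly one character
        elif cur == "%":
            out.append(".*")  # any sequence, including empty
        else:
            out.append(re.escape(cur))
    # (?s) lets '.' match newlines, so '%' and '_' also cover them.
    return "(?s)" + "".join(out)


def sql_like(value, pattern, escape_char="\\"):
    """True if the whole value matches the LIKE pattern."""
    return re.fullmatch(like_to_regex(pattern, escape_char), value) is not None


print(sql_like("a_b", "a/_b", "/"))  # True: '/_' is a literal underscore
print(sql_like("axb", "a/_b", "/"))  # False
```

On the question itself, one plausible reason for the switch is a Scala detail. Inside a `match`, a lowercase name such as `case escapeChar =>` binds a fresh variable and matches anything. It does not compare against the local `escapeChar`. Pattern matching on a runtime value therefore needs a stable-identifier pattern (`` case `escapeChar` => ``) or a guard (`case c if c == escapeChar =>`), and either would keep the original `match` style.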
[GitHub] [spark] AmplabJenkins removed a comment on issue #25197: [SPARK-21067][SQL] Initialize the HdfsEncryptionShim and the underlying FileSystem(FS) e…
AmplabJenkins removed a comment on issue #25197: [SPARK-21067][SQL] Initialize the HdfsEncryptionShim and the underlying FileSystem(FS) e… URL: https://github.com/apache/spark/pull/25197#issuecomment-513060017 Can one of the admins verify this patch?
[GitHub] [spark] imback82 commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base
imback82 commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base URL: https://github.com/apache/spark/pull/25124#discussion_r305178360
## File path: sql/core/src/test/resources/sql-tests/results/udf/udf-inline-table.sql.out ##
@@ -0,0 +1,153 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 17
+
+
+-- !query 0
+select udf(col1), udf(col2) from values ("one", 1)
+-- !query 0 schema
+struct
+-- !query 0 output
+one1
+
+
+-- !query 1
+select udf(col1), udf(udf(col2)) from values ("one", 1) as data
+-- !query 1 schema
+struct
+-- !query 1 output
+one1
+
+
+-- !query 2
+select udf(a), b from values ("one", 1) as data(a, b)
+-- !query 2 schema
+struct
+-- !query 2 output
+one1
+
+
+-- !query 3
+select udf(a) from values 1, 2, 3 as data(a)
+-- !query 3 schema
+struct
+-- !query 3 output
+1
+2
+3
+
+
+-- !query 4
+select udf(a), b from values ("one", 1), ("two", 2), ("three", null) as data(a, b)
+-- !query 4 schema
+struct
+-- !query 4 output
+one1
+three NULL
+two2
+
+
+-- !query 5
+select a, udf(b) from values ("one", null), ("two", null) as data(a, b)
+-- !query 5 schema
+struct<>
+-- !query 5 output
+org.apache.spark.sql.AnalysisException
+cannot resolve 'CAST(udf(cast(b as string)) AS NULL)' due to data type mismatch: cannot cast string to null; line 1 pos 10
Review comment: @HyukjinKwon Do you prefer that or just work around with `udf(a), b`?
[GitHub] [spark] SparkQA commented on issue #25184: [SPARK-28431][SQL] Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message
SparkQA commented on issue #25184: [SPARK-28431][SQL] Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message URL: https://github.com/apache/spark/pull/25184#issuecomment-513060496 **[Test build #107875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107875/testReport)** for PR 25184 at commit [`c8f1a52`](https://github.com/apache/spark/commit/c8f1a5283669dc9b019c3e5c77b2ed5d1fb14959).
[GitHub] [spark] AmplabJenkins commented on issue #25197: [SPARK-21067][SQL] Initialize the HdfsEncryptionShim and the underlying FileSystem(FS) e…
AmplabJenkins commented on issue #25197: [SPARK-21067][SQL] Initialize the HdfsEncryptionShim and the underlying FileSystem(FS) e… URL: https://github.com/apache/spark/pull/25197#issuecomment-513060418 Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base
HyukjinKwon commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base URL: https://github.com/apache/spark/pull/25127#issuecomment-513060480 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #22029: [SPARK-24395][SQL] IN operator should return NULL when comparing struct with NULL fields
AmplabJenkins removed a comment on issue #22029: [SPARK-24395][SQL] IN operator should return NULL when comparing struct with NULL fields URL: https://github.com/apache/spark/pull/22029#issuecomment-513060136 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12991/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25184: [SPARK-28431][SQL] Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message
AmplabJenkins removed a comment on issue #25184: [SPARK-28431][SQL] Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message URL: https://github.com/apache/spark/pull/25184#issuecomment-513060098 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12990/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #22029: [SPARK-24395][SQL] IN operator should return NULL when comparing struct with NULL fields
AmplabJenkins removed a comment on issue #22029: [SPARK-24395][SQL] IN operator should return NULL when comparing struct with NULL fields URL: https://github.com/apache/spark/pull/22029#issuecomment-513060134 Build finished. Test PASSed.
[GitHub] [spark] beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#discussion_r305178020
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/BooleanTestSuite.scala ##
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.expressions.BooleanTest._
+import org.apache.spark.sql.types._
+
+class BooleanTestSuite extends SparkFunSuite with ExpressionEvalHelper {
Review comment: OK. I will do it.
[GitHub] [spark] HyukjinKwon commented on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS] Convert and port 'pivot.sql' into UDF test base
HyukjinKwon commented on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS] Convert and port 'pivot.sql' into UDF test base URL: https://github.com/apache/spark/pull/25122#issuecomment-513060332 Thank YOU @chitralverma for staying focused on each diff here and writing a PR even without nits :D. Nowadays, those details are key.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25184: [SPARK-28431][SQL] Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message
AmplabJenkins removed a comment on issue #25184: [SPARK-28431][SQL] Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message URL: https://github.com/apache/spark/pull/25184#issuecomment-513060094 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25184: [SPARK-28431][SQL] Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message
AmplabJenkins commented on issue #25184: [SPARK-28431][SQL] Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message URL: https://github.com/apache/spark/pull/25184#issuecomment-513060098 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12990/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #22029: [SPARK-24395][SQL] IN operator should return NULL when comparing struct with NULL fields
AmplabJenkins commented on issue #22029: [SPARK-24395][SQL] IN operator should return NULL when comparing struct with NULL fields URL: https://github.com/apache/spark/pull/22029#issuecomment-513060136 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12991/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25197: [SPARK-21067][SQL] Initialize the HdfsEncryptionShim and the underlying FileSystem(FS) e…
AmplabJenkins removed a comment on issue #25197: [SPARK-21067][SQL] Initialize the HdfsEncryptionShim and the underlying FileSystem(FS) e… URL: https://github.com/apache/spark/pull/25197#issuecomment-513059906 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #25184: [SPARK-28431][SQL] Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message
AmplabJenkins commented on issue #25184: [SPARK-28431][SQL] Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message URL: https://github.com/apache/spark/pull/25184#issuecomment-513060094 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #22029: [SPARK-24395][SQL] IN operator should return NULL when comparing struct with NULL fields
AmplabJenkins commented on issue #22029: [SPARK-24395][SQL] IN operator should return NULL when comparing struct with NULL fields URL: https://github.com/apache/spark/pull/22029#issuecomment-513060134 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25197: [SPARK-21067][SQL] Initialize the HdfsEncryptionShim and the underlying FileSystem(FS) e…
AmplabJenkins commented on issue #25197: [SPARK-21067][SQL] Initialize the HdfsEncryptionShim and the underlying FileSystem(FS) e… URL: https://github.com/apache/spark/pull/25197#issuecomment-513060017 Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base URL: https://github.com/apache/spark/pull/25124#discussion_r305177882
## File path: sql/core/src/test/resources/sql-tests/results/udf/udf-inline-table.sql.out ##
@@ -0,0 +1,153 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 17
+
+
+-- !query 0
+select udf(col1), udf(col2) from values ("one", 1)
+-- !query 0 schema
+struct
+-- !query 0 output
+one1
+
+
+-- !query 1
+select udf(col1), udf(udf(col2)) from values ("one", 1) as data
+-- !query 1 schema
+struct
+-- !query 1 output
+one1
+
+
+-- !query 2
+select udf(a), b from values ("one", 1) as data(a, b)
+-- !query 2 schema
+struct
+-- !query 2 output
+one1
+
+
+-- !query 3
+select udf(a) from values 1, 2, 3 as data(a)
+-- !query 3 schema
+struct
+-- !query 3 output
+1
+2
+3
+
+
+-- !query 4
+select udf(a), b from values ("one", 1), ("two", 2), ("three", null) as data(a, b)
+-- !query 4 schema
+struct
+-- !query 4 output
+one1
+three NULL
+two2
+
+
+-- !query 5
+select a, udf(b) from values ("one", null), ("two", null) as data(a, b)
+-- !query 5 schema
+struct<>
+-- !query 5 output
+org.apache.spark.sql.AnalysisException
+cannot resolve 'CAST(udf(cast(b as string)) AS NULL)' due to data type mismatch: cannot cast string to null; line 1 pos 10
Review comment: @imback82, let's comment out this test, with a comment saying that the current UDF does not support conversion from string to null for now.
[GitHub] [spark] AmplabJenkins commented on issue #25197: [SPARK-21067][SQL] Initialize the HdfsEncryptionShim and the underlying FileSystem(FS) e…
AmplabJenkins commented on issue #25197: [SPARK-21067][SQL] Initialize the HdfsEncryptionShim and the underlying FileSystem(FS) e… URL: https://github.com/apache/spark/pull/25197#issuecomment-513059906 Can one of the admins verify this patch?
[GitHub] [spark] WeichenXu123 commented on a change in pull request #25184: [SPARK-28431][SQL] Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message
WeichenXu123 commented on a change in pull request #25184: [SPARK-28431][SQL] Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message URL: https://github.com/apache/spark/pull/25184#discussion_r305177751
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##
@@ -1660,6 +1660,14 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
+ val CSV_PARSER_MAX_ERROR_CONTENT_LENGTH = buildConf(
Review comment: Updated the code. I have now hardcoded `maxErrorContentLength` to 1024. Adding another config looks too fussy for end users; this is just a minor setting.
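A minimal sketch of the fixed-limit approach, assuming the cap is applied to the offending input before it is embedded in the error message. `ErrorContentSketch` and `truncate` are hypothetical; this is not the univocity-parsers or Spark code.

```scala
object ErrorContentSketch {
  // Hypothetical fixed cap, mirroring the hardcoded 1024 mentioned in the comment.
  val MaxErrorContentLength: Int = 1024

  // Keeps at most `maxLength` characters of the offending input and reports how
  // many were dropped, so a huge row cannot blow up the exception message.
  def truncate(content: String, maxLength: Int = MaxErrorContentLength): String =
    if (content.length <= maxLength) content
    else content.take(maxLength) + s"... (${content.length - maxLength} more characters)"
}
```

For example, `truncate("abcdef", 3)` keeps `abc` and appends a note that 3 characters were dropped.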
[GitHub] [spark] maropu commented on a change in pull request #25071: [SPARK-28292][SQL] Enable inject user-defined Hint
maropu commented on a change in pull request #25071: [SPARK-28292][SQL] Enable inject user-defined Hint URL: https://github.com/apache/spark/pull/25071#discussion_r305177692
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ##
@@ -151,9 +156,10 @@ class Analyzer(
 lazy val batches: Seq[Batch] = Seq(
 Batch("Hints", fixedPoint,
- new ResolveHints.ResolveJoinStrategyHints(conf),
- ResolveHints.ResolveCoalesceHints,
- new ResolveHints.RemoveAllHints(conf)),
+ new ResolveHints.ResolveJoinStrategyHints(conf) +:
+ ResolveHints.ResolveCoalesceHints +:
+ extendedResolutionHints :+
Review comment: cc: @maryannxue, too. (I think this kind of extension is basically bug-prone, and we always need a better design...)
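The new `Seq` construction mixes `+:` and `:+`, and the resulting rule order is what makes this extension point sensitive. A minimal sketch with hypothetical `Rule` values (not Catalyst's `Rule[LogicalPlan]`) showing where injected hint rules end up:

```scala
object HintBatchSketch {
  // Hypothetical stand-in for an analyzer rule; only the name matters here.
  final case class Rule(name: String)

  val resolveJoinStrategyHints: Rule = Rule("ResolveJoinStrategyHints")
  val resolveCoalesceHints: Rule = Rule("ResolveCoalesceHints")
  val removeAllHints: Rule = Rule("RemoveAllHints")

  // Operator precedence follows the first character and '+' ranks above ':',
  // so `+:` binds tighter than `:+`. This reads as
  // (join +: (coalesce +: extended)) :+ removeAll.
  def hintRules(extendedResolutionHints: Seq[Rule]): Seq[Rule] =
    resolveJoinStrategyHints +: resolveCoalesceHints +: extendedResolutionHints :+ removeAllHints
}
```

Injected rules therefore see hints only after the two built-in resolvers have run, and must resolve them before `RemoveAllHints` does its cleanup.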
[GitHub] [spark] beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#discussion_r305177621
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ##
@@ -840,3 +840,46 @@ case class GreaterThanOrEqual(left: Expression, right: Expression)
 protected override def nullSafeEval(input1: Any, input2: Any): Any =
 ordering.gteq(input1, input2)
 }
+
+/**
+ * Test the value of an expression is true, false, or unknown.
+ */
+@ExpressionDescription(
+ usage = "_FUNC_(expr, booleanValue) - Returns true if `expr` equals booleanValue, " +
+ "or false otherwise.",
+ arguments = """
+ Arguments:
+ * expr - a boolean expression
+ * booleanValue - a boolean value represented by a string. booleanValue must be one
+ of TRUE, FALSE and UNKNOWN.
+ """,
+ examples = """
+ Examples:
+ > SELECT _FUNC_(1 > 2, true);
+ false
+ > SELECT _FUNC_(2 > 1, true);
Review comment: @dongjoon-hyun @maropu Thanks for your suggestion. I will remove it.
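The description says the function returns true when `expr` equals `booleanValue` and false otherwise, so unlike `=` it never yields NULL. A minimal sketch of that three-valued behaviour, assuming `Option[Boolean]` as the encoding of SQL booleans (`None` for NULL/UNKNOWN); `BooleanTestSemantics` is hypothetical, not Catalyst's `BooleanTest`.

```scala
object BooleanTestSemantics {
  // SQL three-valued logic modeled with Option[Boolean]; None plays the role of
  // NULL/UNKNOWN. `expected` uses the same encoding: Some(true) for IS TRUE,
  // Some(false) for IS FALSE, None for IS UNKNOWN.
  def booleanTest(value: Option[Boolean], expected: Option[Boolean]): Boolean =
    value == expected

  // For contrast: ordinary SQL equality propagates NULL instead of answering false.
  def sqlEquals(a: Option[Boolean], b: Option[Boolean]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y
}
```

Under this encoding, `1 > 2` evaluates to `Some(false)`, and testing it against `Some(true)` gives `false`, matching the first example in the description.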
[GitHub] [spark] HyukjinKwon commented on issue #25103: [SPARK-28285][SQL][PYTHON][TESTS] Convert and port 'outer-join.sql' into UDF test base
HyukjinKwon commented on issue #25103: [SPARK-28285][SQL][PYTHON][TESTS] Convert and port 'outer-join.sql' into UDF test base URL: https://github.com/apache/spark/pull/25103#issuecomment-513059728 @huaxingao, can you update the diff against the original file? It looks a bit weird.
[GitHub] [spark] HyukjinKwon commented on issue #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base
HyukjinKwon commented on issue #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base URL: https://github.com/apache/spark/pull/25101#issuecomment-513059524 cc @viirya for a quick double check if you're available.
[GitHub] [spark] AmplabJenkins removed a comment on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming
AmplabJenkins removed a comment on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming URL: https://github.com/apache/spark/pull/22282#issuecomment-513058821 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107873/ Test FAILed.
[GitHub] [spark] beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#discussion_r305177098
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ##
@@ -1243,6 +1249,24 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
 IsNotNull(e)
 case SqlBaseParser.NULL =>
 IsNull(e)
+ case SqlBaseParser.TRUE if ctx.NOT != null =>
+ checkBooleanTestArgs(e)
+ Not(BooleanTest(e, Some(true)))
+ case SqlBaseParser.TRUE =>
+ checkBooleanTestArgs(e)
+ BooleanTest(e, Some(true))
+ case SqlBaseParser.FALSE if ctx.NOT != null =>
+ checkBooleanTestArgs(e)
+ Not(BooleanTest(e, Some(false)))
+ case SqlBaseParser.FALSE =>
+ checkBooleanTestArgs(e)
+ BooleanTest(e, Some(false))
+ case SqlBaseParser.UNKNOWN if ctx.NOT != null =>
+ checkBooleanTestArgs(e)
+ Not(BooleanTest(e, None))
Review comment: @dongjoon-hyun OK. I'm changing it.
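The parser change maps `e IS [NOT] TRUE | FALSE | UNKNOWN` onto `BooleanTest`, wrapping it in `Not` when `NOT` is present. That is safe because `BooleanTest` always yields true or false, so `Not` never sees NULL. A minimal sketch on a hypothetical miniature expression tree (`Value`, `BooleanTest` and `Not` here are illustrative case classes, not Catalyst's):

```scala
object BooleanPredicateSketch {
  // Hypothetical miniature expression tree (not Catalyst's classes).
  sealed trait Expr
  final case class Value(v: Option[Boolean]) extends Expr // None stands for NULL
  final case class BooleanTest(child: Expr, expected: Option[Boolean]) extends Expr
  final case class Not(child: Expr) extends Expr

  // Same shape as the parser change: `e IS [NOT] keyword` becomes
  // BooleanTest(e, ...), wrapped in Not(...) when NOT is present.
  def build(e: Expr, keyword: String, negated: Boolean): Expr = {
    val expected: Option[Boolean] = keyword.toUpperCase match {
      case "TRUE" => Some(true)
      case "FALSE" => Some(false)
      case "UNKNOWN" => None
      case other => throw new IllegalArgumentException(s"unexpected keyword: $other")
    }
    val test = BooleanTest(e, expected)
    if (negated) Not(test) else test
  }

  // BooleanTest never returns None, so Not over it never propagates NULL.
  def eval(e: Expr): Option[Boolean] = e match {
    case Value(v) => v
    case BooleanTest(child, expected) => Some(eval(child) == expected)
    case Not(child) => eval(child).map(b => !b)
  }
}
```

For instance, `NULL IS NOT TRUE` builds `Not(BooleanTest(Value(None), Some(true)))` and evaluates to true.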
[GitHub] [spark] xianyinxin opened a new pull request #25197: [SPARK-21067][SQL] Initialize the HdfsEncryptionShim and the underlying FileSystem(FS) e…
xianyinxin opened a new pull request #25197: [SPARK-21067][SQL] Initialize the HdfsEncryptionShim and the underlying FileSystem(FS) e… URL: https://github.com/apache/spark/pull/25197

## What changes were proposed in this pull request?

This PR fixes the "Filesystem closed" exception caused by closing a Hive session in the Thrift server. A `sessionState` holds a `hdfsEncryptionShim`, which holds a `fileSystem`. When a session is closed, its `fileSystem` may be closed too. Because the `fileSystem` is cached, later users of the same `fileSystem` may hit a "Filesystem closed" exception once a previous session has closed it.

We can move the "holding of the `fileSystem`" from the session to the Thrift server by triggering `SessionState#getHdfsEncryptionShim` eagerly during Thrift server startup, as follows:

```
HiveThriftServer2#main()
hiveClientImpl
val state = newState
SessionState.start(state)
SessionState.setCurrentSession(state)
**state.getHdfsEncryptionShim**
```

In a Hive session, the new `sessionState` then reuses the `fileSystem` created by the Thrift server:

```
CliService
openSession
HiveSessionImpl
init
sessionState = new SessionState
SessionState.setCurrentSession(sessionState)
executeStatementInternal
SessionState.setCurrentSession(sessionState)
operation.run()
hiveSession.getSessionManager().submitBackgroundOperation(operation)
**// If some code calls "state.getHdfsEncryptionShim", the cached `fileSystem` is reused.**
```

## How was this patch tested?

Manual test.
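The failure mode described in this PR can be reduced to a small model, assuming only that the cache hands every caller the same instance for the same URI. `CachedHandle`, `HandleCache` and `Session` are hypothetical and are not Hadoop's `FileSystem` or Hive's `SessionState`; the sketch reproduces the symptom, not the proposed fix.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a cached FileSystem instance.
final class CachedHandle {
  private var closed = false
  def close(): Unit = closed = true
  def read(): String =
    if (closed) throw new IllegalStateException("Filesystem closed") else "data"
}

// Hypothetical process-wide cache: same key, same instance.
object HandleCache {
  private val cache = mutable.Map.empty[String, CachedHandle]
  def get(uri: String): CachedHandle = synchronized {
    cache.getOrElseUpdate(uri, new CachedHandle)
  }
}

// Stand-in for a session whose state holds the shared handle.
final class Session(uri: String) {
  private val handle = HandleCache.get(uri)
  def query(): String = handle.read()
  // Closing the session also closes the shared handle for every other session.
  def close(): Unit = handle.close()
}
```

After one `Session` on a URI is closed, any other `Session` on the same URI fails with "Filesystem closed". The PR's proposal is to have the Thrift server, rather than each session, initialize and hold the handle.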
[GitHub] [spark] beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#discussion_r305177098
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ##
@@ -1243,6 +1249,24 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
 IsNotNull(e)
 case SqlBaseParser.NULL =>
 IsNull(e)
+ case SqlBaseParser.TRUE if ctx.NOT != null =>
+ checkBooleanTestArgs(e)
+ Not(BooleanTest(e, Some(true)))
+ case SqlBaseParser.TRUE =>
+ checkBooleanTestArgs(e)
+ BooleanTest(e, Some(true))
+ case SqlBaseParser.FALSE if ctx.NOT != null =>
+ checkBooleanTestArgs(e)
+ Not(BooleanTest(e, Some(false)))
+ case SqlBaseParser.FALSE =>
+ checkBooleanTestArgs(e)
+ BooleanTest(e, Some(false))
+ case SqlBaseParser.UNKNOWN if ctx.NOT != null =>
+ checkBooleanTestArgs(e)
+ Not(BooleanTest(e, None))
Review comment: @dongjoon-hyun OK. I'll change it now.
[GitHub] [spark] beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#discussion_r305177098
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ##
@@ -1243,6 +1249,24 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
           IsNotNull(e)
         case SqlBaseParser.NULL =>
           IsNull(e)
+        case SqlBaseParser.TRUE if ctx.NOT != null =>
+          checkBooleanTestArgs(e)
+          Not(BooleanTest(e, Some(true)))
+        case SqlBaseParser.TRUE =>
+          checkBooleanTestArgs(e)
+          BooleanTest(e, Some(true))
+        case SqlBaseParser.FALSE if ctx.NOT != null =>
+          checkBooleanTestArgs(e)
+          Not(BooleanTest(e, Some(false)))
+        case SqlBaseParser.FALSE =>
+          checkBooleanTestArgs(e)
+          BooleanTest(e, Some(false))
+        case SqlBaseParser.UNKNOWN if ctx.NOT != null =>
+          checkBooleanTestArgs(e)
+          Not(BooleanTest(e, None))
Review comment: @dongjoon-hyun OK. I changed it.
[GitHub] [spark] SparkQA commented on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into …
SparkQA commented on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into … URL: https://github.com/apache/spark/pull/25195#issuecomment-513058974 **[Test build #107874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107874/testReport)** for PR 25195 at commit [`195bcbd`](https://github.com/apache/spark/commit/195bcbd16a52f703846acea710422faad67fc6dd).
[GitHub] [spark] SparkQA removed a comment on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming
SparkQA removed a comment on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming URL: https://github.com/apache/spark/pull/22282#issuecomment-513057403 **[Test build #107873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107873/testReport)** for PR 22282 at commit [`aa3f0a1`](https://github.com/apache/spark/commit/aa3f0a138ffed7e977e972eaa33217cbd3c7b664).
[GitHub] [spark] AmplabJenkins commented on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming
AmplabJenkins commented on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming URL: https://github.com/apache/spark/pull/22282#issuecomment-513058821 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107873/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming
AmplabJenkins removed a comment on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming URL: https://github.com/apache/spark/pull/22282#issuecomment-513058819 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming
SparkQA commented on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming URL: https://github.com/apache/spark/pull/22282#issuecomment-513058805 **[Test build #107873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107873/testReport)** for PR 22282 at commit [`aa3f0a1`](https://github.com/apache/spark/commit/aa3f0a138ffed7e977e972eaa33217cbd3c7b664).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming
AmplabJenkins commented on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming URL: https://github.com/apache/spark/pull/22282#issuecomment-513058819 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into …
AmplabJenkins removed a comment on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into … URL: https://github.com/apache/spark/pull/25195#issuecomment-513058527 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12989/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into …
AmplabJenkins removed a comment on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into … URL: https://github.com/apache/spark/pull/25195#issuecomment-513058525 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into …
AmplabJenkins commented on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into … URL: https://github.com/apache/spark/pull/25195#issuecomment-513058525 Merged build finished. Test PASSed.
[GitHub] [spark] zhengruifeng commented on a change in pull request #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
zhengruifeng commented on a change in pull request #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#discussion_r305176615
## File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ##
@@ -603,6 +603,19 @@ class SparseVector @Since("2.0.0") (
   private[spark] override def asBreeze: BV[Double] = new BSV[Double](indices, values, size)
+  override def apply(i: Int): Double = {
+    if (i < 0 || i >= size) {
+      throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
+    }
+
+    if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) {
Review comment: @srowen I added the checks because the implementation of `findOffset` in `breeze.collection.mutable.SparseArray` says `// special case for end of list - this is a big win for growing sparse arrays`, and I think that is reasonable.
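The diff above stops right after the range check, so the rest of `apply` is not visible here. As a rough sketch of the idea under discussion, assuming the remaining path is a binary search over the sorted `indices` (done here with `java.util.Arrays.binarySearch`), a lookup with the cheap out-of-range check might look like this. `SimpleSparse` is a hypothetical class, not Spark's `SparseVector`.

```scala
import java.util.Arrays

// Hypothetical minimal sparse vector: `indices` must be strictly increasing
// and `values(k)` holds the entry stored at position `indices(k)`.
final class SimpleSparse(val size: Int, val indices: Array[Int], val values: Array[Double]) {
  def apply(i: Int): Double = {
    if (i < 0 || i >= size) {
      throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
    }
    if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) {
      // Fast path: outside [first stored index, last stored index], so the
      // entry is an implicit zero and no search is needed.
      0.0
    } else {
      // O(log nnz) binary search; a negative result means "not stored".
      val k = Arrays.binarySearch(indices, i)
      if (k >= 0) values(k) else 0.0
    }
  }
}
```

The range check answers lookups before the first or after the last stored index in O(1); every other lookup pays the O(log nnz) search, which is consistent with the remark that the speedup shrinks as `nnz` grows.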
[GitHub] [spark] AmplabJenkins commented on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into …
AmplabJenkins commented on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into … URL: https://github.com/apache/spark/pull/25195#issuecomment-513058527 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12989/ Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base URL: https://github.com/apache/spark/pull/25098#discussion_r305176584
## File path: sql/core/src/test/resources/sql-tests/results/udf/udf-group-by.sql.out ##
@@ -0,0 +1,521 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 52
+
+
+-- !query 0
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null)
+AS testData(a, b)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+SELECT udf(a), udf(COUNT(b)) FROM testData
+-- !query 1 schema
+struct<>
+-- !query 1 output
+org.apache.spark.sql.AnalysisException
+grouping expressions sequence is empty, and 'testdata.`a`' is not an aggregate function. Wrap '(CAST(udf(cast(count(b) as string)) AS BIGINT) AS `CAST(udf(cast(count(b) as string)) AS BIGINT)`)' in windowing function(s) or wrap 'testdata.`a`' in first() (or first_value) if you don't care which value you get.;
+
+
+-- !query 2
+SELECT COUNT(udf(a)), udf(COUNT(b)) FROM testData
+-- !query 2 schema
+struct
+-- !query 2 output
+7 7
+
+
+-- !query 3
+SELECT udf(a), COUNT(udf(b)) FROM testData GROUP BY a
+-- !query 3 schema
+struct
+-- !query 3 output
+1 2
+2 2
+3 2
+NULL 1
+
+
+-- !query 4
+SELECT udf(a), udf(COUNT(udf(b))) FROM testData GROUP BY b
+-- !query 4 schema
+struct<>
+-- !query 4 output
+org.apache.spark.sql.AnalysisException
+expression 'testdata.`a`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
+
+
+-- !query 5
+SELECT COUNT(udf(a)), COUNT(udf(b)) FROM testData GROUP BY udf(a)
+-- !query 5 schema
+struct
+-- !query 5 output
+0 1
+2 2
+2 2
+3 2
+
+
+-- !query 6
+SELECT 'foo', COUNT(udf(a)) FROM testData GROUP BY 1
+-- !query 6 schema
+struct
+-- !query 6 output
+foo 7
+
+
+-- !query 7
+SELECT 'foo' FROM testData WHERE a = 0 GROUP BY udf(1)
+-- !query 7 schema
+struct
+-- !query 7 output
+
+
+
+-- !query 8
+SELECT 'foo', udf(APPROX_COUNT_DISTINCT(udf(a))) FROM testData WHERE a = 0 GROUP BY 1
+-- !query 8 schema
+struct
+-- !query 8 output
+
+
+
+-- !query 9
+SELECT 'foo', MAX(STRUCT(udf(a))) FROM testData WHERE a = 0 GROUP BY 1
+-- !query 9 schema
+struct>
+-- !query 9 output
+
+
+
+-- !query 10
+SELECT udf(a + b), udf(COUNT(b)) FROM testData GROUP BY a + b
+-- !query 10 schema
+struct
+-- !query 10 output
+2 1
+3 2
+4 2
+5 1
+NULL 1
+
+
+-- !query 11
+SELECT udf(a + 2), udf(COUNT(b)) FROM testData GROUP BY a + 1
+-- !query 11 schema
+struct<>
+-- !query 11 output
+org.apache.spark.sql.AnalysisException
+expression 'testdata.`a`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
+
+
+-- !query 12
+SELECT udf(a + 1 + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 1)
Review comment: Let's add some comments to the test that explain this.
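The expected outputs of queries 2, 3 and 5 follow from two rules: `COUNT(col)` skips NULLs, and NULL grouping keys form a group of their own. As a hedged cross-check using plain Scala collections rather than Spark (`GroupByDemo` is a hypothetical name), the nine `testData` rows from query 0 reproduce those numbers:

```scala
// Plain-Scala model of the query 0 `testData` view; None stands for SQL NULL.
object GroupByDemo {
  type Row = (Option[Int], Option[Int]) // (a, b)

  val testData: Seq[Row] = Seq(
    (Some(1), Some(1)), (Some(1), Some(2)), (Some(2), Some(1)), (Some(2), Some(2)),
    (Some(3), Some(1)), (Some(3), Some(2)), (None, Some(1)), (Some(3), None), (None, None))

  // COUNT(col) counts only the non-NULL values of the column.
  def count(col: Seq[Option[Int]]): Long = col.count(_.isDefined).toLong

  // GROUP BY a, returning (COUNT(a), COUNT(b)) per group; NULL keys form one group.
  def countsByA(rows: Seq[Row]): Map[Option[Int], (Long, Long)] =
    rows.groupBy(_._1).map { case (a, grp) =>
      a -> ((count(grp.map(_._1)), count(grp.map(_._2))))
    }
}
```

`count` over either column gives 7, matching query 2's `7 7`. `countsByA` gives `(0, 1)` for the NULL group and `(3, 2)` for `a = 3`, matching query 5 and the per-group `COUNT(b)` column of query 3.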
[GitHub] [spark] HyukjinKwon closed pull request #25194: [SPARK-28287][SQL][PYTHON][TESTS] Convert and port 'udaf.sql' into UDF test base
HyukjinKwon closed pull request #25194: [SPARK-28287][SQL][PYTHON][TESTS] Convert and port 'udaf.sql' into UDF test base URL: https://github.com/apache/spark/pull/25194
[GitHub] [spark] beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#discussion_r305176319
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/BooleanExpressionsSuite.scala ##
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.expressions.BooleanTest._
+import org.apache.spark.sql.types._
+
+class BooleanExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
+
+  val row0 = create_row(null)
+  val row1 = create_row(false)
+  val row2 = create_row(true)
+
+  test("istrue and isnottrue") {
+    checkEvaluation(BooleanTest(Literal.create(null, NullType), TRUE), false, row0)
+    checkEvaluation(Not(BooleanTest(Literal.create(null, NullType), TRUE)), true, row0)
+    checkEvaluation(BooleanTest(Literal.create(false, BooleanType), TRUE), false, row1)
+    checkEvaluation(Not(BooleanTest(Literal.create(false, BooleanType), TRUE)), true, row1)
+    checkEvaluation(BooleanTest(Literal.create(true, BooleanType), TRUE), true, row2)
+    checkEvaluation(Not(BooleanTest(Literal.create(true, BooleanType), TRUE)), false, row2)
+  }
+
+  test("isfalse and isnotfalse") {
+    checkEvaluation(BooleanTest(Literal.create(null, NullType), FALSE), false, row0)
+    checkEvaluation(Not(BooleanTest(Literal.create(null, NullType), FALSE)), true, row0)
+    checkEvaluation(BooleanTest(Literal.create(false, BooleanType), FALSE), true, row1)
+    checkEvaluation(Not(BooleanTest(Literal.create(false, BooleanType), FALSE)), false, row1)
+    checkEvaluation(BooleanTest(Literal.create(true, BooleanType), FALSE), false, row2)
+    checkEvaluation(Not(BooleanTest(Literal.create(true, BooleanType), FALSE)), true, row2)
+  }
+
+  test("isunknown and isnotunknown") {
+    checkEvaluation(BooleanTest(Literal.create(null, NullType), UNKNOWN), true, row0)
+    checkEvaluation(Not(BooleanTest(Literal.create(null, NullType), UNKNOWN)), false, row0)
+  }
+
+}
+
Review comment: OK. Let me have a try!
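The expected values in this suite follow SQL three-valued logic: `IS TRUE`, `IS FALSE` and `IS UNKNOWN` compare NULL-safely, so they never return NULL, and `IS NOT x` is a plain negation, which matches the `Not(BooleanTest(...))` cases in the parser diff. A minimal plain-Scala model of that truth table, assuming `None` for NULL (`BooleanTestSketch` is hypothetical and is not Catalyst's `BooleanTest` expression):

```scala
// Hypothetical model of IS [NOT] TRUE / FALSE / UNKNOWN; None stands for NULL.
object BooleanTestSketch {
  // `expected` is Some(true) for TRUE, Some(false) for FALSE, None for UNKNOWN.
  // The comparison is NULL-safe, so the result is always true or false.
  def booleanTest(value: Option[Boolean], expected: Option[Boolean]): Boolean =
    value == expected

  // Maps the keyword plus an optional NOT onto a predicate, mirroring the
  // parser cases (NOT wraps the test in a negation).
  def fromSyntax(keyword: String, negated: Boolean): Option[Boolean] => Boolean = {
    val expected: Option[Boolean] = keyword.toUpperCase match {
      case "TRUE"    => Some(true)
      case "FALSE"   => Some(false)
      case "UNKNOWN" => None
      case other     => throw new IllegalArgumentException(s"Unsupported keyword: $other")
    }
    v => if (negated) !booleanTest(v, expected) else booleanTest(v, expected)
  }
}
```

`booleanTest(None, Some(true))` is false and its negation is true, as in the `row0` cases above. In Spark SQL terms, `x IS TRUE` should behave like the NULL-safe comparison `x <=> true`.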
[GitHub] [spark] AmplabJenkins removed a comment on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into …
AmplabJenkins removed a comment on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into … URL: https://github.com/apache/spark/pull/25195#issuecomment-512841052 Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into …
HyukjinKwon commented on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into … URL: https://github.com/apache/spark/pull/25195#issuecomment-513058045 add to whitelist
[GitHub] [spark] HyukjinKwon commented on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into …
HyukjinKwon commented on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into … URL: https://github.com/apache/spark/pull/25195#issuecomment-513058024 Add to whitelist
[GitHub] [spark] HyukjinKwon removed a comment on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into …
HyukjinKwon removed a comment on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into … URL: https://github.com/apache/spark/pull/25195#issuecomment-513058024 Add to whitelist
[GitHub] [spark] zhengruifeng commented on a change in pull request #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
zhengruifeng commented on a change in pull request #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#discussion_r305176094
## File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ##
@@ -603,6 +603,19 @@ class SparseVector @Since("2.0.0") (
   private[spark] override def asBreeze: BV[Double] = new BSV[Double](indices, values, size)
+  override def apply(i: Int): Double = {
+    if (i < 0 || i >= size) {
+      throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
+    }
+
+    if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) {
Review comment: You can see that as `nnz` grows, the speedup decreases. That is because with a large `nnz`, the search cost `log(nnz)` dominates the whole process. However, when `nnz` is small (the most frequent case), the conversion is the dominant cost.
[GitHub] [spark] HyukjinKwon commented on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into …
HyukjinKwon commented on issue #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into … URL: https://github.com/apache/spark/pull/25195#issuecomment-513057962 ok to test
[GitHub] [spark] HyukjinKwon commented on issue #25196: [SPARK-28279][SQL][PYTHON][TESTS] Convert and port 'group-analytics.sql' into UDF test base
HyukjinKwon commented on issue #25196: [SPARK-28279][SQL][PYTHON][TESTS] Convert and port 'group-analytics.sql' into UDF test base URL: https://github.com/apache/spark/pull/25196#issuecomment-513057895 cc @viirya for a quick double check if you're available.
[GitHub] [spark] HyukjinKwon commented on issue #25194: [SPARK-28287][SQL][PYTHON][TESTS] Convert and port 'udaf.sql' into UDF test base
HyukjinKwon commented on issue #25194: [SPARK-28287][SQL][PYTHON][TESTS] Convert and port 'udaf.sql' into UDF test base URL: https://github.com/apache/spark/pull/25194#issuecomment-513057563 Merged to master.
[GitHub] [spark] SparkQA commented on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming
SparkQA commented on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming URL: https://github.com/apache/spark/pull/22282#issuecomment-513057403 **[Test build #107873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107873/testReport)** for PR 22282 at commit [`aa3f0a1`](https://github.com/apache/spark/commit/aa3f0a138ffed7e977e972eaa33217cbd3c7b664).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type
AmplabJenkins removed a comment on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type URL: https://github.com/apache/spark/pull/25085#issuecomment-513057031 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type
SparkQA commented on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type URL: https://github.com/apache/spark/pull/25085#issuecomment-513057417 **[Test build #107872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107872/testReport)** for PR 25085 at commit [`c9fc32c`](https://github.com/apache/spark/commit/c9fc32c126cd010d8b16dc795f95ba6a84ac1cd4).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type
AmplabJenkins removed a comment on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type URL: https://github.com/apache/spark/pull/25085#issuecomment-513057039 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12988/ Test PASSed.
[GitHub] [spark] HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM
HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM URL: https://github.com/apache/spark/pull/25133#issuecomment-513056938
> Specifying the en-US locale directly in StopWordsRemover

This isn't possible because the error is thrown in the constructor of `StopWordsRemover`. This PR actually aims to allow setting a different locale (via `StopWordsRemover.setLocale`). Otherwise, the locale would have to be set in the JVM or the OS just to use this API. Here's an example of the full stack trace:
```
Py4JJavaError: An error occurred while calling None.org.apache.spark.ml.feature.StopWordsRemover.
: java.lang.IllegalArgumentException: StopWordsRemover_daf8924a73f7 parameter locale given invalid value pl_US.
    at org.apache.spark.ml.param.Param.validate(params.scala:77)
    at org.apache.spark.ml.param.ParamPair.<init>(params.scala:656)
    at org.apache.spark.ml.param.Param.$minus$greater(params.scala:87)
    at org.apache.spark.ml.feature.StopWordsRemover.<init>(StopWordsRemover.scala:109)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
```
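The PR title describes the intended behaviour: fall back to en_US when the JVM's default locale is not among its available locales (the trace shows `pl_US` being rejected). A minimal sketch of that check, assuming availability is tested against `java.util.Locale.getAvailableLocales`; `LocaleFallback` is hypothetical and is not the PR's actual code.

```scala
import java.util.Locale

// Hypothetical helper: keep the candidate locale if the JVM reports it as
// available, otherwise fall back to en_US (Locale.US).
object LocaleFallback {
  def resolve(candidate: Locale, available: Seq[Locale]): Locale =
    if (available.contains(candidate)) candidate else Locale.US

  def resolveDefault(): Locale =
    resolve(Locale.getDefault, Locale.getAvailableLocales.toSeq)
}
```

Passing the list of available locales in as a parameter keeps `resolve` testable without changing the JVM's default locale.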
[GitHub] [spark] HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM
HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM URL: https://github.com/apache/spark/pull/25133#issuecomment-513056938

> Specifying the en-US locale directly in StopWordsRemover

This isn't possible because the error is thrown in its constructor. This PR actually aims to allow setting a different locale. Otherwise, users have to set the locale at the JVM or OS level just to use this API.
[GitHub] [spark] HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM
HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM URL: https://github.com/apache/spark/pull/25133#issuecomment-513056938

> Specifying the en-US locale directly in StopWordsRemover

This isn't possible because the error is thrown in its constructor. This PR actually aims to allow setting a different locale (via `StopWordsRemover.setLocale`). Otherwise, users have to set the locale at the JVM or OS level just to use this API.
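Why `setLocale` cannot help once the constructor throws: the stack trace shows the default being validated inside `StopWordsRemover.<init>` (via `Param.$minus$greater` and `ParamPair.<init>` down to `Param.validate`), so an unsupported default fails before any setter can run. A simplified, hypothetical model of that ordering; `ValidatedParam` and `RemoverSketch` are invented for illustration and do not mirror Spark's `Param` internals:

```scala
// Hypothetical stand-in for a validated ML param; not Spark's Param class.
final class ValidatedParam(isValid: String => Boolean) {
  def validate(value: String): String = {
    // require throws IllegalArgumentException, the exception type seen in the trace.
    require(isValid(value), s"parameter locale given invalid value $value.")
    value
  }
}

// Hypothetical remover: the default locale is validated inside the constructor.
final class RemoverSketch(systemDefault: String, available: Set[String]) {
  private val localeParam = new ValidatedParam(available.contains)
  private var locale: String = localeParam.validate(systemDefault)

  // Only reachable if construction succeeded.
  def setLocale(value: String): this.type = {
    locale = localeParam.validate(value)
    this
  }

  def getLocale: String = locale
}
```

In this model, `new RemoverSketch("pl_US", Set("en_US"))` throws before `setLocale` is ever reachable, which is the situation the comment describes.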
[GitHub] [spark] AmplabJenkins commented on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type
AmplabJenkins commented on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type URL: https://github.com/apache/spark/pull/25085#issuecomment-513057031 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type
AmplabJenkins commented on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type URL: https://github.com/apache/spark/pull/25085#issuecomment-513057039 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12988/ Test PASSed.
[GitHub] [spark] HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM
HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM URL: https://github.com/apache/spark/pull/25133#issuecomment-513056938

> Specifying the en-US locale directly in StopWordsRemover

This isn't possible because the error is thrown in the constructor of `StopWordsRemover`. This PR actually aims to allow setting a different locale (via `StopWordsRemover.setLocale`). Otherwise, users have to set the locale at the JVM or OS level just to use this API.
[GitHub] [spark] zhengruifeng commented on issue #25160: [SPARK-28399][ML] implement RobustScaler
zhengruifeng commented on issue #25160: [SPARK-28399][ML] implement RobustScaler URL: https://github.com/apache/spark/pull/25160#issuecomment-513056142 @srowen I am adding it to the pyspark side in this PR.
[GitHub] [spark] zhengruifeng commented on a change in pull request #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
zhengruifeng commented on a change in pull request #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#discussion_r305174098

## File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala
##
@@ -603,6 +603,19 @@ class SparseVector @Since("2.0.0") (
   private[spark] override def asBreeze: BV[Double] = new BSV[Double](indices, values, size)
 
+  override def apply(i: Int): Double = {
+    if (i < 0 || i >= size) {
+      throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
+    }
+
+    if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) {

Review comment:
@srowen @kiszk on each call of `SparseVector.apply`, a conversion to `breeze.linalg.SparseVector` & `breeze.collection.mutable.SparseArray` was performed internally. The improvement comes from avoiding this.
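A standalone sketch of the idea, reading directly from the `indices`/`values` arrays instead of converting to Breeze. The diff only shows the bounds check and the out-of-range shortcut; the binary search for the remaining case is an assumption of this sketch, and `sparseApply` is a hypothetical name:

```scala
import java.util.Arrays

object SparseApplySketch {
  // Looks up element i of a sparse vector stored as sorted indices plus matching values.
  def sparseApply(size: Int, indices: Array[Int], values: Array[Double], i: Int): Double = {
    if (i < 0 || i >= size) {
      throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
    }
    // Outside the range of stored indices, the entry is an implicit zero.
    if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) {
      0.0
    } else {
      // Assumed: binary search over the sorted indices (not shown in the diff).
      val j = Arrays.binarySearch(indices, i)
      if (j >= 0) values(j) else 0.0
    }
  }
}
```

Each lookup costs O(log nnz) and allocates nothing, which is consistent with the comment's point that the gain comes from skipping the per-call Breeze conversion.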
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r305172239

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
##
@@ -39,27 +39,36 @@ object StringUtils extends Logging {
    * throw an [[AnalysisException]].
    *
    * @param pattern the SQL pattern to convert
+   * @param escapeStr the escape string contains one character.
    * @return the equivalent Java regular expression of the pattern
    */
-  def escapeLikeRegex(pattern: String): String = {
+  def escapeLikeRegex(pattern: String, escapeStr: String): String = {
+    val escapeChar = escapeStr.charAt(0)

Review comment:
this might need to `assert(escapeStr.length == 1)`?
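To make the reviewer's suggestion concrete, a standalone sketch of LIKE-to-regex translation with a configurable escape string, including the single-character check. It follows the signature in the diff but is not Spark's implementation: it throws `IllegalArgumentException` (via `require`) instead of `AnalysisException`, and `LikeRegexSketch` is a hypothetical name:

```scala
import java.util.regex.Pattern

object LikeRegexSketch {
  // Converts a SQL LIKE pattern to a Java regex, treating escapeStr as the escape character.
  def escapeLikeRegex(pattern: String, escapeStr: String): String = {
    // The check the reviewer asks about: the escape string must be exactly one character.
    require(escapeStr.length == 1, s"the escape string must be one character: '$escapeStr'")
    val escapeChar = escapeStr.charAt(0)
    val in = pattern.iterator
    val out = new StringBuilder

    def fail(message: String): Nothing =
      throw new IllegalArgumentException(s"the pattern '$pattern' is invalid, $message")

    while (in.hasNext) {
      in.next() match {
        case c if c == escapeChar && in.hasNext =>
          val next = in.next()
          // Only '_', '%', and the escape character itself may follow the escape character.
          if (next == '_' || next == '%' || next == escapeChar) out ++= Pattern.quote(next.toString)
          else fail(s"the escape character is not allowed to precede '$next'")
        case c if c == escapeChar => fail("it is not allowed to end with the escape character")
        case '_' => out ++= "."
        case '%' => out ++= ".*"
        case c => out ++= Pattern.quote(c.toString)
      }
    }
    // (?s) lets '.' also match line terminators.
    "(?s)" + out.result()
  }
}
```

With `/` as the escape, `/%SystemDrive/%//Users%` becomes a regex that matches `%SystemDrive%/Users/John`, mirroring the new `ESCAPE '/'` example in the docs.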
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r305171884

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##
@@ -83,33 +83,38 @@ abstract class StringRegexExpression extends BinaryExpression
           % matches zero or more characters in the input (similar to .* in posix regular
           expressions)
 
-          The escape character is '\'. If an escape character precedes a special symbol or another
-          escape character, the following character is matched literally. It is invalid to escape
-          any other character.
-
           Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
           to match "\abc", the pattern should be "\\abc".
 
           When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks
           to Spark 1.6 behavior regarding string literal parsing. For example, if the config is
           enabled, the pattern to match "\abc" should be "\abc".
+      * escape - a optional string. The default escape character is the '\' . If an escape character
+          precedes a special symbol or another escape character, the following character is matched
+          literally. It is invalid to escape any other character.
   """,
   examples = """
     Examples:
       > SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\\Users%'
       true
+      > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' ESCAPE '/'
+      true
   """,
   note = """
     Use RLIKE to match with standard regular expressions.
   """,
   since = "1.0.0")
-case class Like(left: Expression, right: Expression) extends StringRegexExpression {
+case class Like(left: Expression, right: Expression, escapeCharOpt: Option[String] = None)
+  extends StringRegexExpression {
+
+  private lazy val escapeStr = escapeCharOpt.getOrElse("\\")
 
-  override def escape(v: String): String = StringUtils.escapeLikeRegex(v)
+  override def escape(v: String): String = StringUtils.escapeLikeRegex(v, escapeStr)
 
   override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()
 
-  override def toString: String = s"$left LIKE $right"
+  override def toString: String = s"$left LIKE $right" +
+    escapeCharOpt.map(str => s" ESCAPE $str").getOrElse("")

Review comment:
Does `str` need the same handling for special chars as lines 150-156?
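A small standalone model of the new `Like` signature, useful for seeing what the review question is about: the escape string is interpolated into `toString` as-is, with no quoting. `LikeSketch` is a hypothetical stand-in that uses plain strings instead of Catalyst `Expression`s:

```scala
// Hypothetical model of Like(left, right, escapeCharOpt); strings replace Catalyst expressions.
final case class LikeSketch(left: String, right: String, escapeCharOpt: Option[String] = None) {
  // Falls back to a backslash when no ESCAPE clause was given, as in the diff.
  def escapeStr: String = escapeCharOpt.getOrElse("\\")

  // Appends " ESCAPE <str>" only when an escape string was supplied; str is not quoted.
  override def toString: String =
    s"$left LIKE $right" + escapeCharOpt.map(str => s" ESCAPE $str").getOrElse("")
}
```

For example, `LikeSketch("a", "b", Some("'"))` renders as `a LIKE b ESCAPE '`, which may be the kind of special character the comment has in mind.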
[GitHub] [spark] AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-513052583 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107864/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-513052579 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-513052583 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107864/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-513052579 Merged build finished. Test PASSed.
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r305171610

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##
@@ -83,33 +83,38 @@ abstract class StringRegexExpression extends BinaryExpression
           % matches zero or more characters in the input (similar to .* in posix regular
           expressions)
 
-          The escape character is '\'. If an escape character precedes a special symbol or another
-          escape character, the following character is matched literally. It is invalid to escape
-          any other character.
-
           Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
           to match "\abc", the pattern should be "\\abc".
 
           When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks
           to Spark 1.6 behavior regarding string literal parsing. For example, if the config is
           enabled, the pattern to match "\abc" should be "\abc".
+      * escape - a optional string. The default escape character is the '\' . If an escape character
+          precedes a special symbol or another escape character, the following character is matched
+          literally. It is invalid to escape any other character.
   """,
   examples = """
     Examples:
       > SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\\Users%'
       true
+      > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' ESCAPE '/'
+      true
   """,
   note = """
     Use RLIKE to match with standard regular expressions.
   """,
   since = "1.0.0")
-case class Like(left: Expression, right: Expression) extends StringRegexExpression {
+case class Like(left: Expression, right: Expression, escapeCharOpt: Option[String] = None)
+  extends StringRegexExpression {
+
+  private lazy val escapeStr = escapeCharOpt.getOrElse("\\")
 
-  override def escape(v: String): String = StringUtils.escapeLikeRegex(v)
+  override def escape(v: String): String = StringUtils.escapeLikeRegex(v, escapeStr)

Review comment:
nit: How about this? Then, remove the `escapeStr` variable.
```
override def escape(v: String): String = StringUtils.escapeLikeRegex(v, escapeCharOpt.getOrElse("\\"))
```